IPFS Alpha | Why We Must Distribute The Web

People & Blogs

25m 6s

41m 39s


00:05

The internet today has been described

00:07

as this big nervous system for humanity,

00:09

right? So much of what we do today is

00:11

carried through these digital pipes.

00:14

So much of our economy is just run

00:17

entirely on the internet now. And

00:21

we interface with the internet and with

00:24

each other through a series of

00:26

applications, and these run mostly on

00:28

the web.

00:29

The web is sort of like the

00:33

transport protocol for how these

00:35

applications move around. It's a way of

00:36

being able to deploy

00:38

different media, and to be able

00:42

to link people from one set

00:45

of things to another, to put

00:47

people in applications, and to give

00:48

people functionality that they didn't

00:51

previously have. So it's like the page

00:53

just loads and you retrieve some

00:54

new functionality you didn't

00:56

have, and you can now access new

00:59

things, right? So this was a huge

01:01

breakthrough, of course, but it has

01:03

some problems today. Now, the crazy

01:06

thing about the internet and the web is that

01:07

it's just a collection of protocols,

01:09

right? So it's a bunch of really good

01:11

ideas of how to do things, things that have

01:13

worked and things that haven't, that get

01:14

standardized, put into

01:16

browsers and put into all the other tools

01:17

in computers, and then this orchestra

01:20

of programs runs and gives us this

01:24

amazing range of capabilities, right? So

01:26

everything that we as humans can do

01:27

today, you know, being able to learn

01:29

remotely, being able to think together,

01:32

being able to talk to a person around

01:35

the world, happens thanks to these

01:38

great protocols that have been written

01:40

by people that really were just

01:42

aiming to augment their own experiences

01:44

and through that ended up augmenting

01:48

what the rest of humanity can do. So

01:51

this is kind of mind-blowing, right?

01:53

Because it means that if you find a

01:54

problem with the internet, or you have

01:56

some new ideas about what humanity

01:58

should be able to do, you can just

02:01

write a protocol, and then you

02:04

implement it, and if you're right and

02:06

it works, then you tell the world, and

02:08

then it gets deployed, and then a lot of

02:10

people will use it and the world will be

02:11

a better place. So it's a great

02:13

place to be. Now, we have found some

02:17

problems that we're now telling you

02:18

about, and they mostly have to do with

02:20

how we move applications through

02:23

the web. So

02:24

it's moving not just applications but

02:27

any kind of media. So, you know, documents,

02:29

images, pictures, video, and so on. So

02:34

IPFS is mostly related to HTTP, and it

02:36

sort of enhances HTTP and should be used

02:38

alongside it, or maybe as a shim,

02:41

and potentially in some cases you could

02:42

actually just transition over to

02:45

using IPFS instead. Now, the big

2:49

Location Addressing

02:50

problem behind this change, or what we're trying to solve here, is the problem of using location addressing. So if you've looked at a URL, you have this first part of the URL that's the domain, right? Here it's example.com.

03:09

The domain is resolved to an IP address, and an IP address means the set of numbers that you need to dial to get connected to another computer across the network. So what does that picture look like? Say that you're here, highlighted in blue, and you're trying to

03:28

access this picture through the internet, and you have an address, and this address has these numbers and then a path for the actual file. That means that you're going to find a specific other computer at that address and fetch the image from

03:45

that computer. Now suppose that all of the other computers pictured here have the exact same file locally, including one that's very close to you in the network, perhaps. Maybe it's in the same room, but it doesn't matter. As far as HTTP is concerned, that is not the

03:58

same file, and you have to go across the world to 10.20.30.40 to find it. Actually, that probably won't resolve, because 10.x.x.x is a private (local) address range. But anyway, the issue here is that we are addressing content by location. So we're telling you where to find something

04:21

instead of what it is. A lot of people have talked about this as a problem and come up with solutions, but it hasn't quite sunk in. So to drive the point home, and to show you why this is actually a huge problem today, picture

04:38

a big room filled with people with a bunch of computers. Nowadays we carry a laptop and maybe a mobile phone, and soon a watch, and we'll have a bunch of devices. Say that I upload a picture to Facebook and I

04:54

give everyone the link, and now a whole bunch of people are going to go to Facebook and fetch that image from Facebook all the way back. So, you know, not that big of a deal, right? Well, let's look at how it looks in the backbone. So, I make a request to

05:10

Facebook and I send it up, and say it's just a 1 megabyte image, times 8 here because, say, there are eight links. So for the total amount of bandwidth (these are really rough calculations, just to give you a picture), say that there's

05:23

8 megabytes of bandwidth used by my picture going all the way up to Facebook. And when 30 people show up, say they're all in the same room, and they all talk to Facebook and all pull down the image, now that's 240 megabytes of bandwidth wasted. And, you

05:40

know, you might say, well, that's unavoidable because we have to encrypt everything. And that's true, we have to encrypt everything. But, you know, we have to ship the image 30 times across the wire. It doesn't matter

05:53

if it's wasted. You just have to send it. Maybe that's true, maybe that isn't. And you know what? Maybe 240 megabytes is not a big deal, right? But what if we're looking at video instead? So, imagine that you go to YouTube and you start watching

06:06

a video and it's in high def. So, say it's about 200 megabytes, and you send it to everybody else, and then these 30 people start downloading a 200 megabyte video across these eight links. Now we're talking about 48 gigabytes of bandwidth.
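The rough arithmetic above can be written out as a short sketch. It uses the speaker's own figures (8 links, 30 people, a 1 megabyte image and a 200 megabyte video); the function names are made up for illustration, and the "shared" case assumes, hypothetically, that one copy crosses the backbone and everyone else gets it inside the room, with local traffic not counted.

```python
def total_traffic_mb(file_size_mb, links, viewers):
    """Traffic summed over every backbone link when each viewer fetches their own copy."""
    return file_size_mb * links * viewers


def shared_traffic_mb(file_size_mb, links):
    """Hypothetical alternative: a single copy crosses the links, the rest is served locally."""
    return file_size_mb * links


LINKS = 8      # links between the room and the server, from the talk
VIEWERS = 30   # people in the room, from the talk

image_mb = total_traffic_mb(1, LINKS, VIEWERS)
video_mb = total_traffic_mb(200, LINKS, VIEWERS)

print(image_mb, "MB for the image")                 # 240 MB
print(video_mb / 1000, "GB for the video")          # 48.0 GB
print(shared_traffic_mb(200, LINKS) / 1000, "GB if one copy is shared locally")
```

The numbers are deliberately crude, as in the talk: they count each megabyte once per link it crosses and ignore protocol overhead.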

06:22

That's starting to be a lot. When you look at bigger files or longer videos and so on, we're looking at potentially terabytes of bandwidth needed just to move these files around, from the place that they're needed to the backbone and back. And

06:38

the crazy thing is that those same files, or the same data that represents those files, might be lying around in the same local area network that you're in. So a computer right next to yours could be serving you that file. Now, there are a lot of reasons

06:56

why we haven't done this historically.
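To make "what it is" instead of "where it is" concrete, here is a minimal sketch of addressing a file by a hash of its bytes. It is an illustration under stated assumptions, not IPFS's actual format: a plain SHA-256 hex digest stands in for a real content identifier, and `content_address`, `fetch`, and the peer names are hypothetical.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Name the data by what it is: a digest of its bytes (stand-in for a real content ID)."""
    return hashlib.sha256(data).hexdigest()


def fetch(address, peers):
    """Ask peers in order (nearest first) and accept the first copy whose bytes match the address."""
    for name, store in peers:
        data = store.get(address)
        if data is not None and content_address(data) == address:
            return name, data
    raise LookupError("no peer has content " + address)


picture = b"...bytes of the picture..."
addr = content_address(picture)

# Two machines hold the same bytes, so they hold the same address.
peers = [
    ("laptop in the same room", {addr: picture}),
    ("server across the world", {addr: picture}),
]

source, data = fetch(addr, peers)
print("fetched from:", source)
```

Because the address is derived from the bytes, the requester can check any copy it receives, so it does not need to trust (or even know in advance) which machine answers.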

6:56

Bandwidth

06:58

But it may start to be a good idea. That was kind of intense. So while we're talking about bandwidth, let's look at something else, and kind of why this is a problem. This is data from Akamai, and it's a graph from 2007 to 2012. We could

07:14

update it, but this is kind of an older graph, and in that period of 5 years, bandwidth only improved about one or two megabits per second in the G7, and that's the seven largest economies. So that is not a very significant improvement when you compare that to the

07:31

way that our storage is increasing, right? We now have 10 terabyte hard drives, we had one terabyte a few years ago, and we're going to have 100 terabytes in a few more. I mean, the cost is halving every 11 months, and we're

07:50

now looking at big numbers. So that means that we're saturating these pipes, and the bandwidth from the local area networks to the backbone is really, really small compared to how much we want to use them. Actually, the problem is worse because

08:06

the speed of the average connection is actually improving at a slower rate than storage. So that means that people are getting larger capacity drives faster than they are getting better bandwidth, which

08:23

gives you the impression that things are getting slower, because in a sense you want to use more of the network. There's also another problem, which is latency, and I mean, we've all

8:34

Latency

08:36

known for a long time that we can't get around the fact that the speed of light remains constant. So the only way to make things faster is by moving them closer to you. This is why Amazon and Google offer these cloud services that you can hire to store a

08:51

whole bunch of stuff right next to where it's needed the most. And this works pretty well, except that sometimes people don't have things deployed in those locations, or even that latency out, you

09:09

know, through these slow pipes to that data center, is too much. I mean, maybe the file that you need is literally in another device in the same room, and instead you're piping the data through the backbone, making a

09:25

request out to the network, grabbing the file, and then pulling it down, instead of just talking to the other device that you have locally. It's kind of absurd that we do this. And, you know, kind of a funny anecdote: I was giving a talk recently about

09:41

IPFS, and the slides that I wanted to present were stuck in my computer, and my computer couldn't talk to the projector because we didn't have the right adapter. And getting my slides from my computer to another person's computer, because nobody had a USB key,

09:53

required sending the slides all the way up to the backbone and then back down. Well, it didn't actually; most people would have had to do that, but we actually just dropped into the terminal and connected to the specific computer

10:09

directly and so on. But this is the kind of stuff that we're dealing with; the average person doesn't know how to do any of that. There's another big issue here, and that's the dichotomy between online

10:19

Online vs Offline

10:20

and offline operation, right? We as engineers are sort of misusing the web, because we program behind this model of saying we have this data center, and if the user can talk to the data center then they're online, and if they can't they're

10:36

offline. This is actually a pretty bad model. We had this perception that it was going to become more and more true, as in, we're going to connect everyone and everyone's going to be online all the

10:50

time. But that's actually not accurate, because we have ever more devices that we carry around, and contexts in which we're using these devices that are not close to any kind of network that can connect us. So imagine that you go on a plane and you're

11:07

traveling, and you have your laptop and you have your phone, and maybe you have some files in one and not in the other. Now, moving those around: most applications that you have are not going to do that. In fact, most applications that you use through the

11:18

web are certainly not going to load. They're not even going to load. Most things just cease operation the moment that you step outside of the bounds of the network. And to illustrate how this works, imagine you have again this

11:33

massive classroom. This classroom is great; it just shows so many of these issues, right? Say that I open a Google Doc or something and I send it out to everyone else. So everyone opens this Google Doc and now we're all

11:48

collaborating, but all of these updates are getting shipped out to the backbone, to some Google server, and then shipped all the way back. And so that's how we're collaborating. So, by the way, we're seeing a lot of latency there because

12:01

every single time you type something, it has to go out there and then back instead of right next to you. You could be literally sitting next to the person and they still have to go out there. And we've kind of hidden behind the fact that humans don't

12:12

perceive that much latency, and that if you get around 300 or 400 milliseconds, you're barely going to notice it. But say that something bad happens to the rest of the network and this whole room loses connectivity to the backbone. Suddenly

12:27

the entire application comes crashing down and nobody can do anything. So maybe they can keep editing things locally, but they cannot ship the updates that they're making to the person right next to them. And this is absurd, right? I mean, talk to any

12:43

person who's using these applications. They think it's crazy that the data from one computer can't get to the other computer, which is right next to it. And in reality, these computers are actually talking to each other. They're

12:57

on the same network. They're probably pinging each other, or they're seeing each other's packets flowing through. We just haven't taught our applications how to talk to each other about these files. And this problem is all over the web. You know, I'm not

13:13

picking on Google here. Tons of applications that we use day-to-day, that run our lives, cease operation entirely. In fact, Google's actually one of the best on this. They have this awesome technology called operational transforms

13:27

that allows these updates to be done concurrently, and gives you this amazing impression that it's all working flawlessly. They all get applied in the backbone, and you can be editing the same document and it just works flawlessly.
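As a rough idea of what an operational transform does, here is a toy sketch for two concurrent text insertions. It is an assumption-laden illustration, not Google's implementation: it handles only inserts, and the `Insert` type, the `site` tie-breaker, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # index in the document where the text goes
    text: str
    site: int   # hypothetical editor id, used only to break ties


def apply(doc: str, op: Insert) -> str:
    """Apply one insertion to a document."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, other: Insert) -> Insert:
    """Adjust `op` so it can be applied after `other` has already been applied."""
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        # `other` landed before `op`, so `op` shifts right by the inserted length.
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op


doc = "hello world"
a = Insert(0, "Oh, ", site=1)   # editor 1 types at the start
b = Insert(11, "!", site=2)     # editor 2 types at the end, at the same time

# Each editor applies its own edit first, then the transformed remote edit.
editor_1 = apply(apply(doc, a), transform(b, a))
editor_2 = apply(apply(doc, b), transform(a, b))
print(editor_1, "|", editor_2)
```

Both editors end up with the same text even though they applied the edits in different orders. Nothing in the transform itself requires a central server, which is the point the talk goes on to make about sending updates directly between peers.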

13:42

They could be sending those updates to each other through things like WebRTC and so on, but as far as I know that doesn't happen, and it certainly doesn't happen in most applications that you see on the web. And this is not a flaw of the

13:55

protocols themselves, in that we could be using WebRTC nowadays, but it's actually a flaw in how we store data and how we reference data. So it's how HTTP has taught us to store data on the web and reference it. And since the beginning of HTTP, the W3C

14:16

has actually come up with a whole bunch of new ways to reference data that are better, but people haven't really adopted them on the web. Now, to me, this set of problems kind of feels very silly, right? You have this massive set of computers somewhere, and

14:30

then you take out the mothership and everything grinds to a halt. And most people think this is fine, but in reality, we encounter terrible bandwidth problems. We run into these issues with being somewhere on the online-offline spectrum. Maybe

14:46

we're in a very low bandwidth setting, or we have a bunch of interference, or there's congestion, or maybe you're traveling like I mentioned, or your ISP has intermittent outages. This happens to me all the time. And maybe the data center has some problem, right? So

15:01

many issues could be occurring, and users can't be forced to halt their operation just because there's some discontinuity between them and the backbone. So our applications should learn how to operate in entirely distributed settings. And we as

15:19

engineers need to start doing this, and we need to build better tools for application developers to do this. Now, it's actually really important, right? You might think that I'm just talking about these problems that we want to improve, but

15:33

they really, really matter. And they matter so much that if you ask any Egyptian about the time that the internet shut down, because their government decided they were not going to allow communications, they are going to tell you very seriously that these

15:49

applications potentially saved either their lives or their families' lives. So these are not only mission critical to businesses, they're mission critical to human lives potentially at risk. So these particular communication tools need to be able to speak to

16:06

each other when disconnected from the backbone. We need to be able to have our computers talk to each other, and we need to be able to do so

16:13

Security

16:15

securely, right? In the last few years we've seen the great failing of our community in terms of securing everything. We've had major breaches, described by Snowden and so on, and, you know,

16:36

there are all sorts of problems with how we're moving data around on the web, and it's really not enough to encrypt the communications. I think it was a couple of years ago, or maybe not that long, Dropbox had this huge data problem where they

16:53

allowed anyone to log in for something like four hours, and anybody could log into your account and look at your files. And Dropbox is full of great engineers, so if they are not getting it perfectly right, imagine how

17:06

many people are just getting it totally wrong. So we need to be encrypting everything, or really looking carefully at how we're doing security on the client side, and, as much as we possibly can, treating the cloud as just oblivious

17:23

storage or oblivious routing systems. This of course complicates everything, but certain kinds of applications, or certain kinds of contexts for applications, demand that kind of attention. Now, the last one I want

17:37

Permanence

17:39

to talk about is permanence. And this one is really, really deeply important to me. And it might not be as obvious to everyone why this matters, but I'll try my best to illustrate it. So, think about book burning for a moment. Think about the

17:57

most important piece of media that you can think of, like the most important book that you have ever read, or think about all of Wikipedia, and imagine someone coming along and burning it and really destroying it, making it impossible for

18:12

anyone to read ever again. Think about all the knowledge that is lost. Now, historically, we've treated book burning as this insane, crazy offense to progress and to humanity itself. And we've condemned anyone who's gone and done such a thing. And you

18:32

know, when you look throughout history and you look at the occurrences of book burning, they drop off significantly with the advent of the printing press. This is because the printing press allowed you to make many

18:49

copies of the same book very cheaply. So say that you print a whole bunch of copies, and book burners show up and try to burn a lot of books. Well, they're probably only going to be able to get to a few of them. Or even if they get most, there are probably

19:03

copies that are going to survive, and you'll be able to make more copies. So, this is great. Now, we have some problems today. Bringing this back to the digital world, if you've been around the web, you've

19:16

probably seen a 404, right? A 404 is an error that tells you that some resource is not found. What that means is that there was a link pointing to another object, and that object is either no longer there because someone took it down, i.e. burned the book, or someone

19:30

moved it. We tend to chastise a lot of agencies and so on for taking certain content down, or censoring, and so on. But we forget that there are tiny little book burnings happening constantly. Whenever

19:50

any web developer moves some content from one location to another, any link that anyone had added to that location is now broken. The content is potentially findable through search, but that link is no longer going to

20:07

work, because we have this system where the publisher of content has to host it, or make sure that it's hosted somewhere, and the consumer of the content must depend on that producer keeping that content up, or

20:27

they have to copy it and move it somewhere else and give it a new address, right? So again, this is strongly related to the location addressing problem, which is sort of behind this. This is why links break: because people end up

20:40

being careless and moving things around. And thinking of the web of documents in the abstract kind of ignores the fact that really the web is not just a web of documents. It's a web of documents on machines, and the notion or

20:57

description of a document identifies the set of machines that are responsible for giving it up, and this prevents people from replicating or being able to host the same content somewhere else. This is kind of like a book burner's paradise. And, you know,

21:13

I'm describing this accidental book burning that happens. Now, fortunately, a group of very smart people saw that this was a pretty big problem early on in the web's history and started trying to archive everything. And of course I'm

21:30

talking about the Internet Archive, and it looks like this. It's beautiful. If you haven't been there, you probably should. They are running this project to try and index the entire web and store every single page that they can possibly find, and store

21:45

it because they know that stuff will be needed in the future. Ironically, I was trying to find the source code for one of the protocols that we learned a lot from and used to develop IPFS. And funnily enough, the source code is only available

22:07

through the Internet Archive, because the people that were hosting the code are no longer hosting it, or maybe there was some glitch in the server or whatever. We could only find it through the Wayback Machine, which is this great service that they

22:19

run. This is actually very closely related to another important problem described by Vint Cerf, who's one of the creators of the internet, and that's this notion of digital vellum. Think of old computers. As technology evolves and gets better with

22:38

time, we stop using the old things, so much so that we stop being able to maintain them. Sometimes they break, and when they break nobody knows how to fix them, or nobody has the parts to fix them. So these old machines that no longer work may have been

22:55

readers of important media: things that we stored in some physical material that this computer or this machine was going to read out to us. And now the machine's broken and nobody knows how to fix it. That means that all of that content is lost, completely lost,

23:11

unless we find a way to fix it. And so the solution to this is really to learn to emulate everything, right? We as a society should be very careful with the types of media that we store and how we store them, to make sure that even if we stop using

23:26

something, or some major problem happens, we can emulate all of these computers or machines that were old, to try and read the encodings and be able to read out this media. And so we need to be able to simulate or emulate every

23:45

single computer that we've ever built. And this is again related to this book burning problem, because say that we do this, say that we create all these emulators: where do we host them? That location might be taken down, you know. So

24:01

it might be taken down, or maybe might lose funding, or whatever. We need to be able to replicate these systems and store them in as many places as we possibly can, and we need to build tools and make the internet capable of

24:16

doing this very easily. So these are the problems that have driven us to think through how the web works, think through how content moves around the internet, and come up with a solution. And this isn't designed sort of in the abstract. Actually,

24:35

we thought for a long time, and we thought through many different kinds of attempts and solutions to this problem, to try and synthesize a bunch of good ideas that work well together and provide some new software, both a protocol and a tool

24:52

set that people can use to make the web better.
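The "tiny book burnings" described in the Permanence section can be sketched as a toy model. This is a hedged illustration, not how any real server or IPFS works: `LocationWeb`, `ContentWeb`, the host names, and the example URLs are all hypothetical, and a SHA-256 hex digest stands in for a real content identifier.

```python
import hashlib


class LocationWeb:
    """Links name one place. Move or remove the page and the old link returns 404."""

    def __init__(self):
        self.pages = {}  # url -> bytes

    def publish(self, url, data):
        self.pages[url] = data

    def move(self, old_url, new_url):
        self.pages[new_url] = self.pages.pop(old_url)

    def get(self, url):
        if url not in self.pages:
            return 404, None
        return 200, self.pages[url]


class ContentWeb:
    """Links name the bytes. Any host that still holds a copy can answer."""

    def __init__(self):
        self.hosts = {}  # host name -> {address: bytes}

    def add(self, host, data):
        address = hashlib.sha256(data).hexdigest()
        self.hosts.setdefault(host, {})[address] = data
        return address

    def take_down(self, host):
        self.hosts.pop(host, None)

    def get(self, address):
        for store in self.hosts.values():
            if address in store:
                return 200, store[address]
        return 404, None


paper = b"source code of an old protocol"

old_web = LocationWeb()
old_web.publish("http://example.com/old/paper.html", paper)
old_web.move("http://example.com/old/paper.html", "http://example.com/new/paper.html")
print(old_web.get("http://example.com/old/paper.html")[0])   # the old link is now broken

new_web = ContentWeb()
link = new_web.add("original publisher", paper)
new_web.add("an archive", paper)          # a replica gets the same address
new_web.take_down("original publisher")
print(new_web.get(link)[0])               # the link still resolves from the replica
```

Like printed books, the content survives as long as at least one copy does; once every host holding it is gone, a content address fails too.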
