IPFS Alpha | Why We Must Distribute The Web
People & Blogs
25m 6s
41m 39s
00:05
The internet today has been described as this big nervous system for humanity, right? So much of what we do today is carried through these digital pipes, and so much of our economy is run entirely on the internet. We interface with the internet, and with each other, through a series of applications, and these run mostly on the web.
00:29
The web is sort of like the transport protocol for how these applications move around. It's a way of being able to deploy different media, to link people from one set of things to another, to put people in applications, and to give people functionality that they didn't previously have. The page just loads, you retrieve new functionality you didn't have, and now you can access new things.
00:59
So this was a huge breakthrough, of course, but it has some problems today. Now, the crazy thing about the internet and the web is that it's just a collection of protocols, right? It's a bunch of really good ideas of how to do things, things that have worked and things that haven't, that get standardized and put into browsers and into all the tools on our computers. Then this orchestra of programs runs and gives us this amazing range of capabilities, right?
01:26
Everything that we as humans can do today, like being able to learn remotely, to think together, to talk to a person around the world, happens thanks to these great protocols, written by people who were really just aiming to augment their own experiences and through that ended up augmenting what the rest of humanity can do. This is kind of mind-blowing, right?
01:53
01:53
Because it means that if you find a
01:54
01:54
problem with the internet or uh you have
01:56
01:56
some new ideas about what humanity
01:58
01:58
should uh be able to do, you can just
02:01
02:01
write a protocol uh and then you
02:04
02:04
implement it and if you're right uh and
02:06
02:06
it works then you tell the world and
02:08
02:08
then it gets deployed and then a lot of
02:10
02:10
people will use it and the world will be
02:11
02:11
a better place. So it's it's a great
02:13
02:13
place to be. Now uh we have found some
02:17
02:17
problems uh that we're not telling you
02:18
02:18
about and they mostly have to do with
02:20
02:20
how we move uh applications uh through
02:23
02:23
the web. So
02:24
02:24
it's moving not just applications but
02:27
02:27
any kind of media. So you know documents
02:29
02:29
images uh pictures video and so on. So
02:34
02:34
IPFS is mostly related to HTTP and it
02:36
02:36
sort of enhanced HTP and should be used
02:38
02:38
alongside it um or maybe as a shim uh
02:41
02:41
and potentially in some cases like
02:42
02:42
actually just transition over to to
02:45
02:45
using IPFS instead. Now uh the big
2:49

Location Addressing

02:50
problem behind this change or what we're trying to to solve here is the problem of using location addressing. So if you've looked at a URL uh you you have this first part of the of the URL that's the domain right uh and that you know here it's exampample.com
03:09
The domain is resolved to an IP address, and an IP address is the set of numbers that you need to dial to get connected to another computer across the network. So what does that picture look like? Say that you're here, highlighted in blue, and you're trying to
03:28
access this picture through the internet. You have an address, and this address has these numbers and then a path for the actual file. That means that you're going to find one specific computer at that address and fetch the image from
03:45
that computer. Now suppose that all of the other computers pictured here have the exact same file locally, including one that's perhaps very close to you in the network. Maybe it's in the same room, but it doesn't matter. As far as HTTP is concerned, that is not the
03:58
same file, and you have to go across the world to 10.20.30.40 to find it. Actually, that address probably wouldn't be reachable, because 10.x addresses are private, local addresses. But anyway, the issue here is that we are addressing content by location. We're telling you where to find something
04:21
instead of what it is. A lot of people have talked about this as a problem and come up with solutions, but it hasn't quite sunk in. So to drive the point home, and to show you why this is actually a huge problem today, picture
04:38
a big room filled with people with a bunch of computers. Nowadays we carry a laptop, maybe a mobile phone, soon a watch; we'll have a bunch of devices. Say that I upload a picture to Facebook and I
04:54
give everyone the link. Now a whole bunch of people are going to go to Facebook and fetch that image from Facebook, all the way back. Not that big of a deal, right? Well, let's look at how it looks in the backbone. I make a request to
05:10
Facebook and I send the image up. Say it's just a 1 megabyte image, times 8, because say there are eight links on the way. These are really rough calculations, just to give you a picture. So that's
05:23
8 megabytes of bandwidth used by my picture going all the way up to Facebook. Now 30 people show up, say they're all in the same room, and they all talk to Facebook and all pull down the image. That's 240 megabytes of bandwidth wasted. And you
05:40
might say, well, that's unavoidable, because we have to encrypt everything. And that's true, we do encrypt everything. So we have to ship the image 30 times across the wire; it doesn't matter
05:53
if it's wasted, you just have to send it. Maybe that's true, maybe it isn't. And you know what, maybe 240 megabytes is not a big deal, right? But what if we're looking at video instead? Imagine that you go to YouTube and you start watching
06:06
a video in high definition. Say it's about 200 megabytes, you send it to everybody else, and these 30 people start downloading a 200 megabyte video across these eight links. Now we're talking about 48 GB of bandwidth.
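The rough figures above come from one multiplication: file size, times the number of backbone links the file crosses, times the number of people fetching it. Below is a minimal Python sketch of that arithmetic. It uses the talk's own assumptions: eight links, 30 people, and decimal units so that 1 GB = 1000 MB. The function name is ours and is for illustration only.

```python
def backbone_traffic_mb(file_mb: float, links: int, fetches: int) -> float:
    """Rough total link traffic in MB: each fetch carries its own copy
    of the file across every link between the room and the server."""
    return file_mb * links * fetches

LINKS = 8    # links between the room and the server (talk's assumption)
PEOPLE = 30  # people in the room, each fetching separately

image_mb = backbone_traffic_mb(1, LINKS, PEOPLE)    # 1 MB picture
video_mb = backbone_traffic_mb(200, LINKS, PEOPLE)  # 200 MB HD video

# Hypothetical alternative: one machine fetches the file once and the
# others get it over the LAN, so only one copy crosses the eight links.
shared_mb = backbone_traffic_mb(1, LINKS, 1)

print(f"image: {image_mb} MB")                   # image: 240 MB
print(f"video: {video_mb / 1000:g} GB")          # video: 48 GB
print(f"image, shared locally: {shared_mb} MB")  # image, shared locally: 8 MB
```

The total is linear in each factor. Under the same assumptions, 30 people fetching a 5 GB file would put 5000 × 8 × 30 = 1,200,000 MB, about 1.2 TB, on those links. That is the terabyte range the talk refers to.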
06:22
That's starting to be a lot. When you look at bigger files, like longer videos, we're looking at potentially terabytes of bandwidth needed just to move these files around, from the place where they're needed to the backbone and back. And
06:38
the crazy thing is that those same files, or the same data that represents those files, might be lying around in the same local area network that you're in. A computer right next to yours could be serving you that file. Now, there are a lot of reasons
06:56
why we haven't done this historically.
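The difference between addressing content by location and addressing it by what it is can be sketched in a few lines of Python. This is a toy model, not IPFS's actual format. Real IPFS content identifiers (CIDs) add multihash and codec prefixes and split large files into blocks. Here the address is simply the hex SHA-256 digest of the bytes, and the peer names and the `fetch` helper are hypothetical.

The sketch shows two things. First, any peer that holds the bytes can serve them. Second, the requester can check the result without trusting the peer, so it is safe to ask the nearest machine first.

```python
import hashlib

def content_address(data: bytes) -> str:
    """Toy content address: the hex SHA-256 digest of the bytes.
    (Real IPFS CIDs wrap a multihash with codec/version prefixes.)"""
    return hashlib.sha256(data).hexdigest()

# A location address names one host and path, e.g.
# "http://10.20.30.40/photos/cat.jpg": identical bytes on any other
# machine don't count, and the link breaks if the file moves.

photo = b"...jpeg bytes..."
addr = content_address(photo)  # names *what* the data is, not where

# Hypothetical peers, ordered nearest-first, each holding blocks by address.
peers = [
    ("tampered-peer", {addr: b"not the photo"}),  # serves the wrong bytes
    ("laptop-next-door", {addr: photo}),
    ("far-away-server", {addr: photo}),
]

def fetch(address: str, peers: list) -> tuple:
    """Ask peers nearest-first; return the first copy whose hash matches."""
    for name, blocks in peers:
        data = blocks.get(address)
        if data is not None and content_address(data) == address:
            return name, data  # verified: exactly the requested bytes
    raise KeyError(f"no peer has a valid copy of {address}")

source, data = fetch(addr, peers)
print(source)  # laptop-next-door
```

The tampered copy is rejected because its hash does not match the address. The request is then served by the laptop in the same room rather than the distant server.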
6:56

Bandwidth

06:58
But it may start to be a good idea. That was kind of intense. So while we're talking about bandwidth, let's look at something else, and at why this is a problem. This is data from Akamai, a graph from 2007 to 2012. We could
07:14
update it, but this is kind of an older graph. In that period of five years, average bandwidth only improved by about one or two megabits per second in the G7, the group of seven large economies. That is not a very significant improvement when you compare it to the
07:31
way that our storage is increasing, right? We now have 10 terabyte hard drives; we had one terabyte a few years ago; we're going to have 100 terabytes in a few more years. The cost is halving roughly every 11 months, and we're
07:50
now looking at big numbers. That means we're saturating these pipes: the bandwidth from local area networks to the backbone is really small compared to how much we want to use it. Actually, the problem is worse, because
08:06
the speed of the average connection is improving at a slower rate than storage. People are getting larger-capacity drives faster than they are getting better bandwidth, which
08:23
gives you the impression that things are getting slower, because in a sense you want to use more of the network.
8:34

Latency

08:36
There's also another problem, which is latency. We've all known for a long time that we can't get around the fact that the speed of light is constant, so the only way to make things faster is by moving them closer to you. This is why Amazon and Google offer these cloud services that you can hire to store a
08:51
whole bunch of stuff right next to where it's needed the most. This works pretty well, except that sometimes people don't have things deployed in those locations, or even the latency out to you
09:09
through these slow pipes to that data center is too much. Maybe the file that you need is literally in another device in the same room, and instead you're piping the data through the backbone, making a
09:25
request out to the network, grabbing the file and then pulling it down, instead of just talking to the other device that you have locally. It's kind of absurd that we do this. Here's a funny anecdote: I was giving a talk recently about
09:41
IPFS, and the slides that I wanted to present were stuck in my computer. My computer couldn't talk to the projector because we didn't have the right adapter, and since nobody had a USB key, getting my slides from my computer to another person's computer
09:53
required sending the slides all the way up to the backbone and then back down. Most people would have had to do that; we actually just dropped into the terminal and connected to the specific computer
10:09
directly. But this is the kind of stuff we're dealing with: the average person doesn't know how to do any of that.
10:19

Online vs Offline

10:20
There's another big issue here, and that's the dichotomy between online and offline operation. We as engineers are sort of misusing the web, because we program around this model of saying: we have this data center, and if the user can talk to the data center then they're online, and if they can't, they're
10:36
offline. This is actually a pretty bad model. We had this perception that it was going to become more and more true, that we're going to connect everyone and everyone's going to be online all the
10:50
time. But that's actually not accurate, because we have ever more devices that we carry around, and contexts in which we're using these devices that are not close to any kind of network that can serve us. So imagine that you go on a plane and you're
11:07
traveling, and you have your laptop and your phone, and maybe you have some files on one and not the other. Most applications that you have are not going to move those around. In fact, most applications that you use through the
11:18
web are not even going to load. Most things just cease operation the moment you step outside the bounds of the network. To illustrate how this works, imagine again this
11:33
massive classroom. This classroom is great; it shows so many of these issues, right? Say that I open a Google Doc or something and send it out to everyone else. Everyone opens this Google Doc and now we're all
11:48
collaborating, but all of these updates are getting shipped out to the backbone, to some Google server, and then shipped all the way back. That's how we're collaborating. By the way, we're seeing a lot of latency there, because
12:01
every single time you type something, it has to go out there and back instead of going right next to you. You could literally be sitting next to the person and the update still has to go out there. We've kind of hidden behind the fact that humans don't
12:12
perceive that much latency: at around 300 or 400 milliseconds you're barely going to notice it. But say that something bad happens to the rest of the network and this whole room loses connectivity to the backbone. Suddenly
12:27
the entire application comes crashing down and nobody can do anything. Maybe they can keep editing things locally, but they cannot ship the updates they're making to the person right next to them. This is absurd, right? Talk to any
12:43
person who's using these applications. They think it's crazy that the data from one computer can't get to the other computer, which is right next to it. And in reality, these computers are actually talking to each other. They're
12:57
on the same network, they're probably pinging each other, or they're seeing each other's packets flowing through. We just haven't taught our applications how to talk to each other about these files. This problem is all over the web, and I'm not
13:13
picking on Google here: look at tons of applications that we use day-to-day and that run our lives; they cease operation entirely. In fact, Google is one of the best at this. They have this awesome technology called operational transforms
13:27
that allows these updates to be made concurrently and gives you this amazing impression that it's all working flawlessly. The updates all get applied in the backbone, and you can be editing the same document and it just works.
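Operational transforms are mentioned here only by name. Below is a minimal Python sketch of the core idea for the simplest case: two users concurrently insert text into the same string. It is a rough illustration, not Google's implementation. The `Insert` type and the site-id tie-break rule are assumptions of this sketch.

Each site applies its own edit first. It then receives the other site's edit, transformed against its own, and both sites end up in the same state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    pos: int    # index in the document where text is inserted
    text: str
    site: int   # id of the editing site, used only to break ties

def apply(doc: str, op: Insert) -> str:
    """Apply an insert operation to a document string."""
    return doc[:op.pos] + op.text + doc[op.pos:]

def transform(op: Insert, other: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `other` was already applied.

    If `other` inserted before op's position (or at the same position with
    a lower site id), op's position shifts right by len(other.text)."""
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op

doc = "hello world"
a = Insert(5, ",", site=1)   # user 1 types a comma after "hello"
b = Insert(11, "!", site=2)  # user 2 concurrently appends "!"

# Site 1 applies its own op, then the transformed remote op; site 2 vice versa.
at_site1 = apply(apply(doc, a), transform(b, a))
at_site2 = apply(apply(doc, b), transform(a, b))
print(at_site1, at_site2)  # hello, world! hello, world!
```

The transform needs only the two operations, not a central server. That is the speaker's point: the same updates could, in principle, travel directly between the machines in the room.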
13:42
They could be sending those updates to each other directly, through things like WebRTC, but as far as I know that doesn't happen, and it certainly doesn't happen in most applications that you see on the web. And this is not a flaw of the
13:55
protocols themselves, since we could be using WebRTC nowadays; it's actually a flaw in how we store data and how we reference data. It's how HTTP has taught us to store data on the web and reference it. And since the beginning of HTTP, the W3C
14:16
has actually come up with a whole bunch of better ways to reference data, but people haven't really adopted them on the web. Now, to me, this set of problems feels very silly, right? You have this massive set of computers somewhere, and
14:30
then you take out the mothership and everything grinds to a halt. Most people think this is fine, but in reality we run into terrible bandwidth problems. We run into these issues of being somewhere on the online-offline spectrum. Maybe
14:46
we're in a very low bandwidth setting, or we have a bunch of interference, or there's congestion, or maybe you're traveling like I mentioned, or your ISP has intermittent outages. This happens to me all the time. And maybe the data center has some problem, right? So
15:01
many issues could be occurring, and users can't be forced to halt their operation just because there's some discontinuity between them and the backbone. So our applications should learn how to operate in entirely distributed settings. We as
15:19
engineers need to start doing this, and we need to build better tools for application developers to do this. Now, this is actually really important, right? You might think that I'm just talking about problems that we want to improve on, but
15:33
they really matter. They matter so much that if you ask any Egyptian about the time the internet shut down because their government decided it was not going to allow communications, they are going to tell you very seriously that these
15:49
applications potentially saved their lives or their families' lives. So these are not only mission-critical to businesses; they're mission-critical to human lives that are potentially at risk. These communication tools need to be able to speak to
16:06
each other when disconnected from the backbone. We need to be able to have our computers talk to each other, and we need to be able to do so securely.
16:13

Security

16:15
In the last few years we've seen the great failing of our community in terms of securing everything. We've had major breaches disclosed by Snowden and so on, and
16:36
there are all sorts of problems with how we're moving data around on the web, and it's really not enough to encrypt the communications. I think it was a couple of years ago, or maybe not that long ago, Dropbox had this huge problem where they
16:53
allowed anyone to log in for something like four hours: anybody could log into your account and look at your files. Dropbox is full of great engineers, and if they are not getting it perfectly right, imagine how
17:06
many people are just getting it totally wrong. So we need to be encrypting everything, or really looking carefully at how we're doing security on the client side, and, as much as we possibly can, treating the cloud as just oblivious
17:23
storage or oblivious routing systems. This of course complicates everything, but certain kinds of applications, or certain contexts for applications, demand that kind of attention.
17:37

Permanence

17:39
Now, the last one I want to talk about is permanence. This one is deeply important to me, and it might not be as obvious to everyone why it matters, but I'll try my best to illustrate it. Think about book burning for a moment. Think about the
17:57
most important piece of media that you can think of, like the most important book that you have ever read, or all of Wikipedia, and imagine someone coming along and burning it, really destroying it, making it impossible for
18:12
anyone to read ever again. Think about all the knowledge that is lost. Historically, we've treated book burning as an insane, crazy offense to progress and to humanity itself, and we've condemned anyone who has done such a thing. And
18:32
when you look throughout history at the occurrences of book burning, they drop off significantly with the advent of the printing press. This is because the printing press allowed you to make many
18:49
copies of the same book very cheaply. Say that you print a whole bunch of copies, and book burners show up and try to burn a lot of books. They're probably only going to be able to get to a few of them. Or even if they get most, there are probably
19:03
copies that are going to survive, and you'll be able to make more copies. So this is great. Now, bringing this back to the digital world, we have some problems today, because if you've been around the web, you've
19:16
probably seen a 404, right? A 404 is an error that tells you that some resource is not found. What that means is that there was a link pointing to another object, and that object is no longer there, either because someone took it down, burning the book, or because someone
19:30
moved it. We tend to chastise a lot of agencies and so on for taking certain content down or censoring it. But we forget that there are tiny little book burnings happening constantly. Whenever
19:50
any web developer moves some content from one location to another, any link that anyone had added to that location is now broken. The content is potentially findable through search, but that link is no longer going to
20:07
work, because we have a system where the publisher of content has to host it, or make sure that it's hosted somewhere, and the consumer of the content must depend on that producer keeping that content up, or
20:27
they have to copy it, move it somewhere else, and give it a new address. So this is again strongly related to the location addressing problem; it's sort of what's behind it. This is why links break, because people end up
20:40
being careless and moving things around. Thinking of the web of documents in the abstract ignores the fact that the web is not just a web of documents. It's a web of documents on machines, and the notion or
20:57
description of a document identifies the set of machines that are responsible for serving it. This prevents people from replicating the content or hosting it somewhere else. This is kind of like a book burner's paradise. And
21:13
I'm describing the accidental book burning that happens. Now, fortunately, a group of very smart people saw that this was a pretty big problem early on in the web's history and started trying to archive everything. Of course, I'm
21:30
talking about the Internet Archive, and it looks like this. It's beautiful. If you haven't been there, you probably should go. They are running this project to try to index the entire web and to find and store every single page they possibly can,
21:45
because they know that stuff will be needed in the future. Ironically, I was trying to find the source code for one of the protocols that we learned a lot from and used to develop IPFS, and funnily enough, the source code is only available
22:07
through the Internet Archive, because the people who were hosting the code are no longer hosting it, or maybe there was some glitch in the server or whatever. We could only find it through the Wayback Machine, which is this great service that they
22:19
run. This is actually very closely related to another important problem, described by Vint Cerf, one of the creators of the internet, and that's this notion of digital vellum. Think of old computers: as technology evolves and gets better with
22:38
time, we stop using the old things, so much so that we stop being able to maintain them. Sometimes they break, and when they break, nobody knows how to fix them or nobody has the parts to fix them. These old machines that no longer work may have been
22:55
readers of important media: things that we stored in some physical material that this computer or machine was going to read out to us. Now the machine's broken and nobody knows how to fix it. That means that all of that content is completely lost
23:11
unless we find a way to fix it. The solution to this is really to learn to emulate everything, right? We as a society should be very careful with the types of media that we store and how we store them, to make sure that even if we stop using
23:26
something, or some major problem happens, we can emulate all of these old computers or machines to read the encodings and read out this media. So we need to be able to simulate or emulate every
23:45
single computer that we've ever built. And this is again related to the book burning problem, because say that we do this, say that we create all these emulators: where do we host them? That location might be taken down,
24:01
or it might lose funding or whatever. We need to be able to replicate these systems and store them in as many places as we possibly can, and we need to build tools and make the internet capable of
24:16
doing this very easily. So these are the problems that have driven us to think through how the web works, think through how content moves around the internet, and come up with a solution. And this isn't designed in the abstract.
24:35
We thought for a long time, and we thought through many different kinds of attempts and solutions to this problem, to try to synthesize a bunch of good ideas that work well together and provide some new software, both a protocol and a toolset,
24:52
that people can use to make the web better.
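The printing-press argument from the permanence section is that copies in many places survive where a single copy does not. The closing call to replicate systems "in as many places as we possibly can" rests on the same idea, and both can be put in numbers.

The following Python sketch makes one simplifying assumption, which is ours and not the talk's: each copy is lost independently with the same probability p. Under that assumption, the chance that at least one of n copies survives is 1 - p^n. The numbers are illustrative, not measurements.

```python
def survival_probability(p_loss: float, copies: int) -> float:
    """Chance that at least one of `copies` independent replicas survives,
    if each one is lost (taken down, burned, defunded) with probability p_loss."""
    if not 0.0 <= p_loss <= 1.0 or copies < 0:
        raise ValueError("p_loss must be in [0, 1] and copies >= 0")
    return 1.0 - p_loss ** copies

# Even if every single host has a 90% chance of disappearing,
# enough independent copies make total loss unlikely.
for n in (1, 3, 10, 30):
    print(n, round(survival_probability(0.9, n), 3))
# 1 0.1
# 3 0.271
# 10 0.651
# 30 0.958
```

Real-world losses are rarely independent. A single takedown order or one funding decision can remove many copies at once, so this is an optimistic upper bound. Even so, it shows why the number of independent hosts matters more than the reliability of any one of them.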