IPFS Alpha | Why We Must Distribute The Web
People & Blogs
25m 6s
41m 39s
00:05
The internet today has been described as this big nervous system for humanity, right? So much of what we do today is carried through these digital pipes. So much of our economy runs entirely on the internet. And we interface with the internet, and with each other, through a series of applications, and these run mostly on the web.
00:29
The web is sort of like the transport protocol for how these applications move around. It's a way of being able to deploy different media, to link people from one set of things to another, to put people in applications, and to give people functionality that they didn't previously have. The page just loads, you retrieve some new functionality you didn't have, and now you can access new things. So this was a huge breakthrough, of course, but it has some problems today.
01:06
Now, the crazy thing about the internet and the web is that they're just a collection of protocols. It's a bunch of really good ideas about how to do things, some that have worked and some that haven't, which get standardized and put into browsers, into all our tools, and into computers. Then this orchestra of programs runs and gives us this amazing range of capabilities, right? Everything that we as humans can do today, like learning remotely, thinking together, or talking to a person around the world, happens thanks to these great protocols, written by people who really were just aiming to augment their own experiences and, through that, ended up augmenting what the rest of humanity can do. This is kind of mind-blowing, right?
01:53
Because it means that if you find a problem with the internet, or you have some new ideas about what humanity should be able to do, you can just write a protocol and implement it. If you're right and it works, you tell the world, it gets deployed, a lot of people use it, and the world becomes a better place. So it's a great place to be.
02:13
Now, we have found some problems, which we're now telling you about, and they mostly have to do with how we move applications through the web. And it's not just applications but any kind of media: documents, images, video, and so on. IPFS is mostly related to HTTP. It sort of enhances HTTP and can be used alongside it, or maybe as a shim, and in some cases you could actually transition over to using IPFS instead.
2:49
Location Addressing
02:50
The big problem behind this change, or what we're trying to solve here, is the problem of location addressing. If you've looked at a URL, the first part of the URL is the domain, which here is example.com.
03:09
The domain is resolved to an IP address, and an IP address is the set of numbers that you need to dial to get connected to another computer across the network. So what does that picture look like? Say that you're here, highlighted in blue, and you're trying to
03:28
access this picture through the internet. You have an address, and this address has these numbers and then a path for the actual file. That means that you're going to find a specific other computer at that address and fetch the image from
03:45
that computer. Now suppose that all of the other computers pictured here have the exact same file locally, including one that's very close to you in the network, perhaps in the same room. It doesn't matter. As far as HTTP is concerned, that is not the
03:58
same file, and you have to go across the world to 10.20.30.40 to find it. (Actually, that probably won't resolve, because addresses starting with 10 are private, local addresses. But anyway.) The issue here is that we are addressing content by location. We're telling you where to find something
04:21
instead of what it is. A lot of people have talked about this as a problem and come up with solutions, but it hasn't quite sunk in.
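The difference between naming content by where it lives and naming it by what it is can be sketched as a toy model in Python. This is an illustration under stated assumptions, not IPFS's actual protocol: content is named here by a plain hex SHA-256 digest (real IPFS content identifiers wrap the hash in a self-describing multihash format), and the `Peer` class, the `hops` distance, and the host names are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


def content_address(data: bytes) -> str:
    """Name a blob by what it is: here, the hex SHA-256 digest of its bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Peer:
    name: str
    hops: int  # hypothetical network distance from us
    by_path: dict = field(default_factory=dict)  # location-addressed: path -> bytes
    by_hash: dict = field(default_factory=dict)  # content-addressed: digest -> bytes

    def add(self, path: str, data: bytes) -> None:
        self.by_path[path] = data
        self.by_hash[content_address(data)] = data


def fetch_by_location(peers, host: str, path: str) -> tuple[str, bytes]:
    """HTTP-style: only the named host can answer, however far away it is."""
    for p in peers:
        if p.name == host:
            return p.name, p.by_path[path]
    raise LookupError(f"{host} unreachable")


def fetch_by_content(peers, digest: str) -> tuple[str, bytes]:
    """Content-addressed: any peer holding these exact bytes can answer; pick the nearest."""
    holders = [p for p in peers if digest in p.by_hash]
    if not holders:
        raise LookupError(f"nobody has {digest}")
    nearest = min(holders, key=lambda p: p.hops)
    return nearest.name, nearest.by_hash[digest]


image = b"pretend these are the bytes of cat.jpg"
origin = Peer("10.20.30.40", hops=8)
neighbor = Peer("laptop-in-same-room", hops=1)
origin.add("/images/cat.jpg", image)
neighbor.add("/downloads/cat.jpg", image)  # same bytes, different path
peers = [origin, neighbor]

print(fetch_by_location(peers, "10.20.30.40", "/images/cat.jpg")[0])  # 10.20.30.40
print(fetch_by_content(peers, content_address(image))[0])  # laptop-in-same-room
```

Both copies of the image share one content address, so the content-addressed lookup is free to use the copy in the same room, while the location-addressed lookup can only ever go to 10.20.30.40.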
To drive the point home, and to show you why this is actually a huge problem today, picture
04:38
a big room filled with people with a bunch of computers. Nowadays we carry a laptop and maybe a mobile phone, soon a watch; we'll have a bunch of devices. Say that I upload a picture to Facebook and I
04:54
give everyone the link, and now a whole bunch of people are going to go to Facebook and fetch that image from Facebook all the way back. Not that big of a deal, right? Well, let's look at how it looks in the backbone. I make a request to
05:10
Facebook and I send the image up. Say it's just a 1 megabyte image, times 8, because say that there are eight links; that gives the total amount of bandwidth. These are really rough calculations, just to give you a picture. Say that there's
05:23
8 megabytes of bandwidth used by my picture going all the way up to Facebook. Now 30 people show up; say they're all in the same room, and they all talk to Facebook and all pull down the image. Now that's 240 megabytes of bandwidth wasted. And you
05:40
might say, well, that's unavoidable because we have to encrypt everything. And that's true: we have encrypted everything. But we have to ship the image 30 times across the wire, and it doesn't matter
05:53
if it's wasted; you just have to send it. Maybe that's true, maybe it isn't. And you know what? Maybe 240 megabytes is not a big deal, right? But what if we're looking at video instead? Imagine that you go to YouTube and start watching
06:06
a video in high definition, say about 200 megabytes. You send it to everybody else, and these 30 people start downloading a 200 megabyte video across these eight links. Now we're talking about 48 GB of bandwidth.
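The back-of-the-envelope numbers above come from a single multiplication: file size, times the number of links it crosses, times the number of downloads. A minimal sketch reproduces them. The eight links and 30 people are the talk's own rough figures; the last calculation is a hypothetical comparison (not from the talk) in which the video crosses the backbone once and everyone else gets it from a neighbor on the local network.

```python
def backbone_traffic_mb(size_mb: float, links: int, downloads: int) -> float:
    """The talk's rough model: every download crosses `links` hops, costing size * links."""
    return size_mb * links * downloads


LINKS = 8    # links between the room and the server, as in the talk
PEOPLE = 30  # people in the room

image_once = backbone_traffic_mb(1, LINKS, 1)        # 8 MB
image_all = backbone_traffic_mb(1, LINKS, PEOPLE)    # 240 MB
video_all = backbone_traffic_mb(200, LINKS, PEOPLE)  # 48,000 MB = 48 GB

# Hypothetical alternative (not from the talk): the video crosses the backbone once,
# and the other viewers fetch it from a nearby machine in the same room.
video_shared = backbone_traffic_mb(200, LINKS, 1)    # 1,600 MB = 1.6 GB

print(image_once, image_all, video_all / 1000, video_shared / 1000)
```

Under this model the backbone cost grows linearly with the number of people in the room, which is exactly the term that local sharing removes.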
06:22
That's starting to be a lot. When you look at bigger files, or longer videos and so on, we're looking at potentially terabytes of bandwidth needed just to move these files around, from the place they're needed to the backbone and back. And
06:38
the crazy thing is that those same files, or the same data that represents those files, might be lying around in the same local area network that you're in. A computer right next to yours could be serving you that file. Now, there are a lot of reasons
06:56
why we haven't done this historically.
6:56
Bandwidth
06:58
But it may start to be a good idea. That was kind of intense. So while we're talking about bandwidth, let's look at something else, and at why this is a problem. This is data from Akamai, a graph from 2007 to 2012. We could
07:14
update it, but this is an older graph. In that period of five years, bandwidth only improved about one or two megabits per second in the G7, the group of seven major economies. That is not a very significant improvement when you compare it to
07:31
the way that our storage is increasing, right? We now have 10 terabyte hard drives; we had one terabyte a few years ago, and we're going to have 100 terabytes in a few more. The cost is halving every 11 months, and we're
07:50
now looking at big numbers. That means we're saturating these pipes, and the bandwidth from local area networks to the backbone is really small compared to how much we want to use them.
Actually, the problem is worse, because
08:06
the speed of the average connection is improving at a slower rate than storage. That means people are getting larger-capacity drives faster than they are getting better bandwidth, which
08:23
gives you the impression that things are getting slower, because in a sense you want to use more of the network. There's also another problem, which is latency.
8:34
Latency
08:36
We've all known for a long time that we can't get around the fact that the speed of light is constant. So the only way to make things faster is to move them closer to you. This is why Amazon and Google offer these cloud services that you can hire to store a
08:51
whole bunch of stuff right next to where it's needed the most. And this works pretty well, except that sometimes people don't have things deployed in those locations, or even the latency out to
09:09
that data center, through these slow pipes, is too much. Maybe the file that you need is literally on another device in the same room, and instead you're piping the data through the backbone, making a
09:25
request out to the network, grabbing the file, and pulling it down, instead of just talking to the other device that you have locally. It's kind of absurd that we do this.
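The speed-of-light limit can be made concrete with a quick lower-bound calculation. This sketch assumes signals in optical fiber travel at roughly 200,000 km per second (about two-thirds of the speed of light in a vacuum) and ignores routing, queuing, and processing delays, which only add to the totals; the three distances are hypothetical examples.

```python
# Lower bound on round-trip time imposed by the speed of light alone.
# Assumption: light in optical fiber covers roughly 200,000 km per second.
FIBER_KM_PER_SECOND = 200_000


def min_round_trip_ms(distance_km: float) -> float:
    """Best-case time for a request to reach a machine `distance_km` away and for the reply to return."""
    return 2 * distance_km / FIBER_KM_PER_SECOND * 1000


# Hypothetical distances, chosen only for illustration.
for label, km in [
    ("laptop in the same room", 0.01),
    ("regional data center", 1_000),
    ("server on another continent", 10_000),
]:
    print(f"{label}: at least {min_round_trip_ms(km):.4f} ms")
```

A server 1,000 km away can never answer in less than about 10 ms and one 10,000 km away in less than about 100 ms, whereas a device in the same room is limited by its local network rather than by distance, which is why moving data closer is the only lever left.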
Here's kind of a funny anecdote. I was giving a talk recently about
09:41
IPFS, and the slides that I wanted to present were stuck on my computer. My computer couldn't talk to the projector because we didn't have the right adapter, and since nobody had a USB key, getting my slides from my computer to another person's computer
09:53
required sending the slides all the way up to the backbone and then back down. We actually just dropped into the terminal and connected to the specific computer
10:09
directly, but most people would have had to go through the backbone, and this is the kind of stuff we're dealing with: the average person doesn't know how to do any of that. There's another big issue here, and that's the dichotomy between online and offline operation.
10:19
Online vs Offline
10:20
We as engineers are sort of misusing the web, because we program behind this model of saying: we have this data center, and if the user can talk to the data center, then they're online, and if they can't, they're
10:36
offline. This is actually a pretty bad model, and increasingly so. We had this perception that it was going to be more and more true, as in, we're going to connect everyone and everyone's going to be online all the
10:50
time. But that's actually not accurate, because we have ever more devices that we carry around, and contexts in which we're using these devices that are not close to any kind of network. Imagine that you go on a plane and you're
11:07
traveling, and you have your laptop and your phone, and maybe you have some files on one and not on the other. Most applications that you have are not going to move those around. In fact, most applications that you use through the
11:18
web are not even going to load. Most things just cease operation the moment that you step outside the bounds of the network.
To illustrate how this works, imagine again this
11:33
massive classroom. This classroom is great; it shows so many of these issues, right? Say that I open a Google Doc or something and I send it out to everyone else. Everyone opens this Google Doc, and now we're all
11:48
collaborating, but all of these updates are getting shipped out to the backbone, to some Google server, and then shipped all the way back. That's how we're collaborating. By the way, we're seeing a lot of latency there, because
12:01
every single time you type something, it has to go out there and then back, instead of going to the person right next to you. You could literally be sitting next to the person, and your edits still have to go out there. We've kind of hidden behind the fact that humans don't
12:12
perceive that much latency: if you get around 300 to 400 milliseconds, you're barely going to notice it. But say that something bad happens to the rest of the network, and this whole room loses connectivity to the backbone. Suddenly
12:27
the entire application comes crashing down and nobody can do anything. Maybe they can keep editing things locally, but they cannot ship the updates that they're making to the person right next to them. And this is absurd, right? Talk to any
12:43
person who's using these applications. They think it's crazy that the data from one computer can't get to the other computer, which is right next to it. In reality, these computers are actually talking to each other. They're
12:57
on the same network. They're probably pinging each other, or seeing each other's packets flowing through. We just haven't taught our applications how to have them talk about these files. And this problem is all over the web. I'm not
13:13
picking on Google here; look at the tons of applications that we use day-to-day, that run our lives, and they cease operation entirely. In fact, Google's actually one of the best at this. They have this awesome technology called operational transformation
13:27
that allows these updates to be made concurrently and gives you this amazing impression that it's all working flawlessly. The updates all get applied in the backbone, and you can be editing the same document and it just works.
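The core idea behind operational transformation is that each replica applies its own edit immediately and then rewrites (transforms) a concurrent remote edit so that both replicas converge. The sketch below is a deliberately tiny version handling only concurrent single-character inserts; it is not Google Docs' actual algorithm, and the `Insert`, `transform`, and `sync` names are hypothetical. Because each side can do the transformation locally, two laptops in the same room could in principle exchange these operations directly rather than through a server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # index in the document where `char` is inserted
    char: str
    site: int  # id of the replica that made the edit; breaks ties at equal positions


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.char + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after the concurrent edit `against`.

    If `against` inserted before our position (or at the same position, from a
    lower site id), everything from that point shifted right by one, so we shift too.
    """
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        return Insert(op.pos + 1, op.char, op.site)
    return op


def sync(doc: str, local: Insert, remote: Insert) -> str:
    """One replica's result: its own edit first, then the transformed remote edit."""
    return apply(apply(doc, local), transform(remote, local))


# Two people edit "abc" at the same moment, with no server in between.
base = "abc"
alice = Insert(1, "X", site=1)
bob = Insert(1, "Y", site=2)

alice_view = sync(base, alice, bob)  # Alice applies her edit, then Bob's
bob_view = sync(base, bob, alice)    # Bob applies his edit, then Alice's
print(alice_view, bob_view)          # prints: aXYbc aXYbc
```

Real systems must also handle deletions, longer histories of concurrent edits, and more than two sites, which is where most of the difficulty lies.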
13:42
They could be sending those updates to each other through things like WebRTC, but as far as I know that doesn't happen, and it certainly doesn't happen in most applications that you see on the web. And this is not a flaw of the
13:55
protocols themselves, in that we could be using WebRTC nowadays. It's actually a flaw in how we store data and how we reference data: it's how HTTP has taught us to store data on the web and reference it. Since the beginning of HTTP, the W3C
14:16
has actually come up with a whole bunch of new ways to reference data that are better, but people haven't really adopted them on the web. Now, to me, this set of problems feels very silly, right? You have this massive set of computers somewhere, and
14:30
then you take out the mothership and everything grinds to a halt. Most people think this is fine, but in reality we run into terrible bandwidth problems. We run into these issues of being somewhere on the online-offline spectrum. Maybe
14:46
we're in a very low-bandwidth setting, or we have a bunch of interference, or there's congestion, or maybe you're traveling like I mentioned, or your ISP has intermittent outages. This happens to me all the time. And maybe the data center has some problem, right? So
15:01
many issues could be occurring, and users can't be forced to halt their operation just because there's some discontinuity between them and the backbone. So our applications should learn how to operate in entirely distributed settings. We as
15:19
engineers need to start doing this, and we need to build better tools for application developers to do this. Now, this is actually really important. You might think that I'm just talking about problems that we want to improve, but
15:33
they really, really matter. They matter so much that if you ask any Egyptian about the time the internet was shut down because their government decided it was not going to allow communications, they are going to tell you very seriously that these
15:49
applications potentially saved their lives or their families' lives. So these are not only mission-critical to businesses; they're mission-critical to human lives potentially at risk. These communication tools need to be able to talk to
16:06
each other when disconnected from the backbone. We need to be able to have our computers talk to each other, and we need to be able to do so securely.
16:13
Security
16:15
In the last few years we've seen the great failing of our community in terms of securing everything. We've had major breaches described by Snowden and so on, and
16:36
there are all sorts of problems with how we're moving data around on the web. It's really not enough to encrypt the communications. I think it was a couple of years ago, or maybe not that long, Dropbox had this huge data problem where they
16:53
allowed anyone to log in for something like four hours: anybody could log into your account and look at your files. Dropbox is full of great engineers, so if they are not getting it perfectly right, imagine how
17:06
many people are just getting it totally wrong. So we need to be encrypting everything, or really looking carefully at how we're doing security on the client side, and, as much as we possibly can, treating the cloud as just
17:23
oblivious storage or oblivious routing systems. Of course this complicates everything, but certain kinds of applications, or certain contexts for applications, demand that kind of attention.
17:37
Permanence
17:39
The last one I want to talk about is permanence. This one is really, really deeply important to me, and it might not be as obvious to everyone why it matters.
I'll try my best to illustrate it. Think about book burning for a moment. Think about the
17:57
most important piece of media that you can think of, like the most important book that you have ever read, or think about all of Wikipedia, and imagine someone coming along and burning it, really destroying it, making it impossible for
18:12
anyone to read ever again. Think about all the knowledge that is lost. Historically, we've treated book burning as an insane offense to progress and to humanity itself, and we've condemned anyone who has gone and done such a thing. And
18:32
when you look throughout history at the occurrences of book burning, they drop off significantly with the advent of the printing press. This is because the printing press allowed you to make many
18:49
copies of the same book very cheaply. Say that you print a whole bunch of copies, and book burners show up and try to burn a lot of books. They're probably only going to be able to get to a few of them. Or even if they get most, there are probably
19:03
copies that are going to survive, and you'll be able to make more copies. So this is great. Now, bringing this back to the digital world, we have some problems today, because if you've been around the web, you've
19:16
probably seen a 404, right? A 404 is an error that tells you that some resource is not found. What that means is that there was a link pointing to another object, and that object is either no longer there because someone took it down (they burned the book), or someone
19:30
moved it. We tend to chastise a lot of agencies and so on for taking certain content down, or censoring, and so on. But we forget that there are tiny little book burnings happening constantly. Whenever
19:50
any web developer moves some content from one location to another, any link that anyone had added to that location is now broken. The content is potentially findable through search, but that link is no longer going to
20:07
work, because we have a system where the publisher of content has to host it, or make sure that it's hosted somewhere, and the consumer of the content must depend on that producer keeping that content up, or
20:27
they have to copy it, move it somewhere else, and give it a new address. So again, this is strongly related to the location addressing problem; it's sort of behind this. This is why links break: people end up
20:40
being careless and moving things around. Thinking of the web of documents in the abstract ignores the fact that the web is not just a web of documents. It's a web of documents on machines, and the
20:57
description of a document identifies the set of machines that are responsible for serving it up. This prevents people from replicating or hosting the same content somewhere else. This is kind of like a book burner's paradise.
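If a link names the content itself rather than the machine serving it, any copy anywhere can satisfy the link, and the reader can check that the copy is genuine. The sketch below illustrates that idea under stated assumptions: the `cid` and `resolve` helpers are hypothetical, the identifier is a plain hex SHA-256 digest rather than a real IPFS CID (which wraps the hash in a multihash format), and each "mirror" is just a dictionary from identifier to bytes.

```python
import hashlib


def cid(data: bytes) -> str:
    """Hypothetical content identifier: the hex SHA-256 digest of the bytes."""
    return hashlib.sha256(data).hexdigest()


def resolve(link: str, mirrors: list[dict[str, bytes]]) -> bytes:
    """Return the first copy, from any mirror, whose bytes actually hash to `link`.

    Because the link names the content rather than a host, it keeps working after
    the original publisher moves or deletes its copy, and a mirror that returns
    different bytes is simply skipped.
    """
    for store in mirrors:
        data = store.get(link)
        if data is not None and cid(data) == link:
            return data
    raise LookupError("404: no mirror holds content matching this link")


page = b"<html>an important essay</html>"
link = cid(page)

publisher = {link: page}
archive = {link: page}                   # a mirror that copied the page earlier
tampered = {link: b"<html>altered</html>"}

publisher.clear()                        # the publisher reorganizes its site; its copy is gone
print(resolve(link, [tampered, publisher, archive]) == page)  # True
```

The link only breaks when every copy is gone, which is the printing-press property applied to the web: the more places hold the bytes, the harder the content is to burn.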
21:13
I'm describing the accidental book burning that happens. Now, fortunately, a group of very smart people saw that this was a pretty big problem early on in the web's history and started trying to archive everything. Of course, I'm
21:30
talking about the Internet Archive, and it looks like this. It's beautiful. If you haven't been there, you probably should go. They are running this project to try to index the entire web and store every single page that they can possibly find, and store
21:45
it, because they know that stuff will be needed in the future. Ironically, I was trying to find the source code for one of the protocols that we learned a lot from and used to develop IPFS. Funnily enough, the source code is only available
22:07
through the Internet Archive, because the people who were hosting the code are no longer hosting it, or maybe there was some glitch in the server or whatever. We could only find it through the Wayback Machine, which is this great service that they
22:19
run. This is actually very closely related to another important problem, described by Vint Cerf, who's one of the creators of the internet, and that's this notion of digital vellum. Think of old computers: as technology evolves and gets better with
22:38
time, we stop using the old things, so much so that we stop being able to maintain them. Sometimes they break, and when they break, nobody knows how to fix them or nobody has the parts to fix them. These old machines that no longer work may have been
22:55
readers of important media: things that we stored in some physical material that this computer or machine was going to read out to us. Now the machine's broken and nobody knows how to fix it. That means all of that content is lost, completely lost,
23:11
unless we find a way to fix it. The solution to this is really to learn to emulate everything. As a society, we should be very careful with the types of media that we store and how we store them, to make sure that even if we stop using
23:26
something, or some major problem happens, we can emulate all of these old computers or machines and be able to read the encodings and read out this media. We need to be able to simulate or emulate every
23:45
single computer that we've ever built.
And this is again related to the book burning problem, because say that we do this, say that we create all these emulators: where do we host them? That location might be taken down.
24:01
It might be taken down, or it might lose funding, or whatever. We need to be able to replicate these systems and store them in as many places as we possibly can, and we need to build tools and make the internet capable of
24:16
doing this very easily. So these are the problems that have driven us to think through how the web works, think through how content moves around the internet, and come up with a solution. And this isn't designed in the abstract.
24:35
We thought for a long time, and we thought through many different kinds of attempts and solutions to this problem, to try to synthesize a bunch of good ideas that work well together and provide some new software, both a protocol and a tool
24:52
set, that people can use to make the web better.