TEDxCaFoscariU

Frederic Kaplan: How to build an information time machine

June 19, 2013

Imagine if you could surf Facebook ... from the Middle Ages. Well, it may not be as far off as it sounds. In a fun and interesting talk, researcher and engineer Frederic Kaplan shows off the Venice Time Machine, a project to digitize 80 kilometers of books to create a historical and geographical simulation of Venice across 1000 years. (Filmed at TEDxCaFoscariU.)

Frederic Kaplan - Digital humanities researcher
Frederic Kaplan seeks to digitize vast archives of historical information to make maps that move -- through time.

This is an image of the planet Earth.
00:12
It looks very much like the Apollo pictures
00:14
that are very well known.
00:18
There is something different:
00:19
you can click on it,
00:21
and if you click on it,
00:23
you can zoom in on almost any place on the Earth.
00:24
For instance, this is a bird's-eye view
00:27
of the EPFL campus.
00:29
In many cases, you can also see
00:32
how a building looks from a nearby street.
00:34
This is pretty amazing.
00:38
But there's something missing in this wonderful tour:
00:39
It's time.
00:42
I'm not really sure when this picture was taken.
00:45
I'm not even sure it was taken
00:48
at the same moment as the bird's-eye view.
00:49
In my lab, we develop tools
00:55
to travel not only in space
00:57
but also through time.
00:59
The kind of question we're asking is:
01:02
Is it possible to build something
01:04
like Google Maps of the past?
01:05
Can I add a slider on top of Google Maps
01:07
and just change the year,
01:10
seeing how it was 100 years before,
01:12
1,000 years before?
01:14
Is that possible?
01:16
Can I reconstruct social networks of the past?
01:18
Can I make a Facebook of the Middle Ages?
01:20
So, can I build time machines?
01:23
Maybe we can just say, "No, it's not possible."
01:27
Or, maybe, we can think of it from an information point of view.
01:29
This is what I call the information mushroom.
01:33
Vertically, you have time,
01:36
and horizontally, the amount of digital information available.
01:38
Obviously, in the last 10 years, we have a lot of information.
01:41
And obviously, the further back we go into the past, the less information we have.
01:44
If we want to build something like Google Maps of the past,
01:48
or Facebook of the past,
01:50
we need to enlarge this space,
01:52
we need to turn it into something like a rectangle.
01:53
How do we do that?
01:55
One way is digitization.
01:57
There's a lot of material available:
01:59
newspapers, printed books, thousands of printed books.
02:01
I can digitize all these.
02:07
I can extract information from these.
02:09
Of course, the further you go into the past,
the less information you will have.
02:11
So, it might not be enough.
02:15
So, I can do what historians do.
02:17
I can extrapolate.
02:20
This is what we call, in computer science, simulation.
02:21
If I take a log book,
02:26
I can consider that it's not just the log book
02:28
of a Venetian captain on a particular journey.
02:30
I can consider that it is actually a log book
02:33
which is representative of
many journeys of that period.
02:35
I'm extrapolating.
02:37
If I have a painting of a facade,
02:39
I can consider that it's not just that particular building,
02:42
but that it probably shares the same grammar
02:44
as buildings for which we have lost all information.
02:48
So if we want to construct a time machine,
02:52
we need two things.
02:55
We need very large archives,
02:56
and we need excellent specialists.
02:59
The Venice Time Machine,
03:01
the project I'm going to talk to you about,
03:03
is a joint project between the EPFL
03:05
and Ca' Foscari University of Venice.
03:08
There's something very peculiar about Venice:
03:11
its administration has been
03:13
very, very bureaucratic.
03:16
They've been keeping track of everything,
03:18
almost like Google today.
03:20
At the Archivio di Stato,
03:23
you have 80 kilometers of archives
03:25
documenting every aspect
03:27
of the life of Venice over
more than 1,000 years.
03:29
You have every boat that goes out,
03:31
every boat that comes in.
03:33
You have every change that was made in the city.
03:34
This is all there.
03:37
We are setting up a 10-year digitization program
03:40
which has the objective of transforming
03:44
this immense archive
03:45
into a giant information system.
03:47
The target we want to reach
03:49
is to digitize 450 books a day.
03:51
Of course, digitizing alone is not enough,
03:56
because most of these documents
03:58
are in Latin, in Tuscan,
03:59
in Venetian dialect,
04:02
so you need to transcribe them,
04:04
to translate them in some cases,
04:05
to index them,
04:07
and this is obviously not easy.
04:08
In particular, traditional optical
character recognition methods
04:10
that can be used for printed documents
04:14
do not work well on handwritten documents.
04:15
So the solution is actually to take inspiration
04:19
from another domain: speech recognition.
04:22
This is a domain where something
that seems impossible
04:24
can actually be done,
04:27
simply by adding constraints.
04:29
If you have a very good model
04:31
of the language that is used,
04:33
if you have a very good model of the documents,
04:34
of how they are structured.
04:36
And these are administrative documents.
04:38
They are well structured in many cases.
04:39
If you divide this huge archive into smaller subsets
04:41
where each subset
actually shares similar features,
04:45
then there's a chance of success.
04:48
If we reach that stage, then there's something else:
04:54
we can extract events from these documents.
04:57
Probably 10 billion events
05:00
can be extracted from this archive.
05:02
And this giant information system
05:04
can be searched in many ways.
05:06
You can ask questions like,
05:08
"Who lived in this palazzo in 1323?"
05:09
"How much did a sea bream cost at the Rialto market
05:12
in 1434?"
05:14
"What was the salary
05:16
of a glass maker in Murano
05:17
maybe over a decade?"
05:19
You can ask even bigger questions
05:21
because the information will be semantically encoded.
05:22
And then what you can do is put that in space,
05:25
because much of this information is spatial.
05:27
And from that, you can do things like
05:29
reconstructing this extraordinary journey
05:31
of a city that managed to
achieve sustainable development
05:33
over a thousand years,
05:37
always maintaining
05:39
a form of equilibrium with its environment.
05:40
You can reconstruct that journey,
05:43
visualize it in many different ways.
05:45
But of course, you cannot understand
Venice if you just look at the city.
05:47
You have to put it in a larger European context.
05:50
So the idea is also to document all the things
05:53
that worked at the European level.
05:55
We can also reconstruct the journey
05:58
of the Venetian maritime empire,
06:00
how it progressively controlled the Adriatic Sea,
06:02
how it became the most powerful medieval empire
06:05
of its time,
06:09
controlling most of the sea routes
06:10
from the east to the south.
06:12
But you can even do other things,
06:17
because in these maritime routes,
06:19
there are regular patterns.
06:21
You can go one step beyond
06:23
and actually create a simulation system,
06:26
create a Mediterranean simulator
06:28
which is actually capable of reconstructing
06:31
even the information we are missing,
06:33
which would enable us to ask questions
06:35
as if we were using a route planner.
06:38
"If I am in Corfu in June 1323
06:41
and want to go to Constantinople,
06:44
where can I take a boat?"
06:47
Probably we can answer this question
06:49
with one or two or three days' precision.
06:51
"How much will it cost?"
06:55
"What are the chances of encountering pirates?"
06:57
Of course, you understand,
07:00
the central scientific challenge
of a project like this one
07:02
is qualifying, quantifying and representing
07:05
uncertainty and inconsistency
at each step of this process.
07:08
There are errors everywhere,
07:12
errors in the documents, like the wrong name of a captain,
07:14
some of the boats never actually took to sea.
07:17
There are errors in translation, interpretative biases,
07:20
and on top of that, if you add algorithmic processes,
07:25
you're going to have errors in recognition,
07:28
errors in extraction,
07:31
so you have very, very uncertain data.
07:33
So how can we detect and
correct these inconsistencies?
07:38
How can we represent that form of uncertainty?
07:42
It's difficult. One thing you can do
07:45
is document each step of the process,
07:47
not only coding the historical information
07:50
but what we call the meta-historical information,
07:52
how historical knowledge is constructed,
07:55
documenting each step.
07:57
That will not guarantee that we actually converge
07:59
toward a single story of Venice,
08:01
but we can probably reconstruct
08:03
a fully documented potential story of Venice.
08:06
Maybe there's not a single map.
08:09
Maybe there are several maps.
08:10
The system should allow for that,
08:12
because we have to deal with
a form of uncertainty
08:14
which is really new for giant databases of this type.
08:17
And how should we communicate
08:22
this new research to a large audience?
08:24
Again, Venice is extraordinary for that.
08:28
With the millions of visitors that come every year,
08:31
it's actually one of the best places
08:33
to try to invent the museum of the future.
08:35
Imagine, horizontally you see the reconstructed map
08:38
of a given year,
08:41
and vertically, you see the documents
08:42
used for the reconstruction,
08:45
paintings, for instance.
08:47
Imagine an immersive system that lets you
08:50
dive into a reconstruction
of the Venice of a given year,
08:53
an experience you could share within a group.
08:56
Conversely, imagine that you start
08:59
from a document, a Venetian manuscript,
09:01
and you show what
you can construct out of it,
09:03
how it is decoded,
09:06
how the context of that document can be recreated.
09:08
This is an image from an exhibit
09:11
which is currently being held in Geneva
09:13
with that type of system.
09:15
So to conclude, we can say that
09:17
research in the humanities is about to undergo
09:19
an evolution which is maybe similar
09:22
to what happened to the life sciences 30 years ago.
09:24
It's really a question of scale.
09:29
We see projects
09:33
that go far beyond what any single research team can do,
09:37
and this is really new for the humanities,
09:41
which are usually in the habit of working
09:43
in small groups or with just a couple of researchers.
09:47
When you visit the Archivio di Stato,
09:51
you feel this is beyond what any single team can do,
09:53
and that it should be a joint, common effort.
09:56
So what we must do for this paradigm shift
09:59
is actually foster a new generation
10:03
of "digital humanists"
10:04
who are going to be ready for this shift.
10:06
Thank you very much.
10:08
(Applause)
10:10

Why you should listen

Frederic Kaplan is the Digital Humanities Chair at École Polytechnique Fédérale de Lausanne (EPFL) and the director of EPFL's Digital Humanities Lab. Kaplan leads the lab in applying computation to humanities research. His latest project is the Venice Time Machine, a collaborative effort to digitize 80 kilometers of books spanning 1,000 years of Venetian history. The goal of the time machine is to create an information system that can be searched and mapped. Think of it as Google Maps for time.

Kaplan holds a PhD in artificial intelligence from the University Paris VI. He lives in Switzerland.

Data provided by TED.