ABOUT THE SPEAKER
Frederic Kaplan - Digital humanities researcher
Frederic Kaplan seeks to digitize vast archives of historical information to make maps that move -- through time.

Why you should listen

Frederic Kaplan holds the Digital Humanities Chair at Ecole Polytechnique Federale de Lausanne (EPFL) and directs EPFL's Digital Humanities Lab, which applies computation to humanities research. His latest project is the Venice Time Machine, a collaborative effort to archive 80 kilometers of books spanning 1,000 years of Venetian history. The goal of the time machine is to create an information system that can be searched and mapped. Think of it as a Google Maps for time.

Kaplan holds a PhD in artificial intelligence from the University Paris VI. He lives in Switzerland.

TEDxCaFoscariU

Frederic Kaplan: How to build an information time machine

Filmed:
1,238,053 views

Imagine if you could surf Facebook ... from the Middle Ages. Well, it may not be as far off as it sounds. In a fun and interesting talk, Frederic Kaplan shows off the Venice Time Machine, a project to digitize 80 kilometers of books to create a historical and geographical simulation of Venice across 1,000 years.

00:12
This is an image of the planet Earth.
0
285
2893
00:15
It looks very much like the Apollo pictures
1
3178
3093
00:18
that are very well known.
2
6271
1611
00:19
There is something different;
3
7882
2070
00:21
you can click on it,
4
9952
1447
00:23
and if you click on it,
5
11399
1198
00:24
you can zoom in on almost any place on the Earth.
6
12597
3072
00:27
For instance, this is a bird's-eye view
7
15669
1999
00:29
of the EPFL campus.
8
17668
2666
00:32
In many cases, you can also see
9
20334
2108
00:34
how a building looks from a nearby street.
10
22442
3740
00:38
This is pretty amazing.
11
26182
1422
00:39
But there's something missing in this wonderful tour:
12
27604
3427
00:43
It's time.
13
31031
2188
00:45
I'm not really sure when this picture was taken.
14
33219
3070
00:48
I'm not even sure it was taken
15
36289
1412
00:49
at the same moment as the bird's-eye view.
16
37701
6083
00:55
In my lab, we develop tools
17
43784
2209
00:57
to travel not only in space
18
45993
1764
00:59
but also through time.
19
47757
2558
01:02
The kind of question we're asking is:
20
50315
1870
01:04
Is it possible to build something
21
52185
1393
01:05
like Google Maps of the past?
22
53578
2178
01:07
Can I add a slider on top of Google Maps
23
55756
3310
01:11
and just change the year,
24
59066
1803
01:12
seeing how it was 100 years before,
25
60869
1791
01:14
1,000 years before?
26
62660
1669
01:16
Is that possible?
27
64329
2123
01:18
Can I reconstruct social networks of the past?
28
66452
2252
01:20
Can I make a Facebook of the Middle Ages?
29
68704
3049
01:23
So, can I build time machines?
30
71753
3776
01:27
Maybe we can just say, "No, it's not possible."
31
75529
2565
01:30
Or, maybe, we can think of it from an information point of view.
32
78094
3810
01:33
This is what I call the information mushroom.
33
81904
3190
01:37
Vertically, you have time,
34
85094
1583
01:38
and horizontally, the amount of digital information available.
35
86677
2740
01:41
Obviously, in the last 10 years, we have a lot of information.
36
89417
3482
01:44
And obviously the more we go in the past, the less information we have.
37
92899
3548
01:48
If we want to build something like Google Maps of the past,
38
96447
2318
01:50
or Facebook of the past,
39
98765
1494
01:52
we need to enlarge this space,
40
100259
1574
01:53
we need to make that like a rectangle.
41
101833
1938
01:55
How do we do that?
42
103771
1510
01:57
One way is digitization.
43
105281
2098
01:59
There's a lot of material available --
44
107395
1779
02:01
newspaper, printed books, thousands of printed books.
45
109190
6270
02:07
I can digitize all these.
46
115460
1768
02:09
I can extract information from these.
47
117228
2737
02:11
Of course, the more you go in the past,
the less information you will have.
48
119965
3543
02:15
So, it might not be enough.
49
123508
2646
02:18
So, I can do what historians do.
50
126154
2408
02:20
I can extrapolate.
51
128562
1524
02:22
This is what we call, in computer science, simulation.
52
130086
4470
02:26
If I take a log book,
53
134556
1751
02:28
I can consider, it's not just a log book
54
136307
2404
02:30
of a Venetian captain going on a particular journey.
55
138711
2972
02:33
I can consider it is actually a log book
56
141683
1643
02:35
which is representative of
many journeys of that period.
57
143326
2582
02:37
I'm extrapolating.
58
145908
2245
02:40
If I have a painting of a facade,
59
148153
2038
02:42
I can consider it's not just that particular building,
60
150191
2751
02:44
but probably it also shares the same grammar
61
152942
3932
02:48
of buildings for which we have lost all information.
62
156874
4041
02:52
So if we want to construct a time machine,
63
160915
2858
02:55
we need two things.
64
163773
1339
02:57
We need very large archives,
65
165112
2234
02:59
and we need excellent specialists.
66
167346
2742
03:02
The Venice Time Machine,
67
170088
1874
03:03
the project I'm going to talk to you about,
68
171962
1805
03:05
is a joint project between the EPFL
69
173767
3020
03:08
and the University of Venice Ca' Foscari.
70
176787
2978
03:11
There's something very peculiar about Venice,
71
179765
2165
03:13
that its administration has been
72
181930
2674
03:16
very, very bureaucratic.
73
184604
2194
03:18
They've been keeping track of everything,
74
186798
2193
03:20
almost like Google today.
75
188991
2915
03:23
At the Archivio di Stato,
76
191906
1514
03:25
you have 80 kilometers of archives
77
193420
1764
03:27
documenting every aspect
78
195184
2009
03:29
of the life of Venice over
more than 1,000 years.
79
197193
2246
03:31
You have every boat that goes out,
80
199439
1920
03:33
every boat that comes in.
81
201359
1076
03:34
You have every change that was made in the city.
82
202435
2797
03:37
This is all there.
83
205232
3291
03:40
We are setting up a 10-year digitization program
84
208523
3908
03:44
which has the objective of transforming
85
212431
1677
03:46
this immense archive
86
214108
1384
03:47
into a giant information system.
87
215492
2426
03:49
The type of objective we want to reach
88
217918
1857
03:51
is 450 books a day that can be digitized.
89
219775
4726
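To get a feel for the scale of that target, here is a minimal back-of-the-envelope sketch. Only the 450 books a day and the 10-year duration come from the talk; the number of scanning days per year is an assumption made purely for illustration.

```python
# Only BOOKS_PER_DAY and PROGRAM_YEARS come from the talk.
BOOKS_PER_DAY = 450
PROGRAM_YEARS = 10
WORKING_DAYS_PER_YEAR = 250  # assumption: scanning runs on working days only


def books_digitized(books_per_day: int, years: int, days_per_year: int) -> int:
    """Total number of books scanned over the whole program."""
    return books_per_day * years * days_per_year


total = books_digitized(BOOKS_PER_DAY, PROGRAM_YEARS, WORKING_DAYS_PER_YEAR)
print(f"{total:,} books over {PROGRAM_YEARS} years")
```

Under these assumptions the program would cover a little over a million books; changing the assumed number of scanning days changes the total proportionally.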
03:56
Of course, when you digitize, that's not enough,
90
224501
2247
03:58
because these documents,
91
226748
1287
04:00
most of them are in Latin, in Tuscan,
92
228035
2639
04:02
in Venetian dialect,
93
230689
1515
04:04
so you need to transcribe them,
94
232204
1675
04:05
to translate them in some cases,
95
233879
1681
04:07
to index them,
96
235560
1120
04:08
and this is obviously not easy.
97
236680
2164
04:10
In particular, traditional optical
character recognition methods
98
238844
3844
04:14
that can be used for printed manuscripts,
99
242688
1424
04:16
they do not work well on handwritten documents.
100
244112
4004
04:20
So the solution is actually to take inspiration
101
248116
2130
04:22
from another domain: speech recognition.
102
250246
2901
04:25
This is a domain of something
that seems impossible,
103
253147
2055
04:27
which can actually be done,
104
255202
2537
04:29
simply by putting additional constraints.
105
257739
2194
04:31
If you have a very good model
106
259933
1586
04:33
of a language which is used,
107
261519
1526
04:35
if you have a very good model of a document,
108
263045
2086
04:37
how well they are structured.
109
265131
1432
04:38
And these are administrative documents.
110
266563
1353
04:39
They are well structured in many cases.
111
267931
2132
04:42
If you divide this huge archive into smaller subsets
112
270063
3308
04:45
where a smaller subset
actually shares similar features,
113
273371
2877
04:48
then there's a chance of success.
114
276248
4031
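The idea of adding constraints from a language model can be sketched as follows. This is not the project's actual recognition pipeline, only a toy illustration of the general technique borrowed from speech recognition: a recognizer proposes several candidate readings with scores, and a simple bigram language model trained on already-transcribed text re-ranks them. The function names, the tiny corpus, and the scores are all hypothetical.

```python
import math
from collections import Counter


def train_bigram_model(corpus):
    """Count bigrams and their contexts over a list of tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence
        vocab.update(sentence)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab


def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a tokenized sentence."""
    contexts, bigrams, vocab = model
    tokens = ["<s>"] + sentence
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab)))
    return total


def best_transcription(candidates, model, lm_weight=1.0):
    """Pick the candidate with the best recognizer score plus weighted LM score."""
    return max(candidates, key=lambda c: c[1] + lm_weight * log_prob(c[0], model))


# Toy data: two already-transcribed lines and two candidate readings of a new one.
model = train_bigram_model([
    ["nave", "parte", "da", "venezia"],
    ["nave", "arriva", "a", "venezia"],
])
candidates = [
    (["neve", "porte", "da", "venezia"], -4.0),  # slightly better visual score
    (["nave", "parte", "da", "venezia"], -4.2),  # better fit to the language model
]
best = best_transcription(candidates, model)
print(" ".join(best[0]))
```

The recognizer alone would prefer the first reading, but the language model makes the second one win. Restricting the model to a subset of documents that share similar features plays the same role: it narrows what the system has to choose between.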
04:54
If we reach that stage, then there's something else:
115
282761
2435
04:57
we can extract events from these documents.
116
285196
3522
05:00
Actually probably 10 billion events
117
288718
2298
05:03
can be extracted from this archive.
118
291016
1931
05:04
And this giant information system
119
292947
1724
05:06
can be searched in many ways.
120
294671
1816
05:08
You can ask questions like,
121
296487
1368
05:09
"Who lived in this palazzo in 1323?"
122
297855
2760
05:12
"How much did a sea bream cost at the Rialto market
123
300615
2222
05:14
in 1434?"
124
302837
1724
05:16
"What was the salary
125
304561
1460
05:18
of a glass maker in Murano
126
306021
2045
05:20
maybe over a decade?"
127
308066
1406
05:21
You can ask even bigger questions
128
309472
1422
05:22
because it will be semantically coded.
129
310894
2738
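A question such as "Who lived in this palazzo in 1323?" becomes a simple filter once events are stored in a structured form. The sketch below assumes a hypothetical Event record and invented sample data; the talk does not describe the actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    subject: str     # who the event is about
    relation: str    # what kind of event it is
    place: str       # where it happened
    start_year: int
    end_year: int    # inclusive


def who_lived_at(events, place, year):
    """Names of everyone recorded as residing at `place` during `year`."""
    return sorted(
        e.subject
        for e in events
        if e.relation == "resided_at"
        and e.place == place
        and e.start_year <= year <= e.end_year
    )


# Invented records, for illustration only.
EVENTS = [
    Event("Family A", "resided_at", "Palazzo X", 1301, 1340),
    Event("Family B", "resided_at", "Palazzo X", 1341, 1390),
    Event("Family C", "resided_at", "Palazzo Y", 1310, 1330),
    Event("Family A", "sold_goods_at", "Palazzo X", 1323, 1323),
]

print(who_lived_at(EVENTS, "Palazzo X", 1323))
```

Because every record carries a place and a time span, the same data can be put on a map and filtered by year.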
05:25
And then what you can do is put that in space,
130
313632
2140
05:27
because much of this information is spatial.
131
315772
2173
05:29
And from that, you can do things like
132
317945
1935
05:31
reconstructing this extraordinary journey
133
319880
2113
05:33
of that city that managed to
have a sustainable development
134
321993
3356
05:37
over a thousand years,
135
325349
2126
05:39
managing to have all the time
136
327475
1620
05:41
a form of equilibrium with its environment.
137
329095
2861
05:43
You can reconstruct that journey,
138
331956
1248
05:45
visualize it in many different ways.
139
333204
2896
05:48
But of course, you cannot understand
Venice if you just look at the city.
140
336100
2699
05:50
You have to put it in a larger European context.
141
338799
2396
05:53
So the idea is also to document all the things
142
341195
2821
05:56
that worked at the European level.
143
344016
2423
05:58
We can reconstruct also the journey
144
346439
1964
06:00
of the Venetian maritime empire,
145
348403
1990
06:02
how it progressively controlled the Adriatic Sea,
146
350393
3166
06:05
how it became the most powerful medieval empire
147
353559
3746
06:09
of its time,
148
357305
1561
06:10
controlling most of the sea routes
149
358866
2172
06:13
from the east to the south.
150
361038
2933
06:17
But you can even do other things,
151
365305
2316
06:19
because in these maritime routes,
152
367621
2277
06:21
there are regular patterns.
153
369898
1975
06:23
You can go one step beyond
154
371889
2493
06:26
and actually create a simulation system,
155
374382
2120
06:28
create a Mediterranean simulator
156
376502
2815
06:31
which is capable actually of reconstructing
157
379317
2593
06:33
even the information we are missing,
158
381910
2202
06:36
which would enable us to have
questions you could ask
159
384112
2988
06:39
like if you were using a route planner.
160
387100
2988
06:42
"If I am in Corfu in June 1323
161
390088
3071
06:45
and want to go to Constantinople,
162
393159
2526
06:47
where can I take a boat?"
163
395685
2143
06:49
Probably we can answer this question
164
397828
1367
06:51
with one or two or three days' precision.
165
399195
4473
06:55
"How much will it cost?"
166
403668
1607
06:57
"What are the chances of encountering pirates?"
167
405275
3592
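The route-planner question can be illustrated with a small earliest-arrival search over a timetable of sailings. This is a sketch under stated assumptions, not the simulator itself: the sailing list, the day numbering, and the function name are invented, and a real system would also have to model cost and risk.

```python
import heapq


def earliest_arrival(sailings, origin, destination, start_day):
    """Earliest day on which `destination` can be reached, or None.

    Each sailing is (origin, destination, departure_day, duration_days).
    """
    best = {origin: start_day}
    queue = [(start_day, origin)]
    while queue:
        day, port = heapq.heappop(queue)
        if port == destination:
            return day
        if day > best.get(port, float("inf")):
            continue  # a faster way to this port was already found
        for src, dst, departure, duration in sailings:
            if src == port and departure >= day:
                arrival = departure + duration
                if arrival < best.get(dst, float("inf")):
                    best[dst] = arrival
                    heapq.heappush(queue, (arrival, dst))
    return None


# Invented timetable; days are counted from 1 June of the simulated year.
SAILINGS = [
    ("Corfu", "Modon", 3, 4),
    ("Modon", "Constantinople", 10, 12),
    ("Corfu", "Constantinople", 20, 10),
]

print(earliest_arrival(SAILINGS, "Corfu", "Constantinople", start_day=1))
```

With this toy timetable, changing ships at Modon is faster than waiting for the direct sailing. In a simulator, the timetable would itself be extrapolated from the regular patterns found in the archives.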
07:00
Of course, you understand,
168
408867
1811
07:02
the central scientific challenge
of a project like this one
169
410678
2609
07:05
is qualifying, quantifying and representing
170
413287
3729
07:09
uncertainty and inconsistency
at each step of this process.
171
417016
3330
07:12
There are errors everywhere,
172
420346
2712
07:15
errors in the document, it's
the wrong name of the captain,
173
423058
2489
07:17
some of the boats never actually took to sea.
174
425547
3213
07:20
There are errors in translation, interpretative biases,
175
428760
4857
07:25
and on top of that, if you add algorithmic processes,
176
433624
3466
07:29
you're going to have errors in recognition,
177
437090
2949
07:32
errors in extraction,
178
440039
1961
07:34
so you have very, very uncertain data.
179
442000
4481
07:38
So how can we detect and
correct these inconsistencies?
180
446481
3757
07:42
How can we represent that form of uncertainty?
181
450238
3660
07:45
It's difficult. One thing you can do
182
453898
2097
07:47
is document each step of the process,
183
455995
2226
07:50
not only coding the historical information
184
458221
2448
07:52
but what we call the meta-historical information,
185
460669
2679
07:55
how is historical knowledge constructed,
186
463348
2663
07:58
documenting each step.
187
466011
1998
08:00
That will not guarantee that we actually converge
188
468009
1645
08:01
toward a single story of Venice,
189
469654
2450
08:04
but probably we can actually reconstruct
190
472104
2138
08:06
a fully documented potential story of Venice.
191
474242
3048
08:09
Maybe there's not a single map.
192
477290
1459
08:10
Maybe there are several maps.
193
478749
2120
08:12
The system should allow for that,
194
480869
2216
08:15
because we have to deal with
a new form of uncertainty,
195
483085
2859
08:17
which is really new for this type of giant database.
196
485944
4641
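One way to picture documenting each step while allowing for several competing stories is to attach a provenance trail and a confidence value to every claim, and to keep all hypotheses instead of collapsing them into one. The Claim class, the confidence numbers, and the rule for combining them are assumptions made for this sketch only.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """A statement about the past plus the record of how it was obtained."""
    statement: str
    confidence: float  # hypothetical score between 0 and 1
    provenance: list = field(default_factory=list)

    def derive(self, step: str, reliability: float) -> "Claim":
        """Return a new claim that records one more processing step.

        Multiplying confidences assumes the steps fail independently,
        which is a simplification made only for this sketch.
        """
        return Claim(self.statement,
                     self.confidence * reliability,
                     self.provenance + [step])


def rank_hypotheses(claims):
    """Keep every competing claim, ordered from most to least confident."""
    return sorted(claims, key=lambda c: c.confidence, reverse=True)


# Invented example: two readings of the same register entry.
reading_a = Claim("captain is 'Name A'", 0.9, ["register scan"])
reading_b = Claim("captain is 'Name B'", 0.6, ["register scan"])
reading_a = reading_a.derive("handwriting recognition", 0.8)
reading_b = reading_b.derive("manual transcription", 0.95)

for claim in rank_hypotheses([reading_a, reading_b]):
    print(round(claim.confidence, 2), claim.statement, claim.provenance)
```

Both readings survive in the output, each with the full list of steps that produced it, so a later correction can target the step where the error entered.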
08:22
And how should we communicate
197
490585
2190
08:24
this new research to a large audience?
198
492790
3979
08:28
Again, Venice is extraordinary for that.
199
496769
2663
08:31
With the millions of visitors that come every year,
200
499432
2171
08:33
it's actually one of the best places
201
501603
1763
08:35
to try to invent the museum of the future.
202
503366
2988
08:38
Imagine, horizontally you see the reconstructed map
203
506354
3304
08:41
of a given year,
204
509658
1286
08:42
and vertically, you see the document
205
510944
2958
08:45
that served the reconstruction,
206
513902
1511
08:47
paintings, for instance.
207
515413
3400
08:50
Imagine an immersive system that lets you
208
518813
2580
08:53
go and dive and reconstruct
the Venice of a given year,
209
521393
3502
08:56
some experience you could share within a group.
210
524895
2715
08:59
On the contrary, imagine actually that you start
211
527610
2246
09:01
from a document, a Venetian manuscript,
212
529856
2207
09:04
and you show, actually, what
you can construct out of it,
213
532063
3049
09:07
how it is decoded,
214
535112
1772
09:08
how the context of that document can be recreated.
215
536884
2415
09:11
This is an image from an exhibit
216
539299
1885
09:13
which is currently conducted in Geneva
217
541184
2276
09:15
with that type of system.
218
543460
2354
09:17
So to conclude, we can say that
219
545814
2175
09:19
research in the humanities is about to undergo
220
547989
3079
09:23
an evolution which is maybe similar
221
551068
1802
09:24
to what happened to life sciences 30 years ago.
222
552870
4582
09:29
It's really a question of scale.
223
557452
4676
09:34
We see projects which are
224
562130
3303
09:37
much beyond what any single research team can do,
225
565433
3843
09:41
and this is really new for the humanities,
226
569276
2243
09:43
which very often have the habit of working
227
571519
3869
09:47
in small groups or only with a couple of researchers.
228
575388
4008
09:51
When you visit the Archivio di Stato,
229
579396
2118
09:53
you feel this is beyond what any single team can do,
230
581514
2822
09:56
and that should be a joint and common effort.
231
584336
3834
10:00
So what we must do for this paradigm shift
232
588170
3106
10:03
is actually foster a new generation
233
591276
1902
10:05
of "digital humanists"
234
593178
1537
10:06
that are going to be ready for this shift.
235
594715
2090
10:08
I thank you very much.
236
596805
1959
10:10
(Applause)
237
598764
4000
