ABOUT THE SPEAKER
Blaise Agüera y Arcas - Software architect
Blaise Agüera y Arcas works on machine learning at Google. Previously a Distinguished Engineer at Microsoft, he has worked on augmented reality, mapping, wearable computing and natural user interfaces.

Why you should listen

Blaise Agüera y Arcas is principal scientist at Google, where he leads a team working on machine intelligence for mobile devices. His group works extensively with deep neural nets for machine perception and distributed learning, and it also investigates so-called "connectomics" research, assessing maps of connections within the brain.

Agüera y Arcas' background is as multidimensional as the visions he helps create. In the 1990s, he authored patents on both video compression and 3D visualization techniques, and in 2001, he made an influential computational discovery that cast doubt on Gutenberg's role as the father of movable type.

He also created Seadragon (acquired by Microsoft in 2006), the visualization technology that gives Photosynth its amazingly smooth digital rendering and zoom capabilities. Photosynth itself is a vastly powerful piece of software capable of taking a wide variety of images, analyzing them for similarities, and grafting them together into an interactive three-dimensional space. This seamless patchwork of images can be viewed from multiple angles and magnifications, allowing us to look around corners or “fly” in for a (much) closer look. Simply put, it could utterly transform the way we experience digital images.

He joined Microsoft when Seadragon was acquired by Live Labs in 2006. Shortly after the acquisition of Seadragon, Agüera y Arcas directed his team in a collaboration with Microsoft Research and the University of Washington, leading to the first public previews of Photosynth several months later. His TED Talk on Seadragon and Photosynth in 2007 is rated one of TED's "most jaw-dropping." He returned to TED in 2010 to demo Bing’s augmented reality maps.

Fun fact: According to author Maria Semple, Agüera y Arcas is the inspiration for the character Elgin in her 2012 best-selling novel Where'd You Go, Bernadette?

TED2007

Blaise Agüera y Arcas: How PhotoSynth can connect the world's images

Filmed: March 2007
5,831,957 views

Blaise Aguera y Arcas leads a dazzling demo of Photosynth, software that could transform the way we look at digital images. Using still photos culled from the Web, Photosynth builds breathtaking dreamscapes and lets us navigate them.


00:25
What I'm going to show you first, as quickly as I can, is some foundational work, some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago. This is Seadragon, and it's an environment in which you can either locally or remotely interact with vast amounts of visual data. We're looking at many, many gigabytes of digital photos here and kind of seamlessly and continuously zooming in, panning through the thing, rearranging it in any way we want. And it doesn't matter how much information we're looking at, how big these collections are or how big the images are. Most of them are ordinary digital camera photos, but this one, for example, is a scan from the Library of Congress, and it's in the 300 megapixel range. It doesn't make any difference because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment. It's also very flexible architecture.
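A minimal sketch of the idea behind that claim, assuming a tiled multi-resolution image pyramid with 256-pixel tiles (an illustration, not Seadragon code): the renderer draws from whichever pyramid level roughly matches the screen, so the number of tiles it has to fetch is bounded by the viewport rather than by the size of the source image.

```python
# Illustration only (not Seadragon code); the tile size is an assumption.
import math

TILE = 256  # hypothetical tile edge length, in pixels

def pyramid_levels(width, height, tile=TILE):
    """Levels needed so the coarsest level of the pyramid fits in one tile."""
    levels, longest = 1, max(width, height)
    while longest > tile:
        longest = math.ceil(longest / 2)
        levels += 1
    return levels

def tiles_for_viewport(view_w, view_h, tile=TILE):
    """Tiles fetched to fill a view_w x view_h window at any zoom.
    The renderer reads from the pyramid level that roughly matches the
    screen, so this count ignores the full image's pixel dimensions."""
    return (math.ceil(view_w / tile) + 1) * (math.ceil(view_h / tile) + 1)

# A ~300-megapixel scan has a deeper pyramid than a 2-megapixel snapshot,
# but both need the same handful of tiles to fill a 1080p window.
print(pyramid_levels(20000, 15000))    # 8 levels
print(pyramid_levels(1600, 1200))      # 4 levels
print(tiles_for_viewport(1920, 1080))  # 54 tiles either way
```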
01:18
This is an entire book, so this is an example of non-image data. This is "Bleak House" by Dickens. Every column is a chapter. To prove to you that it's really text, and not an image, we can do something like so, to really show that this is a real representation of the text; it's not a picture. Maybe this is a kind of an artificial way to read an e-book. I wouldn't recommend it.

01:40
This is a more realistic case. This is an issue of The Guardian. Every large image is the beginning of a section. And this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is an inherently multi-scale kind of medium.

01:56
We've also done a little something with the corner of this particular issue of The Guardian. We've made up a fake ad that's very high resolution -- much higher than you'd be able to get in an ordinary ad -- and we've embedded extra content. If you want to see the features of this car, you can see it here. Or other models, or even technical specifications. And this really gets at some of these ideas about really doing away with those limits on screen real estate. We hope that this means no more pop-ups and other kind of rubbish like that -- shouldn't be necessary.

02:27
Of course, mapping is one of those really obvious applications for a technology like this. And this one I really won't spend any time on, except to say that we have things to contribute to this field as well. But those are all the roads in the U.S. superimposed on top of a NASA geospatial image.
02:44
So let's pull up, now, something else. This is actually live on the Web now; you can go check it out. This is a project called Photosynth, which really marries two different technologies. One of them is Seadragon and the other is some very beautiful computer vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at U.W. and Rick Szeliski at Microsoft Research. A very nice collaboration.

03:07
And so this is live on the Web. It's powered by Seadragon. You can see that when we kind of do these sorts of views, where we can dive through images and have this kind of multi-resolution experience. But the spatial arrangement of the images here is actually meaningful. The computer vision algorithms have registered these images together so that they correspond to the real space in which these shots -- all taken near Grassi Lakes in the Canadian Rockies -- all these shots were taken. So you see elements here of stabilized slide-show or panoramic imaging, and these things have all been related spatially.
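A minimal sketch of the registration step being described -- not Photosynth's actual pipeline (that is the Snavely/Seitz/Szeliski structure-from-motion work), just the basic ingredient it builds on: local features are matched between overlapping photos and a robust geometric fit relates them, which is what allows shots of the same place to be placed in a common space. OpenCV is assumed, and the file names are hypothetical.

```python
# Sketch of pairwise image registration via feature matching (illustration
# only; a full reconstruction would recover camera poses and 3D points).
import cv2
import numpy as np

def register_pair(path_a, path_b, min_matches=30):
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(nfeatures=2000)            # detect local features
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
    if len(matches) < min_matches:
        return None  # not enough overlap to relate the two photos

    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # Robustly estimate a homography relating the overlapping regions.
    H, inliers = cv2.findHomography(pts_a, pts_b, cv2.RANSAC, 5.0)
    return H

H = register_pair("shot_01.jpg", "shot_02.jpg")  # hypothetical file names
```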
03:42
I'm not sure if I have time to show you any other environments. There are some that are much more spatial. I would like to jump straight to one of Noah's original data-sets -- and this is from an early prototype of Photosynth that we first got working in the summer -- to show you what I think is really the punch line behind this technology, the Photosynth technology. And it's not necessarily so apparent from looking at the environments that we've put up on the website. We had to worry about the lawyers and so on.

04:07
This is a reconstruction of Notre Dame Cathedral that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in t-shirts, and of the campus and so on. And each of these orange cones represents an image that was discovered to belong to this model. And so these are all Flickr images, and they've all been related spatially in this way. And we can just navigate in this very simple way.

04:35
(Applause)
04:44
You know, I never thought that I'd end up working at Microsoft. It's very gratifying to have this kind of reception here.

04:50
(Laughter)

04:53
I guess you can see this is lots of different types of cameras: it's everything from cell phone cameras to professional SLRs, quite a large number of them, stitched together in this environment. And if I can, I'll find some of the sort of weird ones. So many of them are occluded by faces, and so on. Somewhere in here there are actually a series of photographs -- here we go. This is actually a poster of Notre Dame that registered correctly. We can dive in from the poster to a physical view of this environment.

05:31
What the point here really is is that we can do things with the social environment. This is now taking data from everybody -- from the entire collective memory of, visually, of what the Earth looks like -- and link all of that together. All of those photos become linked together, and they make something emergent that's greater than the sum of the parts. You have a model that emerges of the entire Earth. Think of this as the long tail to Stephen Lawler's Virtual Earth work.

05:56
And this is something that grows in complexity as people use it, and whose benefits become greater to the users as they use it. Their own photos are getting tagged with meta-data that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame Cathedral suddenly gets enriched with all of that data, and I can use it as an entry point to dive into that space, into that meta-verse, using everybody else's photos, and do a kind of a cross-modal and cross-user social experience that way.
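A minimal sketch of that metadata-sharing idea, with made-up names and data (an illustration, not the production system): once photos are registered to a common scene, a tag someone attaches to a shared scene point becomes visible from every photo that observes that point.

```python
# Illustration only: tags attach to scene points; registered photos that
# observe those points inherit the labels someone else entered.

# image -> scene-point ids it observes, as a registration step would produce.
observations = {
    "my_notre_dame.jpg":   {101, 102, 203},
    "tourist_closeup.jpg": {102, 203, 204},
    "facade_detail.jpg":   {204, 305},
}

# Someone labels a couple of scene points once.
point_tags = {203: {"Saint Denis"}, 305: {"rose window"}}

def tags_for(image_id):
    """Collect every tag attached to any scene point this image can see."""
    tags = set()
    for point in observations.get(image_id, set()):
        tags |= point_tags.get(point, set())
    return tags

print(tags_for("my_notre_dame.jpg"))  # {'Saint Denis'} -- inherited metadata
print(tags_for("facade_detail.jpg"))  # {'rose window'}
```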
06:28
And of course, a by-product of all of that is immensely rich virtual models of every interesting part of the Earth, collected not just from overhead flights and from satellite images and so on, but from the collective memory.

06:40
Thank you so much.

06:42
(Applause)
06:53
Chris Anderson: Do I understand this right? That what your software is going to allow, is that at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to basically link together?

07:07
BAA: Yes. What this is really doing is discovering. It's creating hyperlinks, if you will, between images. And it's doing that based on the content inside the images. And that gets really exciting when you think about the richness of the semantic information that a lot of those images have. Like when you do a web search for images, you type in phrases, and the text on the web page is carrying a lot of information about what that picture is of. Now, what if that picture links to all of your pictures? Then the amount of semantic interconnection and the amount of richness that comes out of that is really huge. It's a classic network effect.

07:35
CA: Blaise, that is truly incredible. Congratulations.

07:37
BAA: Thanks so much.
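A minimal sketch of the image-hyperlinking idea from the exchange above (an illustration, not the production system; the match counts and file names are made up): photos become nodes, an edge is added whenever two photos share enough matched features, and each connected component is a linked, browsable collection -- a graph whose value grows as more photos join, which is the network effect being described. The networkx library is assumed.

```python
# Illustration only: building "hyperlinks between images" from pairwise
# feature-match counts, e.g. as produced by a matcher like the one above.
import networkx as nx

def build_image_graph(pairwise_matches, min_shared=30):
    """pairwise_matches maps (img_a, img_b) -> number of matched features."""
    g = nx.Graph()
    for (a, b), count in pairwise_matches.items():
        g.add_node(a)
        g.add_node(b)
        if count >= min_shared:
            g.add_edge(a, b, weight=count)
    return g

# Hypothetical match counts between photos of the same landmark.
matches = {
    ("notre_dame_1.jpg", "notre_dame_2.jpg"): 180,
    ("notre_dame_2.jpg", "notre_dame_3.jpg"): 95,
    ("notre_dame_1.jpg", "holiday_selfie.jpg"): 4,  # too few: no link
}
graph = build_image_graph(matches)
for component in nx.connected_components(graph):
    print(component)  # each set is one linked, navigable collection
```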
