ABOUT THE SPEAKER
Blaise Agüera y Arcas - Software architect
Blaise Agüera y Arcas works on machine learning at Google. Previously a Distinguished Engineer at Microsoft, he has worked on augmented reality, mapping, wearable computing and natural user interfaces.

Why you should listen

Blaise Agüera y Arcas is principal scientist at Google, where he leads a team working on machine intelligence for mobile devices. His group works extensively with deep neural nets for machine perception and distributed learning, and it also pursues so-called "connectomics" research, which maps the connections within the brain.

Agüera y Arcas' background is as multidimensional as the visions he helps create. In the 1990s, he authored patents on both video compression and 3D visualization techniques, and in 2001, he made an influential computational discovery that cast doubt on Gutenberg's role as the father of movable type.

He also created Seadragon (acquired by Microsoft in 2006), the visualization technology that gives Photosynth its amazingly smooth digital rendering and zoom capabilities. Photosynth itself is a vastly powerful piece of software capable of taking a wide variety of images, analyzing them for similarities, and grafting them together into an interactive three-dimensional space. This seamless patchwork of images can be viewed via multiple angles and magnifications, allowing us to look around corners or “fly” in for a (much) closer look. Simply put, it could utterly transform the way we experience digital images.

He joined Microsoft when Microsoft Live Labs acquired Seadragon in 2006. Shortly after the acquisition, Agüera y Arcas directed his team in a collaboration with Microsoft Research and the University of Washington, leading to the first public previews of Photosynth several months later. His TED Talk on Seadragon and Photosynth in 2007 is rated one of TED's "most jaw-dropping." He returned to TED in 2010 to demo Bing’s augmented reality maps.

Fun fact: According to Maria Semple, Agüera y Arcas is the inspiration for the character Elgin in her 2012 best-selling novel Where'd You Go, Bernadette?

TED2007

Blaise Agüera y Arcas: How PhotoSynth can connect the world's images

Blaise Agüera y Arcas demos Photosynth

Filmed:
5,831,957 views

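Blaise Agüera y Arcas gives a dazzling demo of Photosynth, software that could transform the way we look at digital images. Using still photos culled from the Web, Photosynth builds breathtaking dreamscapes and lets us navigate through them.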

00:25
What I'm going to show you first, as quickly as I can,
00:27
is some foundational work, some new technology
00:31
that we brought to Microsoft as part of an acquisition
00:34
almost exactly a year ago. This is Seadragon,
00:37
and it's an environment in which you can either locally or remotely
00:40
interact with vast amounts of visual data.
00:43
We're looking at many, many gigabytes of digital photos here
00:46
and kind of seamlessly and continuously zooming in,
00:50
panning through the thing, rearranging it in any way we want.
00:52
And it doesn't matter how much information we're looking at,
00:56
how big these collections are or how big the images are.
00:59
Most of them are ordinary digital camera photos,
01:01
but this one, for example, is a scan from the Library of Congress,
01:05
and it's in the 300 megapixel range.
01:08
It doesn't make any difference
01:09
because the only thing that ought to limit the performance
01:12
of a system like this one is the number of pixels on your screen
01:15
at any given moment. It's also very flexible architecture.
01:18
This is an entire book, so this is an example of non-image data.
01:22
This is "Bleak House" by Dickens. Every column is a chapter.
01:27
To prove to you that it's really text, and not an image,
01:31
we can do something like so, to really show
01:33
that this is a real representation of the text; it's not a picture.
01:37
Maybe this is a kind of an artificial way to read an e-book.
01:39
I wouldn't recommend it.
01:40
This is a more realistic case. This is an issue of The Guardian.
01:43
Every large image is the beginning of a section.
01:45
And this really gives you the joy and the good experience
01:48
of reading the real paper version of a magazine or a newspaper,
01:54
which is an inherently multi-scale kind of medium.
01:56
We've also done a little something
01:57
with the corner of this particular issue of The Guardian.
02:00
We've made up a fake ad that's very high resolution --
02:03
much higher than you'd be able to get in an ordinary ad --
02:05
and we've embedded extra content.
02:07
If you want to see the features of this car, you can see it here.
02:10
Or other models, or even technical specifications.
02:15
And this really gets at some of these ideas
02:18
about really doing away with those limits on screen real estate.
02:22
We hope that this means no more pop-ups
02:24
and other kind of rubbish like that -- shouldn't be necessary.
02:27
Of course, mapping is one of those really obvious applications
02:29
for a technology like this.
02:31
And this one I really won't spend any time on,
02:33
except to say that we have things to contribute to this field as well.
02:37
But those are all the roads in the U.S.
02:39
superimposed on top of a NASA geospatial image.
02:44
So let's pull up, now, something else.
02:46
This is actually live on the Web now; you can go check it out.
02:49
This is a project called Photosynth,
02:51
which really marries two different technologies.
02:52
One of them is Seadragon
02:54
and the other is some very beautiful computer vision research
02:57
done by Noah Snavely, a graduate student at the University of Washington,
03:00
co-advised by Steve Seitz at U.W.
03:02
and Rick Szeliski at Microsoft Research. A very nice collaboration.
03:07
And so this is live on the Web. It's powered by Seadragon.
03:09
You can see that when we kind of do these sorts of views,
03:12
where we can dive through images
03:14
and have this kind of multi-resolution experience.
03:16
But the spatial arrangement of the images here is actually meaningful.
03:20
The computer vision algorithms have registered these images together
03:23
so that they correspond to the real space in which these shots --
03:27
all taken near Grassi Lakes in the Canadian Rockies --
03:31
all these shots were taken. So you see elements here
03:33
of stabilized slide-show or panoramic imaging,
03:40
and these things have all been related spatially.
03:42
I'm not sure if I have time to show you any other environments.
03:45
There are some that are much more spatial.
03:47
I would like to jump straight to one of Noah's original data-sets --
03:50
and this is from an early prototype of Photosynth
03:52
that we first got working in the summer --
03:54
to show you what I think
03:55
is really the punch line behind this technology,
03:59
the Photosynth technology. And it's not necessarily so apparent
04:01
from looking at the environments that we've put up on the website.
04:04
We had to worry about the lawyers and so on.
04:07
This is a reconstruction of Notre Dame Cathedral
04:09
that was done entirely computationally
04:11
from images scraped from Flickr. You just type Notre Dame into Flickr,
04:14
and you get some pictures of guys in t-shirts, and of the campus
04:17
and so on. And each of these orange cones represents an image
04:22
that was discovered to belong to this model.
04:26
And so these are all Flickr images,
04:28
and they've all been related spatially in this way.
04:31
And we can just navigate in this very simple way.
04:35
(Applause)
04:44
You know, I never thought that I'd end up working at Microsoft.
04:46
It's very gratifying to have this kind of reception here.
04:50
(Laughter)
04:53
I guess you can see
04:56
this is lots of different types of cameras:
04:58
it's everything from cell phone cameras to professional SLRs,
05:02
quite a large number of them, stitched
05:03
together in this environment.
05:04
And if I can, I'll find some of the sort of weird ones.
05:08
So many of them are occluded by faces, and so on.
05:13
Somewhere in here there are actually
05:15
a series of photographs -- here we go.
05:17
This is actually a poster of Notre Dame that registered correctly.
05:21
We can dive in from the poster
05:24
to a physical view of this environment.
05:31
What the point here really is is that we can do things
05:34
with the social environment. This is now taking data from everybody --
05:39
from the entire collective memory
05:40
of, visually, of what the Earth looks like --
05:43
and link all of that together.
05:44
All of those photos become linked together,
05:46
and they make something emergent
05:47
that's greater than the sum of the parts.
05:49
You have a model that emerges of the entire Earth.
05:51
Think of this as the long tail to Stephen Lawler's Virtual Earth work.
(Translator's note: Stephen Lawler was director of Microsoft's Virtual Earth project. For the "long tail," see Chris Anderson's TED talk.)
05:56
And this is something that grows in complexity
05:58
as people use it, and whose benefits become greater
06:01
to the users as they use it.
06:03
Their own photos are getting tagged with meta-data
06:05
that somebody else entered.
06:07
If somebody bothered to tag all of these saints
06:10
and say who they all are, then my photo of Notre Dame Cathedral
06:13
suddenly gets enriched with all of that data,
06:15
and I can use it as an entry point to dive into that space,
06:18
into that meta-verse, using everybody else's photos,
06:21
and do a kind of a cross-modal
06:25
and cross-user social experience that way.
06:28
And of course, a by-product of all of that
06:30
is immensely rich virtual models
06:32
of every interesting part of the Earth, collected
06:35
not just from overhead flights and from satellite images
06:38
and so on, but from the collective memory.
06:40
Thank you so much.
06:42
(Applause)
06:53
Chris Anderson: Do I understand this right? That what your software is going to allow,
06:58
is that at some point, really within the next few years,
07:01
all the pictures that are shared by anyone across the world
07:05
are going to basically link together?
07:07
BAA: Yes. What this is really doing is discovering.
07:09
It's creating hyperlinks, if you will, between images.
07:12
And it's doing that
07:13
based on the content inside the images.
07:14
And that gets really exciting when you think about the richness
07:17
of the semantic information that a lot of those images have.
07:19
Like when you do a web search for images,
07:22
you type in phrases, and the text on the web page
07:24
is carrying a lot of information about what that picture is of.
07:27
Now, what if that picture links to all of your pictures?
07:29
Then the amount of semantic interconnection
07:31
and the amount of richness that comes out of that
07:32
is really huge. It's a classic network effect.
07:35
CA: Blaise, that is truly incredible. Congratulations.
07:37
BAA: Thanks so much.
Chinese translation by Geng Luo
Chinese translation reviewed by dahong zhang
