ABOUT THE SPEAKER
Abe Davis - Computer scientist
Computer vision expert Abe Davis pioneers methods to extract audio from silent digital videos, even footage shot on ordinary consumer cameras.

Why you should listen

MIT PhD student, computer vision wizard and rap artist Abe Davis has co-created the world’s most improbable audio instrument.  In 2014, Davis and his collaborators debuted the “visual microphone,” an algorithm that samples the sympathetic vibrations of ordinary objects (such as a potato chip bag) from ordinary high-speed video footage and transduces them into intelligible audio tracks.

Davis is also the author of Caperture, a 3D-imaging app designed to create and share 3D images on any compatible smartphone.

TED2015

Abe Davis: New video technology that reveals an object's hidden properties

Filmed:
1,482,525 views

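Subtle motion is happening around us all the time, including tiny vibrations caused by sound. New technology lets us extract these vibrations from a seemingly still video and recover the sound. But Abe Davis goes a step further: watch how he uses software to reveal an object's hidden properties from a simple video and to create a new way of interacting with objects.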

00:13
Most of us think of motion as a very visual thing.
00:17
If I walk across this stage or gesture with my hands while I speak,
00:22
that motion is something that you can see.
00:26
But there's a world of important motion that's too subtle for the human eye,
00:31
and over the past few years,
00:33
we've started to find that cameras
00:35
can often see this motion even when humans can't.
00:40
So let me show you what I mean.
00:42
On the left here, you see video of a person's wrist,
00:46
and on the right, you see video of a sleeping infant,
00:49
but if I didn't tell you that these were videos,
00:52
you might assume that you were looking at two regular images,
00:56
because in both cases,
00:58
these videos appear to be almost completely still.
01:02
But there's actually a lot of subtle motion going on here,
01:06
and if you were to touch the wrist on the left,
01:08
you would feel a pulse,
01:10
and if you were to hold the infant on the right,
01:12
you would feel the rise and fall of her chest
01:15
as she took each breath.
01:17
And these motions carry a lot of significance,
01:21
but they're usually too subtle for us to see,
01:24
so instead, we have to observe them
01:26
through direct contact, through touch.
01:30
But a few years ago,
01:32
my colleagues at MIT developed what they call a motion microscope,
01:36
which is software that finds these subtle motions in video
01:41
and amplifies them so that they become large enough for us to see.
01:45
And so, if we use their software on the left video,
01:48
it lets us see the pulse in this wrist,
01:52
and if we were to count that pulse,
01:53
we could even figure out this person's heart rate.
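
Counting a pulse to get a heart rate can be sketched in a few lines. This is only an illustration, not the MIT software: it assumes the amplified wrist motion has already been reduced to a one-dimensional signal, and it uses a synthetic sine wave as a stand-in for that signal. The function names, the 30 fps rate, and the threshold are all hypothetical.

```python
import math

def count_peaks(signal, threshold):
    """Count local maxima that rise above `threshold`."""
    peaks = 0
    for i in range(1, len(signal) - 1):
        if signal[i] > threshold and signal[i - 1] < signal[i] >= signal[i + 1]:
            peaks += 1
    return peaks

def heart_rate_bpm(signal, fps, threshold=0.5):
    """Convert a peak count into beats per minute."""
    duration_s = len(signal) / fps
    return count_peaks(signal, threshold) * 60.0 / duration_s

# Synthetic stand-in for the motion signal (an assumption, not real data):
# a 1.2 Hz pulse sampled at 30 frames per second for 10 seconds.
fps = 30
signal = [math.sin(2 * math.pi * 1.2 * (n / fps)) for n in range(fps * 10)]
bpm = heart_rate_bpm(signal, fps)
print(bpm)
```

A real signal would be noisy, so a practical version would probably smooth it or look for the dominant frequency rather than count raw peaks.
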
01:57
And if we used the same software on the right video,
02:00
it lets us see each breath that this infant takes,
02:03
and we can use this as a contact-free way to monitor her breathing.
02:08
And so this technology is really powerful because it takes these phenomena
02:14
that we normally have to experience through touch
02:16
and it lets us capture them visually and non-invasively.
02:21
So a couple years ago, I started working with the folks that created that software,
02:25
and we decided to pursue a crazy idea.
02:28
We thought, it's cool that we can use software
02:31
to visualize tiny motions like this,
02:34
and you can almost think of it as a way to extend our sense of touch.
02:39
But what if we could do the same thing with our ability to hear?
02:44
What if we could use video to capture the vibrations of sound,
02:49
which are just another kind of motion,
02:52
and turn everything that we see into a microphone?
02:56
Now, this is a bit of a strange idea,
02:58
so let me try to put it in perspective for you.
03:01
Traditional microphones work by converting the motion
03:05
of an internal diaphragm into an electrical signal,
03:08
and that diaphragm is designed to move readily with sound
03:12
so that its motion can be recorded and interpreted as audio.
03:17
But sound causes all objects to vibrate.
03:21
Those vibrations are just usually too subtle and too fast for us to see.
03:26
So what if we record them with a high-speed camera
03:30
and then use software to extract tiny motions
03:34
from our high-speed video,
03:36
and analyze those motions to figure out what sounds created them?
03:41
This would let us turn visible objects into visual microphones from a distance.
03:49
And so we tried this out,
03:51
and here's one of our experiments,
03:53
where we took this potted plant that you see on the right
03:56
and we filmed it with a high-speed camera
03:58
while a nearby loudspeaker played this sound.
04:02
(Music: "Mary Had a Little Lamb")
04:11
And so here's the video that we recorded,
04:14
and we recorded it at thousands of frames per second,
04:18
but even if you look very closely,
04:20
all you'll see are some leaves
04:22
that are pretty much just sitting there doing nothing,
04:25
because our sound only moved those leaves by about a micrometer.
04:31
That's one ten-thousandth of a centimeter,
04:35
which spans somewhere between a hundredth and a thousandth
04:39
of a pixel in this image.
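
The unit arithmetic here can be checked directly. The sketch below takes only the numbers stated in the talk (a one-micrometer displacement that spans a hundredth to a thousandth of a pixel) and works out what they imply about how much of the scene each pixel covers. The helper name `pixel_footprint_um` is hypothetical.

```python
MICROMETERS_PER_CENTIMETER = 10_000

def pixel_footprint_um(displacement_um, displacement_px):
    """Scene width covered by one pixel, implied by the same shift in both units."""
    return displacement_um / displacement_px

displacement_um = 1.0
# One micrometer expressed in centimeters: one ten-thousandth of a centimeter.
displacement_cm = displacement_um / MICROMETERS_PER_CENTIMETER

# If one micrometer is a hundredth of a pixel, a pixel covers about 100 micrometers;
# if it is a thousandth of a pixel, a pixel covers about 1000 micrometers (1 mm).
footprint_fine = pixel_footprint_um(displacement_um, 1 / 100)
footprint_coarse = pixel_footprint_um(displacement_um, 1 / 1000)
print(displacement_cm, footprint_fine, footprint_coarse)
```

In other words, each pixel in that shot would cover roughly a tenth of a millimeter to a millimeter of the plant, which is why a one-micrometer shift is far below anything the eye could pick out.
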
04:41
So you can squint all you want,
04:44
but motion that small is pretty much perceptually invisible.
04:49
But it turns out that something can be perceptually invisible
04:53
and still be numerically significant,
04:56
because with the right algorithms,
04:58
we can take this silent, seemingly still video
05:02
and we can recover this sound.
05:04
(Music: "Mary Had a Little Lamb")
05:12
(Applause)
05:22
So how is this possible?
05:23
How can we get so much information out of so little motion?
05:28
Well, let's say that those leaves move by just a single micrometer,
05:33
and let's say that that shifts our image by just a thousandth of a pixel.
05:39
That may not seem like much,
05:41
but a single frame of video
05:43
may have hundreds of thousands of pixels in it,
05:47
and so if we combine all of the tiny motions that we see
05:50
from across that entire image,
05:52
then suddenly a thousandth of a pixel
05:55
can start to add up to something pretty significant.
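
The idea of combining many tiny per-pixel motions can be illustrated with a toy averaging experiment. This is a sketch of the statistical intuition only, not the algorithm used in the talk. It assumes every pixel sees the same thousandth-of-a-pixel shift plus independent Gaussian measurement noise, and the noise level of 0.05 pixels is an invented figure.

```python
import random
import statistics

def estimate_shift(true_shift, noise_sigma, num_pixels, rng):
    """Average noisy per-pixel shift measurements into a single estimate."""
    samples = [true_shift + rng.gauss(0.0, noise_sigma) for _ in range(num_pixels)]
    return statistics.fmean(samples)

rng = random.Random(0)   # fixed seed so the run is repeatable
true_shift = 0.001       # pixels, the figure used in the talk
noise_sigma = 0.05       # assumed per-pixel noise, 50 times larger than the shift
trials = 50

# Typical error when the shift is read from one pixel versus pooled over 10,000.
single_errors = [abs(estimate_shift(true_shift, noise_sigma, 1, rng) - true_shift)
                 for _ in range(trials)]
pooled_errors = [abs(estimate_shift(true_shift, noise_sigma, 10_000, rng) - true_shift)
                 for _ in range(trials)]
mean_single = statistics.fmean(single_errors)
mean_pooled = statistics.fmean(pooled_errors)
print(mean_single, mean_pooled)
```

Under these assumptions the error of an average shrinks with the square root of the number of pixels, so pooling 10,000 pixels cuts the noise by about a factor of 100 and brings it below the shift being measured.
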
05:58
On a personal note, we were pretty psyched when we figured this out.
06:02
(Laughter)
06:04
But even with the right algorithm,
06:08
we were still missing a pretty important piece of the puzzle.
06:11
You see, there are a lot of factors that affect when and how well
06:15
this technique will work.
06:17
There's the object and how far away it is;
06:20
there's the camera and the lens that you use;
06:22
how much light is shining on the object and how loud your sound is.
06:27
And even with the right algorithm,
06:31
we had to be very careful with our early experiments,
06:34
because if we got any of these factors wrong,
06:37
there was no way to tell what the problem was.
06:39
We would just get noise back.
06:42
And so a lot of our early experiments looked like this.
06:45
And so here I am,
06:47
and on the bottom left, you can kind of see our high-speed camera,
06:51
which is pointed at a bag of chips,
06:53
and the whole thing is lit by these bright lamps.
06:56
And like I said, we had to be very careful in these early experiments,
07:01
so this is how it went down.
07:03
(Video) Abe Davis: Three, two, one, go.
07:07
Mary had a little lamb! Little lamb! Little lamb!
07:12
(Laughter)
07:17
AD: So this experiment looks completely ridiculous.
07:20
(Laughter)
07:21
I mean, I'm screaming at a bag of chips --
07:24
(Laughter) --
07:25
and we're blasting it with so much light,
07:27
we literally melted the first bag we tried this on. (Laughter)
07:32
But ridiculous as this experiment looks,
07:35
it was actually really important,
07:37
because we were able to recover this sound.
07:40
(Audio) Mary had a little lamb! Little lamb! Little lamb!
07:45
(Applause)
07:49
AD: And this was really significant,
07:51
because it was the first time we recovered intelligible human speech
07:55
from silent video of an object.
07:57
And so it gave us this point of reference,
08:00
and gradually we could start to modify the experiment,
08:04
using different objects or moving the object further away,
08:07
using less light or quieter sounds.
08:11
And we analyzed all of these experiments
08:14
until we really understood the limits of our technique,
08:18
because once we understood those limits,
08:20
we could figure out how to push them.
08:22
And that led to experiments like this one,
08:25
where again, I'm going to speak to a bag of chips,
08:28
but this time we've moved our camera about 15 feet away,
08:33
outside, behind a soundproof window,
08:36
and the whole thing is lit by only natural sunlight.
08:40
And so here's the video that we captured.
08:44
And this is what things sounded like from inside, next to the bag of chips.
08:49
(Audio) Mary had a little lamb whose fleece was white as snow,
08:54
and everywhere that Mary went, that lamb was sure to go.
08:59
AD: And here's what we were able to recover from our silent video
09:03
captured outside behind that window.
09:06
(Audio) Mary had a little lamb whose fleece was white as snow,
09:10
and everywhere that Mary went, that lamb was sure to go.
09:15
(Applause)
09:22
AD: And there are other ways that we can push these limits as well.
09:25
So here's a quieter experiment
09:27
where we filmed some earphones plugged into a laptop computer,
09:31
and in this case, our goal was to recover the music that was playing on that laptop
09:35
from just silent video
09:38
of these two little plastic earphones,
09:40
and we were able to do this so well
09:42
that I could even Shazam our results.
09:45
(Laughter)
09:49
(Music: "Under Pressure" by Queen)
10:01
(Applause)
10:06
And we can also push things by changing the hardware that we use.
10:11
Because the experiments I've shown you so far
10:13
were done with a camera, a high-speed camera,
10:15
that can record video about 100 times faster
10:18
than most cell phones,
10:20
but we've also found a way to use this technique
10:23
with more regular cameras,
10:25
and we do that by taking advantage of what's called a rolling shutter.
10:29
You see, most cameras record images one row at a time,
10:34
and so if an object moves during the recording of a single image,
10:40
there's a slight time delay between each row,
10:43
and this causes slight artifacts
10:46
that get coded into each frame of a video.
10:49
And so what we found is that by analyzing these artifacts,
10:53
we can actually recover sound using a modified version of our algorithm.
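
Why a rolling shutter helps can be seen from the timing alone. The sketch below is not the modified algorithm from the talk; it only computes when each row of a frame is captured, under made-up camera parameters (60 frames per second, 720 rows, and a readout that takes half of each frame period). All names and numbers are assumptions.

```python
def row_timestamps(frame_index, fps, rows, line_delay_s):
    """Capture time, in seconds, of each row in one rolling-shutter frame."""
    frame_start = frame_index / fps
    return [frame_start + r * line_delay_s for r in range(rows)]

fps = 60
rows = 720
# Assumed: reading out all rows takes half of the frame period.
line_delay_s = 1.0 / (fps * rows * 2)

times = row_timestamps(0, fps, rows, line_delay_s)
# Within a frame, successive rows sample the scene at this rate (in Hz),
# far higher than the frame rate itself.
row_rate_hz = 1.0 / line_delay_s
print(times[1] - times[0], row_rate_hz)
```

With these numbers the rows of a single frame are spaced about 11.6 microseconds apart, so each frame holds a short burst of samples taken tens of thousands of times per second. There is still a gap between the last row of one frame and the first row of the next, which may be one reason the recovered sound is distorted.
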
10:58
So here's an experiment we did
11:00
where we filmed a bag of candy
11:01
while a nearby loudspeaker played
11:03
the same "Mary Had a Little Lamb" music from before,
11:06
but this time, we used just a regular store-bought camera,
11:10
and so in a second, I'll play for you the sound that we recovered,
11:13
and it's going to sound distorted this time,
11:15
but listen and see if you can still recognize the music.
11:19
(Audio: "Mary Had a Little Lamb")
11:37
And so, again, that sounds distorted,
11:40
but what's really amazing here is that we were able to do this
11:45
with something that you could literally run out
11:48
and pick up at a Best Buy.
11:51
So at this point,
11:52
a lot of people see this work,
11:54
and they immediately think about surveillance.
11:57
And to be fair,
12:00
it's not hard to imagine how you might use this technology to spy on someone.
12:04
But keep in mind that there's already a lot of very mature technology
12:08
out there for surveillance.
12:09
In fact, people have been using lasers
12:12
to eavesdrop on objects from a distance for decades.
12:15
But what's really new here,
12:18
what's really different,
12:19
is that now we have a way to picture the vibrations of an object,
12:23
which gives us a new lens through which to look at the world,
12:27
and we can use that lens
12:28
to learn not just about forces like sound that cause an object to vibrate,
12:33
but also about the object itself.
12:36
And so I want to take a step back
12:38
and think about how that might change the ways that we use video,
12:42
because we usually use video to look at things,
12:46
and I've just shown you how we can use it
12:48
to listen to things.
12:50
But there's another important way that we learn about the world:
12:54
that's by interacting with it.
12:56
We push and pull and poke and prod things.
13:00
We shake things and see what happens.
13:03
And that's something that video still won't let us do,
13:07
at least not traditionally.
13:09
So I want to show you some new work,
13:11
and this is based on an idea I had just a few months ago,
13:14
so this is actually the first time I've shown it to a public audience.
13:17
And the basic idea is that we're going to use the vibrations in a video
13:22
to capture objects in a way that will let us interact with them
13:27
and see how they react to us.
13:31
So here's an object,
13:32
and in this case, it's a wire figure in the shape of a human,
13:36
and we're going to film that object with just a regular camera.
13:39
So there's nothing special about this camera.
13:41
In fact, I've actually done this with my cell phone before.
13:44
But we do want to see the object vibrate,
13:47
so to make that happen,
13:48
we're just going to bang a little bit on the surface where it's resting
13:51
while we record this video.
13:59
So that's it: just five seconds of regular video,
14:03
while we bang on this surface,
14:05
and we're going to use the vibrations in that video
14:08
to learn about the structural and material properties of our object,
14:13
and we're going to use that information to create something new and interactive.
14:24
And so here's what we've created.
14:27
And it looks like a regular image,
14:29
but this isn't an image, and it's not a video,
14:32
because now I can take my mouse
14:35
and I can start interacting with the object.
14:44
And so what you see here
14:47
is a simulation of how this object
14:49
would respond to new forces that we've never seen before,
14:54
and we created it from just five seconds of regular video.
14:59
(Applause)
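
The talk does not say how the simulation is built, so the following is only a guess at the flavor of the idea, not the method shown on stage. It assumes that a recorded vibration can be summarized by a frequency and a damping ratio, and then treats the object as a single damped oscillator that is given a new push. The function name and every parameter value are hypothetical.

```python
import math

def simulate_mode(freq_hz, damping_ratio, impulse, dt, steps):
    """Displacement over time of one damped vibration mode after a push at t = 0.

    Unit mass is assumed, so the impulse sets the initial velocity.
    """
    omega = 2 * math.pi * freq_hz
    x, v = 0.0, impulse
    out = []
    for _ in range(steps):
        a = -2 * damping_ratio * omega * v - omega ** 2 * x
        v += a * dt   # semi-implicit Euler: update velocity first,
        x += v * dt   # then position using the new velocity
        out.append(x)
    return out

# Hypothetical mode: 5 Hz, lightly damped, simulated for 2 seconds at 1 ms steps.
response = simulate_mode(freq_hz=5.0, damping_ratio=0.05, impulse=1.0,
                         dt=0.001, steps=2000)
early_peak = max(abs(x) for x in response[:1000])
late_peak = max(abs(x) for x in response[1000:])
print(early_peak, late_peak)
```

The object swings back and forth and settles over time, which is the qualitative behavior visible when the wire figure is dragged with the mouse. A real object would have many such modes, each with its own shape across the image.
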
15:09
And so this is a really powerful way to look at the world,
15:12
because it lets us predict how objects will respond
15:15
to new situations,
15:17
and you could imagine, for instance, looking at an old bridge
15:20
and wondering what would happen, how would that bridge hold up
15:24
if I were to drive my car across it.
15:27
And that's a question that you probably want to answer
15:30
before you start driving across that bridge.
15:33
And of course, there are going to be limitations to this technique,
15:37
just like there were with the visual microphone,
15:39
but we found that it works in a lot of situations
15:42
that you might not expect,
15:44
especially if you give it longer videos.
15:47
So for example, here's a video that I captured
15:50
of a bush outside of my apartment,
15:52
and I didn't do anything to this bush,
15:55
but by capturing a minute-long video,
15:58
a gentle breeze caused enough vibrations
16:01
that we could learn enough about this bush to create this simulation.
16:07
(Applause)
16:13
And so you could imagine giving this to a film director,
16:16
and letting him control, say,
16:18
the strength and direction of wind in a shot after it's been recorded.
16:24
Or, in this case, we pointed our camera at a hanging curtain,
16:29
and you can't even see any motion in this video,
16:33
but by recording a two-minute-long video,
16:36
natural air currents in this room
16:38
created enough subtle, imperceptible motions and vibrations
16:43
that we could learn enough to create this simulation.
16:48
And ironically,
16:50
we're kind of used to having this kind of interactivity
16:53
when it comes to virtual objects,
16:56
when it comes to video games and 3D models,
16:59
but to be able to capture this information from real objects in the real world
17:04
using just simple, regular video,
17:06
is something new that has a lot of potential.
17:10
So here are the amazing people who worked with me on these projects.
17:16
(Applause)
17:24
And what I've shown you today is only the beginning.
17:27
We've just started to scratch the surface
17:29
of what you can do with this kind of imaging,
17:32
because it gives us a new way
17:35
to capture our surroundings with common, accessible technology.
17:40
And so looking to the future,
17:41
it's going to be really exciting to explore
17:44
what this can tell us about the world.
17:46
Thank you.
17:47
(Applause)
Translated by Alvin Lee
Reviewed by Lee Li
