ABOUT THE SPEAKER

Fei-Fei Li - Computer scientist
As Director of Stanford’s Artificial Intelligence Lab and Vision Lab, Fei-Fei Li is working to solve AI’s trickiest problems -- including image recognition, learning and language processing.

Why you should listen

Using algorithms built on machine learning methods such as neural network models, the Stanford Artificial Intelligence Lab led by Fei-Fei Li has created software capable of recognizing scenes in still photographs -- and accurately describing them using natural language.

Li’s work with neural networks and computer vision (with Stanford’s Vision Lab) marks a significant step forward for AI research, and could lead to applications ranging from more intuitive image searches to robots able to make autonomous decisions in unfamiliar situations.

Fei-Fei was honored as one of Foreign Policy's 2015 Global Thinkers.


TED2015

Fei-Fei Li: How we're teaching computers to understand pictures


Filmed: 2015-03-17

Readability: 4.5

2,702,344 views

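When a very young child looks at a picture, she can identify simple elements in it: "cat," "book," "chair." Now computers are smart enough to do the same. What comes next? In this striking talk, computer vision expert Fei-Fei Li describes the state of the art -- including the database of 15 million photos her team built to "teach" a computer to understand pictures -- and the key insights do not stop there.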


00:14

Let me show you something.

00:18

(Video) Girl: Okay, that's a cat sitting in a bed. The boy is petting the elephant. Those are people that are going on an airplane. That's a big airplane.

00:33

Fei-Fei Li: This is a three-year-old child describing what she sees in a series of photos. She might still have a lot to learn about this world, but she's already an expert at one very important task: to make sense of what she sees.

00:50

Our society is more technologically advanced than ever. We send people to the moon, we make phones that talk to us or customize radio stations that can play only music we like. Yet, our most advanced machines and computers still struggle at this task.

01:09

So I'm here today to give you a progress report on the latest advances in our research in computer vision, one of the most frontier and potentially revolutionary technologies in computer science.

01:24

Yes, we have prototyped cars that can drive by themselves, but without smart vision, they cannot really tell the difference between a crumpled paper bag on the road, which can be run over, and a rock that size, which should be avoided. We have made fabulous megapixel cameras, but we have not delivered sight to the blind. Drones can fly over massive land, but don't have enough vision technology to help us to track the changes of the rainforests. Security cameras are everywhere, but they do not alert us when a child is drowning in a swimming pool.

02:06

Photos and videos are becoming an integral part of global life. They're being generated at a pace that's far beyond what any human, or teams of humans, could hope to view, and you and I are contributing to that at this TED. Yet our most advanced software is still struggling at understanding and managing this enormous content. So in other words, collectively as a society, we're very much blind, because our smartest machines are still blind.

02:43

"Why is this so hard?" you may ask. Cameras can take pictures like this one by converting lights into a two-dimensional array of numbers known as pixels, but these are just lifeless numbers. They do not carry meaning in themselves. Just like to hear is not the same as to listen, to take pictures is not the same as to see, and by seeing, we really mean understanding.
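As a rough illustration of the "two-dimensional array of numbers" described here, the following Python sketch builds a tiny grayscale image by hand. The 5x5 grid and its brightness values are invented for this example and do not come from the talk; the sketch only shows that a stored image is a grid of integers with no labels attached.

```python
# A tiny, made-up 5x5 grayscale "photo": each number is one pixel's
# brightness, from 0 (black) to 255 (white). The values are hypothetical.
image = [
    [12, 15, 200, 210, 14],
    [10, 190, 220, 205, 11],
    [9, 185, 230, 215, 13],
    [11, 14, 195, 200, 12],
    [10, 12, 13, 15, 11],
]

height = len(image)
width = len(image[0])

# Everything the camera recorded can be treated as plain arithmetic...
pixels = [value for row in image for value in row]
mean_brightness = sum(pixels) / len(pixels)
bright_count = sum(1 for value in pixels if value > 128)

print(f"{width}x{height} image, {len(pixels)} pixels")
print(f"mean brightness: {mean_brightness:.1f}")
print(f"pixels brighter than 128: {bright_count}")
# ...but nothing in the array itself says "cat", "bed", or "elephant".
```

Any meaning has to be supplied by whatever reads the array, which is the gap between taking pictures and seeing that the talk goes on to describe.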

03:13

In fact, it took Mother Nature 540 million years of hard work to do this task, and much of that effort went into developing the visual processing apparatus of our brains, not the eyes themselves. So vision begins with the eyes, but it truly takes place in the brain.

03:38

So for 15 years now, starting from my Ph.D. at Caltech and then leading Stanford's Vision Lab, I've been working with my mentors, collaborators and students to teach computers to see. Our research field is called computer vision and machine learning. It's part of the general field of artificial intelligence.

04:03

So ultimately, we want to teach the machines to see just like we do: naming objects, identifying people, inferring 3D geometry of things, understanding relations, emotions, actions and intentions. You and I weave together entire stories of people, places and things the moment we lay our gaze on them.

04:28

The first step towards this goal is to teach a computer to see objects, the building block of the visual world. In its simplest terms, imagine this teaching process as showing the computers some training images of a particular object, let's say cats, and designing a model that learns from these training images. How hard can this be? After all, a cat is just a collection of shapes and colors, and this is what we did in the early days of object modeling. We'd tell the computer algorithm in a mathematical language that a cat has a round face, a chubby body, two pointy ears, and a long tail, and that looked all fine.

05:14

But what about this cat? (Laughter) It's all curled up. Now you have to add another shape and viewpoint to the object model. But what if cats are hidden? What about these silly cats? Now you get my point. Even something as simple as a household pet can present an infinite number of variations to the object model, and that's just one object.

05:44

So about eight years ago, a very simple and profound observation changed my thinking. No one tells a child how to see, especially in the early years. They learn this through real-world experiences and examples.

06:03

If you consider a child's eyes as a pair of biological cameras, they take one picture about every 200 milliseconds, the average time an eye movement is made. So by age three, a child would have seen hundreds of millions of pictures of the real world. That's a lot of training examples.
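The "hundreds of millions" figure can be checked with a back-of-the-envelope calculation. In the Python sketch below, the 200-millisecond interval and the age of three come from the talk; the 12 waking hours per day is an assumption added here for illustration, and the round-the-clock case is included only as an upper bound.

```python
# Rough check of "hundreds of millions of pictures by age three".
# From the talk: one "picture" about every 200 milliseconds, up to age three.
# Assumption (not from the talk): about 12 waking hours per day.
MS_PER_PICTURE = 200
AWAKE_HOURS_PER_DAY = 12  # hypothetical value
DAYS_PER_YEAR = 365
YEARS = 3

pictures_per_second = 1000 // MS_PER_PICTURE

# Estimate using waking hours only.
awake_seconds = YEARS * DAYS_PER_YEAR * AWAKE_HOURS_PER_DAY * 3600
pictures_awake = pictures_per_second * awake_seconds

# Upper bound: eyes open around the clock.
all_seconds = YEARS * DAYS_PER_YEAR * 24 * 3600
pictures_upper = pictures_per_second * all_seconds

print(f"{pictures_awake:,} pictures at {AWAKE_HOURS_PER_DAY} waking hours per day")
print(f"{pictures_upper:,} pictures as an upper bound")
```

Under either assumption the count lands between one hundred million and one billion, which is consistent with the phrase used in the talk.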

06:26

So instead of focusing solely on better and better algorithms, my insight was to give the algorithms the kind of training data that a child was given through experiences in both quantity and quality.

06:44

Once we know this, we knew we needed to collect a data set that has far more images than we have ever had before, perhaps thousands of times more, and together with Professor Kai Li at Princeton University, we launched the ImageNet project in 2007.

07:05

Luckily, we didn't have to mount a camera on our head and wait for many years. We went to the Internet, the biggest treasure trove of pictures that humans have ever created. We downloaded nearly a billion images and used crowdsourcing technology like the Amazon Mechanical Turk platform to help us to label these images.

07:28

At its peak, ImageNet was one of the biggest employers of the Amazon Mechanical Turk workers: together, almost 50,000 workers from 167 countries around the world helped us to clean, sort and label nearly a billion candidate images. That was how much effort it took to capture even a fraction of the imagery a child's mind takes in in the early developmental years.

08:04

In hindsight, this idea of using big data to train computer algorithms may seem obvious now, but back in 2007, it was not so obvious. We were fairly alone on this journey for quite a while. Some very friendly colleagues advised me to do something more useful for my tenure, and we were constantly struggling for research funding. Once, I even joked to my graduate students that I would just reopen my dry cleaner's shop to fund ImageNet. After all, that's how I funded my college years.

08:41

So we carried on. In 2009, the ImageNet project delivered a database of 15 million images across 22,000 classes of objects and things organized by everyday English words. In both quantity and quality, this was an unprecedented scale. As an example, in the case of cats, we have more than 62,000 cats of all kinds of looks and poses and across all species of domestic and wild cats.

09:20

We were thrilled to have put together ImageNet, and we wanted the whole research world to benefit from it, so in the TED fashion, we opened up the entire data set to the worldwide research community for free.

09:36

(Applause)

09:41

Now that we have the data to nourish our computer brain, we're ready to come back to the algorithms themselves. As it turned out, the wealth of information provided by ImageNet was a perfect match to a particular class of machine learning algorithms called convolutional neural network, pioneered by Kunihiko Fukushima, Geoff Hinton, and Yann LeCun back in the 1970s and '80s.

10:10

Just like the brain consists of billions of highly connected neurons, a basic operating unit in a neural network is a neuron-like node. It takes input from other nodes and sends output to others. Moreover, these hundreds of thousands or even millions of nodes are organized in hierarchical layers, also similar to the brain.
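A minimal Python sketch of a neuron-like node and of nodes arranged in layers might look like the following. The talk only says that a node takes input from other nodes and sends output to others; the weighted sum, the sigmoid squashing function, and every weight, bias and input value below are assumptions chosen for illustration. This is also a plain fully connected arrangement, not the convolutional network the talk refers to.

```python
import math


def node(inputs, weights, bias):
    """One neuron-like node: a weighted sum of its inputs, squashed to (0, 1).

    The sigmoid is an illustrative choice; the talk does not name a function.
    """
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))


def layer(inputs, weight_rows, biases):
    """A layer is a group of nodes that all read the same inputs."""
    return [node(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical numbers throughout: three input values, a hidden layer of
# two nodes, and an output layer of one node that reads the hidden layer.
inputs = [0.0, 0.5, 1.0]
hidden = layer(inputs, [[0.2, -0.4, 0.6], [-0.5, 0.1, 0.3]], [0.0, 0.1])
output = layer(hidden, [[1.0, -1.0]], [0.0])

print("hidden layer outputs:", hidden)
print("output layer outputs:", output)
```

Stacking layers this way, so that each layer's outputs become the next layer's inputs, is one simple reading of the "hierarchical layers" mentioned in the talk.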

10:38

In a typical neural network we use to train our object recognition model, it has 24 million nodes, 140 million parameters, and 15 billion connections. That's an enormous model.

10:55

Powered by the massive data from ImageNet and the modern CPUs and GPUs to train such a humongous model, the convolutional neural network blossomed in a way that no one expected. It became the winning architecture to generate exciting new results in object recognition.

11:18

This is a computer telling us this picture contains a cat and where the cat is. Of course there are more things than cats, so here's a computer algorithm telling us the picture contains a boy and a teddy bear; a dog, a person, and a small kite in the background; or a picture of very busy things like a man, a skateboard, railings, a lamppost, and so on.

11:45

Sometimes, when the computer is not so confident about what it sees, we have taught it to be smart enough to give us a safe answer instead of committing too much, just like we would do, but other times our computer algorithm is remarkable at telling us what exactly the objects are, like the make, model, year of the cars.

12:10

We applied this algorithm to millions of Google Street View images across hundreds of American cities, and we have learned something really interesting: first, it confirmed our common wisdom that car prices correlate very well with household incomes. But surprisingly, car prices also correlate well with crime rates in cities, or voting patterns by zip codes.
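To show what "correlate very well" means numerically, the Python sketch below computes a Pearson correlation coefficient. The talk does not say which statistic was used or give any figures, so both the choice of Pearson's r and the per-zip-code numbers are assumptions invented for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-zip-code figures, invented purely for illustration:
# the average price of cars seen on the street, and household income.
avg_car_price = [12000, 18000, 25000, 31000, 40000]
household_income = [35000, 48000, 61000, 80000, 98000]

r = pearson(avg_car_price, household_income)
print(f"r = {r:.3f}")
```

A value of r close to 1 means the two quantities rise together almost linearly; a value near 0 means no linear relationship.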

12:44

So wait a minute. Is that it? Has the computer already matched or even surpassed human capabilities? Not so fast. So far, we have just taught the computer to see objects. This is like a small child learning to utter a few nouns. It's an incredible accomplishment, but it's only the first step.

13:08

Soon, another developmental milestone will be hit, and children begin to communicate in sentences. So instead of saying this is a cat in the picture, you already heard the little girl telling us this is a cat lying on a bed.

13:24

So to teach a computer to see a picture and generate sentences, the marriage between big data and machine learning algorithm has to take another step. Now, the computer has to learn from both pictures as well as natural language sentences generated by humans. Just like the brain integrates vision and language, we developed a model that connects parts of visual things like visual snippets with words and phrases in sentences.

14:02

About four months ago, we finally tied all this together and produced one of the first computer vision models that is capable of generating a human-like sentence when it sees a picture for the first time.

14:18

Now, I'm ready to show you what the computer says when it sees the picture that the little girl saw at the beginning of this talk.

14:31

(Video) Computer: A man is standing next to an elephant. A large airplane sitting on top of an airport runway.

14:41

FFL: Of course, we're still working hard to improve our algorithms, and it still has a lot to learn.

14:47

(Applause)

14:51

And the computer still makes mistakes.

14:54

(Video) Computer: A cat lying on a bed in a blanket.

14:58

FFL: So of course, when it sees too many cats, it thinks everything might look like a cat.

15:05

(Video) Computer: A young boy is holding a baseball bat.

(Laughter)

15:09

FFL: Or, if it hasn't seen a toothbrush, it confuses it with a baseball bat.

15:15

(Video) Computer: A man riding a horse down a street next to a building.

(Laughter)

15:20

FFL: We haven't taught Art 101 to the computers.

15:25

(Video) Computer: A zebra standing in a field of grass.

15:28

FFL: And it hasn't learned to appreciate the stunning beauty of nature like you and I do. So it has been a long journey. To get from age zero to three was hard. The real challenge is to go from three to 13 and far beyond.

15:47

Let me remind you with this picture of the boy and the cake again. So far, we have taught the computer to see objects or even tell us a simple story when seeing a picture.

15:59

(Video) Computer: A person sitting at a table with a cake.

16:03

FFL: But there's so much more to this picture than just a person and a cake. What the computer doesn't see is that this is a special Italian cake that's only served during Easter time. The boy is wearing his favorite t-shirt given to him as a gift by his father after a trip to Sydney, and you and I can all tell how happy he is and what's exactly on his mind at that moment. This is my son Leo.

16:34

On my quest for visual intelligence, I think of Leo constantly and the future world he will live in. When machines can see, doctors and nurses will have extra pairs of tireless eyes to help them to diagnose and take care of patients. Cars will run smarter and safer on the road. Robots, not just humans, will help us to brave the disaster zones to save the trapped and wounded. We will discover new species, better materials, and explore unseen frontiers with the help of the machines.

17:15

Little by little, we're giving sight to the machines. First, we teach them to see. Then, they help us to see better. For the first time, human eyes won't be the only ones pondering and exploring our world. We will not only use the machines for their intelligence, we will also collaborate with them in ways that we cannot even imagine.

17:41

This is my quest: to give computers visual intelligence and to create a better future for Leo and for the world. Thank you.

17:53

(Applause)

Translated by Twisted Meadows
Reviewed by Min Wang

