Fei-Fei Li: How we're teaching computers to understand pictures
As Director of Stanford’s Artificial Intelligence Lab and Vision Lab, Fei-Fei Li is working to solve AI’s trickiest problems -- including image recognition, learning and language processing.
Okay, this is a cat sitting in a bed.
that are going on an airplane.
This is a three-year-old child
in a series of photos.
She might still have a lot to learn about this world,
she's already an expert at one very important task:
more technologically advanced than ever.
we make phones that talk to us
that can play only music we like.
machines and computers
to give you a progress report
on the latest advances in our research in computer vision,
and potentially revolutionary
prototype cars that can drive by themselves,
they cannot really tell the difference
on the road, which can be run over,
which should be avoided.
sight to the blind.
the changes of the rainforests.
is drowning in a swimming pool.
they cannot alert us.
an integral part of global life.
that's far beyond what any human, or team of humans,
are contributing to that at this TED.
software is still struggling at understanding
this enormous volume of content.
collectively as a society,
machines are still blind.
a two-dimensional array of numbers
the same as to listen,
the same as to see,
we really mean understanding.
540 million years of hard work
the visual processing apparatus of our brains,
from my Ph.D. at Caltech
Stanford's Vision Lab,
collaborators and students
computer vision and machine learning.
of artificial intelligence.
the machines to see just like we do:
inferring 3D geometry of things,
emotions, actions and intentions.
entire stories of people, places and things
is to teach a computer to see objects,
imagine this teaching process
some training images
let's say cats,
a model that learns from these training images.
just a collection of shapes and colors,
in the early days of object modeling.
tell the computer algorithm in a mathematical language
a chubby body,
and viewpoint to the object model.
as a household pet
of variations to the object model,
observation changed my thinking.
real-world experiences and examples.
about every 200 milliseconds,
hundreds of millions of pictures of the real world.
on better and better algorithms,
the kind of training data
training photos of staggering quality and quantity.
than we have ever had before,
together with Professor Kai Li at Princeton University,
the ImageNet project.
mount a camera on our head and wait for many years.
the biggest treasure trove of pictures that humans have ever created.
like the Amazon Mechanical Turk platform
one of the biggest employers on this platform:
almost 50,000 workers, working together,
nearly a billion candidate images.
a fraction of the imagery
in the early developmental years.
this approach to computer algorithms may seem obvious now,
for quite a while.
to do something more useful for my tenure,
for research funding.
joked to my graduate students:
my dry cleaner's shop to fund ImageNet.
that's how I funded my college years.
covering 22,000 classes of objects and things
organized by category.
of domestic and wild cats.
thrilled to have put together ImageNet,
to benefit from it,
we opened up the entire data set
to the worldwide research community for free.
the data to nourish our computer brain,
to the algorithms themselves.
of information provided by ImageNet
was a perfect match to a particular class
of machine learning algorithms
Geoff Hinton, and Yann LeCun back in the 1970s and '80s.
of billions of highly connected neurons,
is also a neuron-like node.
and then passes its own output on to other nodes.
or even millions of nodes
In a typical neural network used to train our object recognition model,
and 15 billion connections.
to train such a humongous model,
flourished in ways that were hard to imagine.
exciting new results in object recognition.
a boy and a teddy bear;
in the background;
like a person, a skateboard, railings, a lamppost, and so on.
is not so confident about what it sees,
to give us a safe answer instead of committing too much,
our algorithm is remarkable at telling us
like the make, model and year of the cars.
of Google Street View images
we applied this algorithm,
really interesting:
show a clear positive correlation.
also correlate well
analysis results by zip code area.
or even surpassed human capabilities?
taught the computer to see objects.
learning to utter a few nouns.
another developmental milestone will be hit,
to communicate in sentences.
this is a cat in the picture,
telling us this is a cat lying on a bed.
to see a picture and generate sentences,
and machine learning algorithm
has to take another step.
from both pictures
and natural language sentences.
integrates vision and language,
that connects parts of visual things
with words and phrases in sentences.
we finally tied all this together,
computer vision models
a human-like sentence
what the computer says when it sees the picture
at the beginning of this talk.
"A man is standing next to an elephant."
of an airport runway.
Of course, we're still working hard to improve our algorithms,
"A cat lying on a bed in a blanket."
too many cats,
it thinks everything might look like a cat.
"A young boy is holding a baseball bat."
it confuses a toothbrush with a baseball bat.
"A man riding a horse down a street next to a building."
(an introductory college art course in the United States)
to the computers.
"A zebra standing in a field of grass."
to appreciate the stunning beauty of nature
from three to 13 and far beyond.
picture of the boy and the cake again.
we have taught the computer to see objects
tell us a simple story
when seeing a picture.
"A person sitting at a table with a cake."
to this picture
than just a person and a cake.
is that this is a special Italian cake
his favorite T-shirt,
given to him as a gift
after a trip to Sydney,
how happy the boy is and what is on his mind at that moment.
and the future world he will live in.
extra pairs of tireless eyes
and take care of patients.
smarter and safer on the road.
to save the trapped and wounded.
better materials,
and explore frontiers never seen before
with the help of the machines.
we're giving sight to the machines.
they help us to see better.
won't be the only ones
pondering and exploring our world.
for their intelligence,
we will also collaborate with them
in ways that we cannot even imagine.
to create a better future
for Leo and for the world.
ABOUT THE SPEAKER
Fei-Fei Li - Computer scientist
Why you should listen
Using algorithms built on machine learning methods such as neural network models, the Stanford Artificial Intelligence Lab led by Fei-Fei Li has created software capable of recognizing scenes in still photographs and accurately describing them in natural language.
Li’s work with neural networks and computer vision (with Stanford’s Vision Lab) marks a significant step forward for AI research, and could lead to applications ranging from more intuitive image searches to robots able to make autonomous decisions in unfamiliar situations.
Fei-Fei was honored as one of Foreign Policy's 2015 Global Thinkers.