Blaise Agüera y Arcas: How computers are learning to be creative
Blaise Agüera y Arcas: How did computers learn creativity?
Blaise Agüera y Arcas works on machine learning at Google. Previously a Distinguished Engineer at Microsoft, he has worked on augmented reality, mapping, wearable computing and natural user interfaces.
that works on machine intelligence;
a team that works on machine intelligence;
of making computers and devices
that brains do.
interested in real brains
in the things that our brains do
to the performance of computers.
has been perception,
one of those areas is machine perception,
out there in the world --
for example, that our team makes,
such as the ones our team makes,
on Google Photos to become searchable,
turning them into searchable data.
out there into the world.
our work on machine perception
our team's work on machine perception,
with the world of machine creativity
had a penetrating insight
and "creativity": the relationship between the two
between perception and creativity.
has a statue inside of it,
is to discover it."
Michelangelo was getting at
the realization at the time was:
is an act of imagination
and perceiving and imagining,
able to think, perceive and imagine,
with a brief bit of history
the heart or the intestines,
about a brain by just looking at it,
of this thing all kinds of fanciful names,
gave it many strange and fanciful names,
doesn't tell us very much
of much help to our understanding of the brain.
developed some kind of insight
Santiago Ramón y Cajal,
Santiago Ramón y Cajal,
or render in very high contrast
their morphologies.
their morphological structure.
that he made of neurons
of different sorts of cells,
pictures of all sorts of different cells,
was quite new at this point.
was quite a novel concept at the time.
very, very long distances --
such a discovery was quite remarkable.
to some people in the 19th century;
this analogy may be more apt,
were just getting underway.
were in full swing.
of Ramón y Cajal's, like this one,
that Ramón y Cajal started.
the research Ramón y Cajal did back then.
of Neuroscience.
collaborators at the neuroscience institute.
images in small slice after small slice.
is about one cubic millimeter in size,
only about 1 cubic millimeter,
very small piece of it here.
the scale bar on the left is just one micron.
tiny block of tissue.
cut out slice by slice.
of hair is about 100 microns.
is about 100 microns.
much, much smaller
electron microscopy slices,
in 3D of neurons that look like these.
three-dimensional images of neurons.
style as Ramón y Cajal.
not far from the research of that era.
be able to see anything here.
one neuron to another.
ahead of his time,
was also at the cutting edge of his time,
over the next few decades.
neurons transmit signals electrically,
was advanced enough
our technology had advanced to the point
experiments on live neurons
when computers were being invented,
of modeling the brain --
as Alan Turing called it,
he called it "intelligent machinery,"
looked at Ramón y Cajal's drawing
(artificial neuroscientists)
imagery that comes from the eye.
converts the signals coming from the eye into images.
like a circuit diagram.
it looks like a circuit diagram.
in McCulloch and Pitts's circuit diagram
of computational elements
one to the next in a cascade,
passing information along the cascaded circuit diagram,
visual information would need to do.
what things need to be done.
for us to do with our brains.
that for a computer,
just a few years ago.
this task is easy to do.
and the word "bird,"
connected to each other
inside our visual cortices,
just like the way our visual cortex works.
to have the capability
on the computer.
that actually looks like.
roughly what that looks like in practice.
about as a first layer of neurons,
the first layer of neurons,
how it works in the eye --
the way the pixels are presented,
passed along by the neurons in the retina.
after another layer of neurons,
of different weights.
of all of those synapses.
properties of this network.
or a small group of neurons
neurons fire a signal,
those three things --
in the neural network,
the "synapses" in the neural network,
how they work.
x, w and y.
these synapses in the neural network.
the weight of each synapse.
is just a simple formula,
a simple formula,
going on there, of course,
of mathematical operations.
that if you have one equation,
solve this equation,
by knowing the other two things.
work out the unknown from the others.
that the picture of a bird is a bird,
and w and x are known.
you know the pixels.
a relatively straightforward problem.
and you're done.
doing exactly that.
on a mobile phone,
amazing in its own right,
billions and trillions of operations
picture of a bird,
"Yes, it's a bird,"
"Yes, this is a bird."
with a network of this sort.
identify which kind of bird this is.
and the y is the unknown.
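The case just described, where x and w are known and y is the unknown, can be sketched as a single forward pass. This is only a toy illustration: the pixel values, the weights, the function name `neuron_layer` and the choice of `tanh` as the non-linearity are all assumptions for the example, not details from the talk.

```python
import math

def neuron_layer(x, w):
    """Return y for one layer: a weighted sum of the inputs per neuron,
    squashed by tanh (an assumed non-linearity, chosen for illustration)."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]

# Hypothetical toy values: three "pixels" feeding two output neurons.
x = [0.5, -1.0, 0.25]
w = [[0.2, -0.4, 0.1],
     [0.0, 0.3, -0.5]]

y = neuron_layer(x, w)  # x and w are known, y is what we solve for
print(y)
```

A real recognition network stacks many such layers and has vastly more weights, but the direction of the computation is the same: pixels in, label scores out.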
difficult part, of course,
the hardest part, "w,"
do we figure out the w,
this kind of recognition?
of solving for w,
is a process of solving for w,
with the simple equation
and we get the answer.
it's the inverse to multiplication,
is because it is the inverse of multiplication,
very non-linear operation;
they are "non-linear operations";
to solve the equation
find a way to solve the equation,
is fairly straightforward.
a little algebra trick,
a little algebra trick,
to the right-hand side of the equation.
about it as an error.
we think of it as an error.
for w the right way,
to minimize the error,
we can only guess in order to shrink the error,
computers are very good at.
sort of play Marco Polo,
a game of Marco Polo,
successive approximations to w.
and then w is solved.
but after about a dozen steps,
but after a number of steps,
which is close enough.
that is quite close.
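The idea of treating the mismatch as an error and shrinking it through successive approximations to w can be sketched roughly as follows. The sketch uses plain gradient descent on a one-weight equation, which is one common way to do the guessing and is named here as an assumption rather than as the talk's exact procedure. The values `x=2.0`, `y=6.0`, the step size and the function name `learn_weight` are hypothetical.

```python
def learn_weight(x, y, w=0.0, step=0.1, iterations=12):
    """Approximate w in y = w * x by repeatedly nudging w to shrink the error."""
    for _ in range(iterations):
        error = y - w * x        # how far the current guess is from the target
        w += step * error * x    # move w in the direction that reduces the error
    return w

# Hypothetical example: one known x and one known y.
w = learn_weight(x=2.0, y=6.0)
print(w)
```

Each pass brings w closer to the value that makes the error zero without ever dividing, which is the point: the same loop still applies when the operation is non-linear and has no simple inverse.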
a lot of known x's and known y's
through an iterative process.
that we do our own learning.
this is not a bird."
for those neural connections.
formed the connections between neurons.
x and w fixed to solve for y;
are fixed, so we can solve for y;
the fast, intuitive judgments we make all the time.
Alex Mordvintsev, on our team,
Alex Mordvintsev
with what happens if we try solving for x,
what the solved-for x turns out to be.
that you've trained on birds,
a neural network that recognizes pictures of birds,
the same error-minimization procedure,
the "error-minimization" procedure
trained to recognize birds,
a neural network used to recognize birds,
generated entirely by a neural network
a bird picture it created by itself,
rather than solving for y,
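Solving for x instead of y, with the trained w held fixed, can be sketched with the same error-minimization loop pointed at the input. This is a deliberately tiny, linear, single-neuron stand-in: the weights, the target value, the step size and the function name `solve_for_input` are assumptions for illustration, and a real image network is deep and non-linear.

```python
def solve_for_input(w, y_target, x=None, step=0.05, iterations=200):
    """Adjust the input x so that sum(w[i] * x[i]) approaches y_target,
    keeping the weights w fixed (hypothetical one-neuron model)."""
    x = list(x) if x is not None else [0.0] * len(w)  # start from a blank input
    for _ in range(iterations):
        y = sum(wi * xi for wi, xi in zip(w, x))
        error = y_target - y
        # Nudge every input value in the direction that reduces the error.
        x = [xi + step * error * wi for xi, wi in zip(x, w)]
    return x

# Hypothetical "trained" weights and the output we want the input to produce.
w = [0.5, -1.0, 2.0]
x = solve_for_input(w, y_target=1.0)
y = sum(wi * xi for wi, xi in zip(w, x))
print(x, y)
```

Passing a non-blank starting `x` corresponds to beginning from an existing image rather than an empty canvas.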
by Mike Tyka in our group,
another member of our group, Mike Tyka,
of William Kentridge's artworks,
William Kentridge's artworks,
created a film.
over the space of different animals,
to recognize and distinguish
morph from one animal to another.
morphing images of different animals.
have tried reducing
placing these images onto a 2D plane,
out of the space of all things
over that entire surface,
you make a kind of map --
you can make a map --
the network knows how to recognize.
"armadillo" is right in that spot.
the armadillo is right at this spot on the map.
of networks as well.
make works similar to this,
to recognize faces,
in a y that says, "me,"
the picture that was produced,
psychedelic picture of me
a surreal, psychedelic me,
multiple points of view at once
different points of view" kind of feeling,
to get rid of the ambiguity
or another pose,
remove the ambiguity,
another kind of lighting.
this sort of reconstruction,
of different points of view,
an image that blends different angles,
his own face as a guide image
using his own face as the guide image
to reconstruct my own face.
it produces an image like this.
This piece is not perfect yet,
that optimization process.
more like a coherent face,
a coherent face.
with a blank canvas
that is itself already some other image.
because it already contains some image of its own.
illustrates how it works.
that is designed to categorize
distinguish all sorts of different objects,
man-made structures, animals ...
with just a picture of clouds,
started from a picture of clouds,
what it sees in the clouds.
figuring out what it sees in the clouds.
you spend looking at this,
will see in the clouds.
to hallucinate into this,
make it hallucinate,
some other experiments,
zooms, hallucinates, zooms.
hallucinates, then zooms again.
of the network, I suppose,
a network in something like a fugue state,
is eating its own tail.
What do I think I see next?"
What will I see next?"
showed this video in public,
called "Higher Education" --
shown while giving a talk at an organization,
marijuana was legalized.
is not constrained.
because they're really fun to look at.
because watching it change is really fun.
have already done some experiments,
a camera that takes a picture,
writes a poem using neural networks,
based on the contents of the picture,
has been trained
that was trained,
are very intimately connected.
things in the world,
Michelangelo really did see
any being, any alien
creature, living being, alien species
perceptual acts of that sort
machinery that's used in both cases.
share the same machinery.
and creativity are by no means
that can do exactly these sorts of things.
can do very similar things.
the brain is computational.
because the brain computes.
in designing intelligent machinery.
an activity of the computing world.
of those early pioneers,
is not just about accounting
we modeled them after our minds.
is to have computers emulate the human brain.
to understand our own minds better
ABOUT THE SPEAKER
Blaise Agüera y Arcas - Software architect
Why you should listen
Blaise Agüera y Arcas is principal scientist at Google, where he leads a team working on machine intelligence for mobile devices. His group works extensively with deep neural nets for machine perception and distributed learning, and it also investigates so-called "connectomics" research, assessing maps of connections within the brain.
Agüera y Arcas' background is as multidimensional as the visions he helps create. In the 1990s, he authored patents on both video compression and 3D visualization techniques, and in 2001, he made an influential computational discovery that cast doubt on Gutenberg's role as the father of movable type.
He also created Seadragon (acquired by Microsoft in 2006), the visualization technology that gives Photosynth its amazingly smooth digital rendering and zoom capabilities. Photosynth itself is a vastly powerful piece of software capable of taking a wide variety of images, analyzing them for similarities, and grafting them together into an interactive three-dimensional space. This seamless patchwork of images can be viewed via multiple angles and magnifications, allowing us to look around corners or “fly” in for a (much) closer look. Simply put, it could utterly transform the way we experience digital images.
He joined Microsoft when Seadragon was acquired by Live Labs in 2006. Shortly after the acquisition of Seadragon, Agüera y Arcas directed his team in a collaboration with Microsoft Research and the University of Washington, leading to the first public previews of Photosynth several months later. His TED Talk on Seadragon and Photosynth in 2007 is rated one of TED's "most jaw-dropping." He returned to TED in 2010 to demo Bing’s augmented reality maps.
Fun fact: According to the author, Agüera y Arcas is the inspiration for the character Elgin in the 2012 best-selling novel Where'd You Go, Bernadette?