TED@BCG Paris

Blaise Agüera y Arcas: How computers are learning to be creative

We're on the edge of a new frontier in art and creativity -- and it's not human. Blaise Agüera y Arcas, principal scientist at Google, works with deep neural networks for machine perception and distributed learning. In this captivating demo, he shows how neural nets trained to recognize images can be run in reverse, to generate them. The results: spectacular, hallucinatory collages (and poems!) that defy categorization. "Perception and creativity are very intimately connected," Agüera y Arcas says. "Any creature, any being that is able to do perceptual acts is also able to create."

- Software architect
Blaise Agüera y Arcas works on machine learning at Google. Previously a Distinguished Engineer at Microsoft, he has worked on augmented reality, mapping, wearable computing and natural user interfaces.

So, I lead a team at Google
that works on machine intelligence;
00:12
in other words, the engineering discipline
of making computers and devices
00:15
able to do some of the things
that brains do.
00:20
And this makes us
interested in real brains
00:23
and neuroscience as well,
00:26
and especially interested
in the things that our brains do
00:27
that are still far superior
to the performance of computers.
00:31
Historically, one of those areas
has been perception,
00:37
the process by which things
out there in the world --
00:40
sounds and images --
00:43
can turn into concepts in the mind.
00:45
This is essential for our own brains,
00:48
and it's also pretty useful on a computer.
00:50
The machine perception algorithms,
for example, that our team makes,
00:53
are what enable your pictures
on Google Photos to become searchable,
00:56
based on what's in them.
01:00
The flip side of perception is creativity:
01:03
turning a concept into something
out there in the world.
01:06
So over the past year,
our work on machine perception
01:09
has also unexpectedly connected
with the world of machine creativity
01:13
and machine art.
01:18
I think Michelangelo
had a penetrating insight
01:20
into this dual relationship
between perception and creativity.
01:23
This is a famous quote of his:
01:27
"Every block of stone
has a statue inside of it,
01:29
and the job of the sculptor
is to discover it."
01:33
So I think that what
Michelangelo was getting at
01:37
is that we create by perceiving,
01:41
and that perception itself
is an act of imagination
01:44
and is the stuff of creativity.
01:47
The organ that does all the thinking
and perceiving and imagining,
01:50
of course, is the brain.
01:54
And I'd like to begin
with a brief bit of history
01:56
about what we know about brains.
01:59
Because unlike, say,
the heart or the intestines,
02:02
you really can't say very much
about a brain by just looking at it,
02:04
at least with the naked eye.
02:07
The early anatomists who looked at brains
02:09
gave the superficial structures
of this thing all kinds of fanciful names,
02:12
like hippocampus, meaning "little shrimp."
02:16
But of course that sort of thing
doesn't tell us very much
02:18
about what's actually going on inside.
02:21
The first person who, I think, really
developed some kind of insight
02:24
into what was going on in the brain
02:28
was the great Spanish neuroanatomist,
Santiago Ramón y Cajal,
02:30
in the 19th century,
02:34
who used microscopy and special stains
02:35
that could selectively fill in
or render in very high contrast
02:39
the individual cells in the brain,
02:43
in order to start to understand
their morphologies.
02:45
And these are the kinds of drawings
that he made of neurons
02:49
in the 19th century.
02:52
This is from a bird brain.
02:53
And you see this incredible variety
of different sorts of cells;
02:55
even the cellular theory itself
was quite new at this point.
02:58
And these structures,
03:02
these cells that have these arborizations,
03:03
these branches that can go
very, very long distances --
03:05
this was very novel at the time.
03:08
They're reminiscent, of course, of wires.
03:10
That might have been obvious
to some people in the 19th century;
03:13
the revolutions of wiring and electricity
were just getting underway.
03:17
But in many ways,
03:21
these microanatomical drawings
of Ramón y Cajal's, like this one,
03:22
they're still in some ways unsurpassed.
03:26
We're still, more than a century later,
03:28
trying to finish the job
that Ramón y Cajal started.
03:30
These are raw data from our collaborators
03:33
at the Max Planck Institute
of Neuroscience.
03:36
And what our collaborators have done
03:39
is to image little pieces of brain tissue.
03:41
The entire sample here
is about one cubic millimeter in size,
03:46
and I'm showing you a very,
very small piece of it here.
03:49
That bar on the left is about one micron.
03:52
The structures you see are mitochondria
03:54
that are the size of bacteria.
03:57
And these are consecutive slices
03:59
through this very, very
tiny block of tissue.
04:00
Just for comparison's sake,
04:03
the diameter of an average strand
of hair is about 100 microns.
04:06
So we're looking at something
much, much smaller
04:10
than a single strand of hair.
04:12
And from these kinds of serial
electron microscopy slices,
04:13
one can start to make reconstructions
in 3D of neurons that look like these.
04:17
So these are sort of in the same
style as Ramón y Cajal.
04:22
Only a few neurons are lit up,
04:26
because otherwise we wouldn't
be able to see anything here.
04:27
It would be so crowded,
04:30
so full of structure,
04:31
of wiring all connecting
one neuron to another.
04:33
So Ramón y Cajal was a little bit
ahead of his time,
04:37
and progress on understanding the brain
04:39
proceeded slowly
over the next few decades.
04:42
But we knew that neurons used electricity,
04:45
and by World War II, our technology
was advanced enough
04:48
to start doing real electrical
experiments on live neurons
04:51
to better understand how they worked.
04:53
This was the very same time
when computers were being invented,
04:56
very much based on the idea
of modeling the brain --
05:00
of "intelligent machinery,"
as Alan Turing called it,
05:03
one of the fathers of computer science.
05:07
Warren McCulloch and Walter Pitts
looked at Ramón y Cajal's drawing
05:09
of visual cortex,
05:14
which I'm showing here.
05:15
This is the cortex that processes
imagery that comes from the eye.
05:17
And for them, this looked
like a circuit diagram.
05:22
So there are a lot of details
in McCulloch and Pitts's circuit diagram
05:26
that are not quite right.
05:30
But this basic idea
05:31
that visual cortex works like a series
of computational elements
05:32
that pass information
one to the next in a cascade,
05:36
is essentially correct.
05:39
Let's talk for a moment
05:41
about what a model for processing
visual information would need to do.
05:43
The basic task of perception
05:48
is to take an image like this one and say,
05:50
"That's a bird,"
05:55
which is a very simple thing
for us to do with our brains.
05:56
But you should all understand
that for a computer,
05:59
this was pretty much impossible
just a few years ago.
06:02
The classical computing paradigm
06:05
is not one in which
this task is easy to do.
06:07
So what's going on between the pixels,
06:11
between the image of the bird
and the word "bird,"
06:13
is essentially a set of neurons
connected to each other
06:17
in a neural network,
06:20
as I'm diagramming here.
06:21
This neural network could be biological,
inside our visual cortices,
06:23
or, nowadays, we start
to have the capability
06:26
to model such neural networks
on the computer.
06:28
And I'll show you what
that actually looks like.
06:31
So the pixels you can think
about as a first layer of neurons,
06:34
and that's, in fact,
how it works in the eye --
06:37
that's the neurons in the retina.
06:39
And those feed forward
06:41
into one layer after another layer,
after another layer of neurons,
06:42
all connected by synapses
of different weights.
06:46
The behavior of this network
06:49
is characterized by the strengths
of all of those synapses.
06:50
Those characterize the computational
properties of this network.
06:54
And at the end of the day,
06:57
you have a neuron
or a small group of neurons
06:58
that light up, saying, "bird."
07:01
Now I'm going to represent
those three things --
07:03
the input pixels and the synapses
in the neural network,
07:06
and bird, the output --
07:11
by three variables: x, w and y.
07:13
There are maybe a million or so x's --
07:16
a million pixels in that image.
07:18
There are billions or trillions of w's,
07:20
which represent the weights of all
these synapses in the neural network.
07:22
And there's a very small number of y's,
07:26
of outputs that that network has.
07:28
"Bird" is only four letters, right?
07:30
So let's pretend that this
is just a simple formula,
07:32
x "x" w = y.
07:36
I'm putting the times in scare quotes
07:38
because what's really
going on there, of course,
07:40
is a very complicated series
of mathematical operations.
07:42
That's one equation.
07:46
There are three variables.
07:48
And we all know
that if you have one equation,
07:49
you can solve for one variable
by knowing the other two things.
07:52
So the problem of inference,
07:56
that is, figuring out
that the picture of a bird is a bird,
08:00
is this one:
08:03
it's where y is the unknown
and w and x are known.
08:04
You know the neural network,
you know the pixels.
08:08
As you can see, that's actually
a relatively straightforward problem.
08:10
You multiply two times three
and you're done.
08:13
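
The inference step described above, where x and w are known and y is the unknown, can be sketched in a few lines of Python. This is a toy illustration only: the layer sizes, the weight values, and the helper names `sigmoid`, `layer`, and `forward` are assumptions made for the example, not the network shown in the demo.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One layer: each output neuron takes a weighted sum of all inputs."""
    return [
        sigmoid(sum(w_i * x_i for w_i, x_i in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(x, network):
    """Feed the input through every layer in turn; return the final activations."""
    activations = x
    for weights, biases in network:
        activations = layer(activations, weights, biases)
    return activations

# Hypothetical "synapse" weights (w): 4 inputs -> 3 hidden neurons -> 1 output.
w = [
    ([[0.5, -0.2, 0.1, 0.4],
      [-0.3, 0.8, -0.5, 0.2],
      [0.7, 0.1, -0.6, -0.1]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.0]),
]

x = [0.9, 0.1, 0.4, 0.7]  # made-up "pixels"
y = forward(x, w)         # one number: how strongly the output neuron lights up
print(y)
```

Nothing is being searched for here: with x and w fixed, y comes out of a single pass through the layers, which is why inference is the easy direction.
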
I'll show you an artificial neural network
08:16
that we've built recently,
doing exactly that.
08:18
This is running in real time
on a mobile phone,
08:21
and that's, of course,
amazing in its own right,
08:24
that mobile phones can do so many
billions and trillions of operations
08:27
per second.
08:31
What you're looking at is a phone
08:32
looking at one after another
picture of a bird,
08:34
and actually not only saying,
"Yes, it's a bird,"
08:37
but identifying the species of bird
with a network of this sort.
08:40
So in that picture,
08:44
the x and the w are known,
and the y is the unknown.
08:46
I'm glossing over the very
difficult part, of course,
08:50
which is how on earth
do we figure out the w,
08:52
the brain that can do such a thing?
08:56
How would we ever learn such a model?
08:59
So this process of learning,
of solving for w,
09:01
if we were doing this
with the simple equation
09:04
in which we think about these as numbers,
09:07
we know exactly how to do that: 6 = 2 x w,
09:09
well, we divide by two and we're done.
09:11
The problem is with this operator.
09:15
So, division --
09:18
we've used division because
it's the inverse to multiplication,
09:19
but as I've just said,
09:22
the multiplication is a bit of a lie here.
09:24
This is a very, very complicated,
very non-linear operation;
09:26
it has no inverse.
09:30
So we have to figure out a way
to solve the equation
09:31
without a division operator.
09:35
And the way to do that
is fairly straightforward.
09:37
You just say, let's play
a little algebra trick,
09:39
and move the six over
to the right-hand side of the equation.
09:42
Now, we're still using multiplication.
09:45
And that zero -- let's think
about it as an error.
09:47
In other words, if we've solved
for w the right way,
09:51
then the error will be zero.
09:53
And if we haven't gotten it quite right,
09:55
the error will be greater than zero.
09:57
So now we can just take guesses
to minimize the error,
09:59
and that's the sort of thing
computers are very good at.
10:02
So you've taken an initial guess:
10:05
what if w = 0?
10:06
Well, then the error is 6.
10:07
What if w = 1? The error is 4.
10:09
And then the computer can
sort of play Marco Polo,
10:10
and drive down the error close to zero.
10:13
As it does that, it's getting
successive approximations to w.
10:15
Typically, it never quite gets there,
but after about a dozen steps,
10:18
we're up to w = 2.999,
which is close enough.
10:22
And this is the learning process.
10:28
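
The guessing game above, with the same equation 6 = 2 x w, might look something like this in Python. The sketch uses plain gradient descent on the squared error as one way of taking better and better guesses; the step size (`rate`), the number of steps, and the function names are arbitrary choices for illustration, not details from the talk.

```python
def error(w, x=2.0, y=6.0):
    """How far x * w is from the known answer y."""
    return abs(y - x * w)

def learn(x=2.0, y=6.0, w=0.0, rate=0.05, steps=12):
    """Start from a guess for w and nudge it downhill on (y - x * w) ** 2."""
    for _ in range(steps):
        gradient = -2.0 * x * (y - x * w)  # derivative of the squared error w.r.t. w
        w -= rate * gradient               # step in the direction that lowers the error
    return w

print(error(0.0), error(1.0))  # the two guesses from the talk: w = 0 and w = 1
w = learn()
print(w, error(w))             # w ends up close to 3, with a small leftover error
```

No division appears anywhere in `learn`; it only needs to evaluate the error and know which way is downhill, which is what lets the same idea carry over to operations that have no inverse.
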
So remember that what's been going on here
10:29
is that we've been taking
a lot of known x's and known y's
10:32
and solving for the w in the middle
through an iterative process.
10:37
It's exactly the same way
that we do our own learning.
10:40
We have many, many images as babies
10:44
and we get told, "This is a bird;
this is not a bird."
10:46
And over time, through iteration,
10:49
we solve for w, we solve
for those neural connections.
10:51
So now, we've held
x and w fixed to solve for y;
10:55
that's everyday, fast perception.
10:59
We figure out how we can solve for w,
11:01
that's learning, which is a lot harder,
11:03
because we need to do error minimization,
11:04
using a lot of training examples.
11:06
And about a year ago,
Alex Mordvintsev, on our team,
11:08
decided to experiment
with what happens if we try solving for x,
11:11
given a known w and a known y.
11:15
In other words,
11:17
you know that it's a bird,
11:19
and you already have your neural network
that you've trained on birds,
11:20
but what is the picture of a bird?
11:23
It turns out that by using exactly
the same error-minimization procedure,
11:26
one can do that with the network
trained to recognize birds,
11:31
and the result turns out to be ...
11:35
a picture of birds.
11:42
So this is a picture of birds
generated entirely by a neural network
11:44
that was trained to recognize birds,
11:48
just by solving for x
rather than solving for y,
11:50
and doing that iteratively.
11:53
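
Solving for x instead of w can be sketched with the same error-minimization loop. The example below is a deliberately tiny stand-in, assuming a single "neuron" with three made-up weights; the names `predict` and `solve_for_x`, the target value 0.9, and the step size are all hypothetical, and this is not the code that produced the images in the talk.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    """The fixed, already-trained 'network': one neuron looking at the input."""
    return sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)

def solve_for_x(w, b, target, x, rate=0.5, steps=200):
    """Hold w and the target y fixed; adjust the input x to shrink (y - target) ** 2."""
    x = list(x)  # work on a copy so the starting input is left untouched
    for _ in range(steps):
        y = predict(x, w, b)
        # Chain rule: d(error)/dx_i = 2 * (y - target) * y * (1 - y) * w_i
        scale = 2.0 * (y - target) * y * (1.0 - y)
        x = [x_i - rate * scale * w_i for x_i, w_i in zip(x, w)]
    return x

w = [1.5, -2.0, 0.5]     # hypothetical trained weights; never modified below
b = 0.0
blank = [0.0, 0.0, 0.0]  # a "blank canvas" starting input
x = solve_for_x(w, b, target=0.9, x=blank)
print(x, predict(x, w, b))
```

The only difference from learning is which quantity gets updated: the weights stay frozen and the input is what moves, until the network's output for that input is close to the requested y.
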
Here's another fun example.
11:55
This was a work made
by Mike Tyka in our group,
11:57
which he calls "Animal Parade."
12:00
It reminds me a little bit
of William Kentridge's artworks,
12:03
in which he makes sketches, rubs them out,
12:06
makes sketches, rubs them out,
12:08
and creates a movie this way.
12:10
In this case,
12:11
what Mike is doing is varying y
over the space of different animals,
12:12
in a network designed
to recognize and distinguish
12:16
different animals from each other.
12:18
And you get this strange, Escher-like
morph from one animal to another.
12:20
Here he and Alex together
have tried reducing
12:26
the y's to a space of only two dimensions,
12:30
thereby making a map
out of the space of all things
12:33
recognized by this network.
12:36
Doing this kind of synthesis
12:38
or generation of imagery
over that entire surface,
12:40
varying y over the surface,
you make a kind of map --
12:43
a visual map of all the things
the network knows how to recognize.
12:45
The animals are all here;
"armadillo" is right in that spot.
12:49
You can do this with other kinds
of networks as well.
12:52
This is a network designed
to recognize faces,
12:55
to distinguish one face from another.
12:58
And here, we're putting
in a y that says, "me,"
13:00
my own face parameters.
13:03
And when this thing solves for x,
13:05
it generates this rather crazy,
13:06
kind of cubist, surreal,
psychedelic picture of me
13:09
from multiple points of view at once.
13:13
The reason it looks like
multiple points of view at once
13:15
is because that network is designed
to get rid of the ambiguity
13:18
of a face being in one pose
or another pose,
13:22
being looked at with one kind of lighting,
another kind of lighting.
13:24
So when you do
this sort of reconstruction,
13:28
if you don't use some sort of guide image
13:30
or guide statistics,
13:32
then you'll get a sort of confusion
of different points of view,
13:33
because it's ambiguous.
13:37
This is what happens if Alex uses
his own face as a guide image
13:39
during that optimization process
to reconstruct my own face.
13:43
So you can see it's not perfect.
13:48
There's still quite a lot of work to do
13:50
on how we optimize
that optimization process.
13:52
But you start to get something
more like a coherent face,
13:54
rendered using my own face as a guide.
13:57
You don't have to start
with a blank canvas
14:00
or with white noise.
14:03
When you're solving for x,
14:04
you can begin with an x
that is itself already some other image.
14:05
That's what this little demonstration is.
14:09
This is a network
that is designed to categorize
14:12
all sorts of different objects --
man-made structures, animals ...
14:16
Here we're starting
with just a picture of clouds,
14:19
and as we optimize,
14:22
basically, this network is figuring out
what it sees in the clouds.
14:23
And the more time
you spend looking at this,
14:28
the more things you also
will see in the clouds.
14:31
You could also use the face network
to hallucinate into this,
14:34
and you get some pretty crazy stuff.
14:38
(Laughter)
14:40
Or, Mike has done some other experiments
14:42
in which he takes that cloud image,
14:44
hallucinates, zooms, hallucinates,
zooms, hallucinates, zooms.
14:48
And in this way,
14:52
you can get a sort of fugue state
of the network, I suppose,
14:53
or a sort of free association,
14:57
in which the network
is eating its own tail.
15:01
So every image is now the basis for,
15:03
"What do I think I see next?
15:06
What do I think I see next?
What do I think I see next?"
15:08
I showed this for the first time in public
15:11
to a group at a lecture in Seattle
called "Higher Education" --
15:14
this was right after
marijuana was legalized.
15:19
(Laughter)
15:22
So I'd like to finish up quickly
15:26
by just noting that this technology
is not constrained.
15:28
I've shown you purely visual examples
because they're really fun to look at.
15:32
It's not a purely visual technology.
15:36
Our artist collaborator, Ross Goodwin,
15:39
has done experiments involving
a camera that takes a picture,
15:41
and then a computer in his backpack
writes a poem using neural networks,
15:44
based on the contents of the image.
15:48
And that poetry neural network
has been trained
15:50
on a large corpus of 20th-century poetry.
15:53
And the poetry is, you know,
15:56
I think, kind of not bad, actually.
15:57
(Laughter)
15:59
In closing,
16:01
I think that per Michelangelo,
16:02
I think he was right;
16:04
perception and creativity
are very intimately connected.
16:05
What we've just seen are neural networks
16:09
that are entirely trained to discriminate,
16:12
or to recognize different
things in the world,
16:14
able to be run in reverse, to generate.
16:16
One of the things that suggests to me
16:19
is not only that
Michelangelo really did see
16:21
the sculpture in the blocks of stone,
16:24
but that any creature,
any being, any alien
16:26
that is able to do
perceptual acts of that sort
16:30
is also able to create
16:33
because it's exactly the same
machinery that's used in both cases.
16:35
Also, I think that perception
and creativity are by no means
16:38
uniquely human.
16:43
We start to have computer models
that can do exactly these sorts of things.
16:44
And that ought to be unsurprising;
the brain is computational.
16:48
And finally,
16:51
computing began as an exercise
in designing intelligent machinery.
16:53
It was very much modeled after the idea
16:57
of how could we make machines intelligent.
17:00
And we finally are starting to fulfill now
17:03
some of the promises
of those early pioneers,
17:05
of Turing and von Neumann
17:07
and McCulloch and Pitts.
17:09
And I think that computing
is not just about accounting
17:11
or playing Candy Crush or something.
17:16
From the beginning,
we modeled them after our minds.
17:18
And they give us both the ability
to understand our own minds better
17:20
and to extend them.
17:24
Thank you very much.
17:26
(Applause)
17:27

About the Speaker:

Blaise Agüera y Arcas - Software architect

Why you should listen

Blaise Agüera y Arcas is principal scientist at Google, where he leads a team working on machine intelligence for mobile devices. His group works extensively with deep neural nets for machine perception and distributed learning, and it also investigates so-called "connectomics" research, assessing maps of connections within the brain.

Agüera y Arcas' background is as multidimensional as the visions he helps create. In the 1990s, he authored patents on both video compression and 3D visualization techniques, and in 2001, he made an influential computational discovery that cast doubt on Gutenberg's role as the father of movable type.

He also created Seadragon (acquired by Microsoft in 2006), the visualization technology that gives Photosynth its amazingly smooth digital rendering and zoom capabilities. Photosynth itself is a vastly powerful piece of software capable of taking a wide variety of images, analyzing them for similarities, and grafting them together into an interactive three-dimensional space. This seamless patchwork of images can be viewed via multiple angles and magnifications, allowing us to look around corners or “fly” in for a (much) closer look. Simply put, it could utterly transform the way we experience digital images.

He joined Microsoft when Seadragon was acquired by Live Labs in 2006. Shortly after the acquisition of Seadragon, Agüera y Arcas directed his team in a collaboration with Microsoft Research and the University of Washington, leading to the first public previews of Photosynth several months later. His TED Talk on Seadragon and Photosynth in 2007 is rated one of TED's "most jaw-dropping." He returned to TED in 2010 to demo Bing’s augmented reality maps.

Fun fact: According to the author, Agüera y Arcas is the inspiration for the character Elgin in the 2012 best-selling novel Where'd You Go, Bernadette?
