TED2007

Blaise Agüera y Arcas: How PhotoSynth can connect the world's images

March 3, 2007

Blaise Agüera y Arcas leads a dazzling demo of Photosynth, software that could transform the way we look at digital images. Using still photos culled from the Web, Photosynth builds breathtaking dreamscapes and lets us navigate them.

Blaise Agüera y Arcas - Software architect
Blaise Agüera y Arcas works on machine learning at Google. Previously a Distinguished Engineer at Microsoft, he has worked on augmented reality, mapping, wearable computing and natural user interfaces.

What I'm going to show you first, as quickly as I can,
00:25
is some foundational work, some new technology
00:27
that we brought to Microsoft as part of an acquisition
00:31
almost exactly a year ago. This is Seadragon,
00:34
and it's an environment in which you can either locally or remotely
00:37
interact with vast amounts of visual data.
00:40
We're looking at many, many gigabytes of digital photos here
00:43
and kind of seamlessly and continuously zooming in,
00:46
panning through the thing, rearranging it in any way we want.
00:50
And it doesn't matter how much information we're looking at,
00:52
how big these collections are or how big the images are.
00:56
Most of them are ordinary digital camera photos,
00:59
but this one, for example, is a scan from the Library of Congress,
01:01
and it's in the 300 megapixel range.
01:05
It doesn't make any difference
01:08
because the only thing that ought to limit the performance
01:09
of a system like this one is the number of pixels on your screen
01:12
at any given moment. It's also very flexible architecture.
01:15
This is an entire book, so this is an example of non-image data.
01:18
This is "Bleak House" by Dickens. Every column is a chapter.
01:22
To prove to you that it's really text, and not an image,
01:27
we can do something like so, to really show
01:31
that this is a real representation of the text; it's not a picture.
01:33
Maybe this is a kind of an artificial way to read an e-book.
01:37
I wouldn't recommend it.
01:39
This is a more realistic case. This is an issue of The Guardian.
01:40
Every large image is the beginning of a section.
01:43
And this really gives you the joy and the good experience
01:45
of reading the real paper version of a magazine or a newspaper,
01:48
which is an inherently multi-scale kind of medium.
01:54
We've also done a little something
01:56
with the corner of this particular issue of The Guardian.
01:57
We've made up a fake ad that's very high resolution --
02:00
much higher than you'd be able to get in an ordinary ad --
02:03
and we've embedded extra content.
02:05
If you want to see the features of this car, you can see it here.
02:07
Or other models, or even technical specifications.
02:10
And this really gets at some of these ideas
02:15
about really doing away with those limits on screen real estate.
02:18
We hope that this means no more pop-ups
02:22
and other kind of rubbish like that -- shouldn't be necessary.
02:24
Of course, mapping is one of those really obvious applications
02:27
for a technology like this.
02:29
And this one I really won't spend any time on,
02:31
except to say that we have things to contribute to this field as well.
02:33
But those are all the roads in the U.S.
02:37
superimposed on top of a NASA geospatial image.
02:39
So let's pull up, now, something else.
02:44
This is actually live on the Web now; you can go check it out.
02:46
This is a project called Photosynth,
02:49
which really marries two different technologies.
02:51
One of them is Seadragon
02:52
and the other is some very beautiful computer vision research
02:54
done by Noah Snavely, a graduate student at the University of Washington,
02:57
co-advised by Steve Seitz at U.W.
03:00
and Rick Szeliski at Microsoft Research. A very nice collaboration.
03:02
And so this is live on the Web. It's powered by Seadragon.
03:07
You can see that when we kind of do these sorts of views,
03:09
where we can dive through images
03:12
and have this kind of multi-resolution experience.
03:14
But the spatial arrangement of the images here is actually meaningful.
03:16
The computer vision algorithms have registered these images together
03:20
so that they correspond to the real space in which these shots --
03:23
all taken near Grassi Lakes in the Canadian Rockies --
03:27
all these shots were taken. So you see elements here
03:31
of stabilized slide-show or panoramic imaging,
03:33
and these things have all been related spatially.
03:40
I'm not sure if I have time to show you any other environments.
03:42
There are some that are much more spatial.
03:45
I would like to jump straight to one of Noah's original data-sets --
03:47
and this is from an early prototype of Photosynth
03:50
that we first got working in the summer --
03:52
to show you what I think
03:54
is really the punch line behind this technology,
03:55
the Photosynth technology. And it's not necessarily so apparent
03:59
from looking at the environments that we've put up on the website.
04:01
We had to worry about the lawyers and so on.
04:04
This is a reconstruction of Notre Dame Cathedral
04:07
that was done entirely computationally
04:09
from images scraped from Flickr. You just type Notre Dame into Flickr,
04:11
and you get some pictures of guys in t-shirts, and of the campus
04:14
and so on. And each of these orange cones represents an image
04:17
that was discovered to belong to this model.
04:22
And so these are all Flickr images,
04:26
and they've all been related spatially in this way.
04:28
And we can just navigate in this very simple way.
04:31
(Applause)
04:35
You know, I never thought that I'd end up working at Microsoft.
04:44
It's very gratifying to have this kind of reception here.
04:46
(Laughter)
04:50
I guess you can see
04:53
this is lots of different types of cameras:
04:56
it's everything from cell phone cameras to professional SLRs,
04:58
quite a large number of them, stitched
05:02
together in this environment.
05:03
And if I can, I'll find some of the sort of weird ones.
05:04
So many of them are occluded by faces, and so on.
05:08
Somewhere in here there are actually
05:13
a series of photographs -- here we go.
05:15
This is actually a poster of Notre Dame that registered correctly.
05:17
We can dive in from the poster
05:21
to a physical view of this environment.
05:24
What the point here really is is that we can do things
05:31
with the social environment. This is now taking data from everybody --
05:34
from the entire collective memory
05:39
of, visually, of what the Earth looks like --
05:40
and link all of that together.
05:43
All of those photos become linked together,
05:44
and they make something emergent
05:46
that's greater than the sum of the parts.
05:47
You have a model that emerges of the entire Earth.
05:49
Think of this as the long tail to Stephen Lawler's Virtual Earth work.
05:51
And this is something that grows in complexity
05:56
as people use it, and whose benefits become greater
05:58
to the users as they use it.
06:01
Their own photos are getting tagged with meta-data
06:03
that somebody else entered.
06:05
If somebody bothered to tag all of these saints
06:07
and say who they all are, then my photo of Notre Dame Cathedral
06:10
suddenly gets enriched with all of that data,
06:13
and I can use it as an entry point to dive into that space,
06:15
into that meta-verse, using everybody else's photos,
06:18
and do a kind of a cross-modal
06:21
and cross-user social experience that way.
06:25
And of course, a by-product of all of that
06:28
is immensely rich virtual models
06:30
of every interesting part of the Earth, collected
06:32
not just from overhead flights and from satellite images
06:35
and so on, but from the collective memory.
06:38
Thank you so much.
06:40
(Applause)
06:42
Chris Anderson: Do I understand this right? That what your software is going to allow,
06:53
is that at some point, really within the next few years,
06:58
all the pictures that are shared by anyone across the world
07:01
are going to basically link together?
07:05
BAA: Yes. What this is really doing is discovering.
07:07
It's creating hyperlinks, if you will, between images.
07:09
And it's doing that
07:12
based on the content inside the images.
07:13
And that gets really exciting when you think about the richness
07:14
of the semantic information that a lot of those images have.
07:17
Like when you do a web search for images,
07:19
you type in phrases, and the text on the web page
07:22
is carrying a lot of information about what that picture is of.
07:24
Now, what if that picture links to all of your pictures?
07:27
Then the amount of semantic interconnection
07:29
and the amount of richness that comes out of that
07:31
is really huge. It's a classic network effect.
07:32
CA: Blaise, that is truly incredible. Congratulations.
07:35
BAA: Thanks so much.
07:37

Why you should listen

Blaise Agüera y Arcas is principal scientist at Google, where he leads a team working on machine intelligence for mobile devices. His group works extensively with deep neural nets for machine perception and distributed learning, and it also investigates so-called "connectomics" research, assessing maps of connections within the brain.

Agüera y Arcas' background is as multidimensional as the visions he helps create. In the 1990s, he authored patents on both video compression and 3D visualization techniques, and in 2001, he made an influential computational discovery that cast doubt on Gutenberg's role as the father of movable type.

He also created Seadragon (acquired by Microsoft in 2006), the visualization technology that gives Photosynth its amazingly smooth digital rendering and zoom capabilities. Photosynth itself is a vastly powerful piece of software capable of taking a wide variety of images, analyzing them for similarities, and grafting them together into an interactive three-dimensional space. This seamless patchwork of images can be viewed via multiple angles and magnifications, allowing us to look around corners or “fly” in for a (much) closer look. Simply put, it could utterly transform the way we experience digital images.
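
The talk's claim that performance should depend only on the number of pixels on screen, not on image size, can be illustrated with a tiled image pyramid. The sketch below is a minimal illustration under stated assumptions, not Seadragon's actual implementation: the 256-pixel tile size, the power-of-two pyramid, and the function names `num_levels`, `level_for_scale`, and `tiles_needed` are all hypothetical choices made for this example.

```python
import math

TILE = 256  # assumed tile edge in pixels (hypothetical, not from the talk)


def num_levels(width, height):
    """Number of levels in a power-of-two pyramid, full resolution on top."""
    return (max(width, height) - 1).bit_length() + 1


def level_for_scale(width, height, scale):
    """Coarsest level whose resolution still meets the on-screen scale.

    scale is screen pixels per full-resolution image pixel.
    """
    top = num_levels(width, height) - 1
    if scale >= 1.0:
        return top
    drop = math.floor(math.log2(1.0 / scale))
    return max(0, top - drop)


def tiles_needed(width, height, view_w, view_h, scale):
    """Worst-case number of tiles needed to fill a viewport at this scale."""
    top = num_levels(width, height) - 1
    level = level_for_scale(width, height, scale)
    level_scale = 2 ** (level - top)  # fraction of full resolution
    level_w = math.ceil(width * level_scale)
    level_h = math.ceil(height * level_scale)
    # Viewport size measured in pixels of the chosen level.
    vis_w = min(level_w, math.ceil(view_w * level_scale / scale))
    vis_h = min(level_h, math.ceil(view_h * level_scale / scale))
    # One extra row and column because the viewport may straddle tile edges.
    cols = min(math.ceil(level_w / TILE), math.ceil(vis_w / TILE) + 1)
    rows = min(math.ceil(level_h / TILE), math.ceil(vis_h / TILE) + 1)
    return cols * rows


if __name__ == "__main__":
    # A roughly 300-megapixel scan viewed on a 1920x1080 screen.
    for scale in (0.01, 0.1, 0.5, 1.0):
        print(scale, tiles_needed(20_000, 15_000, 1920, 1080, scale))
```

Because the chosen level is never more than twice the on-screen resolution, the tile count is bounded by the viewport size alone, which is one plausible way to get the behavior shown in the demo.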

He joined Microsoft when Seadragon was acquired by Live Labs in 2006. Shortly after the acquisition of Seadragon, Agüera y Arcas directed his team in a collaboration with Microsoft Research and the University of Washington, leading to the first public previews of Photosynth several months later. His TED Talk on Seadragon and Photosynth in 2007 is rated one of TED's "most jaw-dropping." He returned to TED in 2010 to demo Bing’s augmented reality maps.

Fun fact: According to the author, Agüera y Arcas is the inspiration for the character Elgin in the 2012 best-selling novel Where'd You Go, Bernadette?

Data provided by TED.
