ABOUT THE SPEAKER
Deb Roy - Cognitive scientist
Deb Roy studies how children learn language, and designs machines that learn to communicate in human-like ways. On sabbatical from MIT Media Lab, he's working with the AI company Bluefin Labs.

Why you should listen

Deb Roy directs the Cognitive Machines group at the MIT Media Lab, where he studies how children learn language, and designs machines that learn to communicate in human-like ways. To enable this work, he has pioneered new data-driven methods for analyzing and modeling human linguistic and social behavior. He has authored numerous scientific papers on artificial intelligence, cognitive modeling, human-machine interaction, data mining, and information visualization.

Deb Roy co-founded and serves as CEO of Bluefin Labs, a venture-backed technology company. Built upon deep machine learning principles developed in his research over the past 15 years, Bluefin has created a technology platform that analyzes social media commentary to measure real-time audience response to TV ads and shows.

Roy adds some relevant papers:

Deb Roy. (2009). New Horizons in the Study of Child Language Acquisition. Proceedings of Interspeech 2009. Brighton, England. bit.ly/fSP4Qh

Brandon C. Roy, Michael C. Frank and Deb Roy. (2009). Exploring word learning in a high-density longitudinal corpus. Proceedings of the 31st Annual Meeting of the Cognitive Science Society. Amsterdam, Netherlands. bit.ly/e1qxej

Plenty more papers on our research including technology and methodology can be found here, together with other research from my lab at MIT: bit.ly/h3paSQ

The work that I mentioned on relationships between television content and the social graph is being done at Bluefin Labs (www.bluefinlabs.com). Details of this work have not been published. The social structures we are finding (and that I highlighted in my TED talk) are indeed new. The social media communication channels that are leading to their formation did not even exist a few years ago, and Bluefin's technology platform for discovering these kinds of structures is the first of its kind. We'll certainly have more to say about all this as we continue to dig into this fascinating new kind of data, and as new social structures continue to evolve!

TED2011

Deb Roy: The birth of a word

2,809,941 views

MIT researcher Deb Roy wanted to understand how his infant son learned language -- so he wired up his house with video cameras to catch every moment (with exceptions) of his son's life, then parsed 90,000 hours of home video to watch "gaaaa" slowly turn into "water." Astonishing, data-rich research with deep implications for how we learn.

00:15
Imagine if you could record your life --
00:19
everything you said, everything you did,
00:22
available in a perfect memory store at your fingertips,
00:25
so you could go back
00:27
and find memorable moments and relive them,
00:30
or sift through traces of time
00:33
and discover patterns in your own life
00:35
that previously had gone undiscovered.
00:38
Well that's exactly the journey
00:40
that my family began
00:42
five and a half years ago.
00:44
This is my wife and collaborator, Rupal.
00:47
And on this day, at this moment,
00:49
we walked into the house with our first child,
00:51
our beautiful baby boy.
00:53
And we walked into a house
00:56
with a very special home video recording system.
01:07
(Video) Man: Okay.
01:10
Deb Roy: This moment
01:11
and thousands of other moments special for us
01:14
were captured in our home
01:16
because in every room in the house,
01:18
if you looked up, you'd see a camera and a microphone,
01:21
and if you looked down,
01:23
you'd get this bird's-eye view of the room.
01:25
Here's our living room,
01:28
the baby bedroom,
01:31
kitchen, dining room
01:33
and the rest of the house.
01:35
And all of these fed into a disc array
01:38
that was designed for a continuous capture.
01:41
So here we are flying through a day in our home
01:44
as we move from sunlit morning
01:47
through incandescent evening
01:49
and, finally, lights out for the day.
01:53
Over the course of three years,
01:56
we recorded eight to 10 hours a day,
01:58
amassing roughly a quarter-million hours
02:01
of multi-track audio and video.
02:04
So you're looking at a piece of what is by far
02:06
the largest home video collection ever made.
02:08
(Laughter)
02:11
And what this data represents
02:13
for our family at a personal level,
02:17
the impact has already been immense,
02:19
and we're still learning its value.
02:22
Countless moments
02:24
of unsolicited natural moments, not posed moments,
02:27
are captured there,
02:29
and we're starting to learn how to discover them and find them.
02:32
But there's also a scientific reason that drove this project,
02:35
which was to use this natural longitudinal data
02:39
to understand the process
02:41
of how a child learns language --
02:43
that child being my son.
02:45
And so with many privacy provisions put in place
02:49
to protect everyone who was recorded in the data,
02:52
we made elements of the data available
02:55
to my trusted research team at MIT
02:58
so we could start teasing apart patterns
03:01
in this massive data set,
03:04
trying to understand the influence of social environments
03:07
on language acquisition.
03:09
So we're looking here
03:11
at one of the first things we started to do.
03:13
This is my wife and I cooking breakfast in the kitchen,
03:17
and as we move through space and through time,
03:20
a very everyday pattern of life in the kitchen.
03:23
In order to convert
03:25
this opaque, 90,000 hours of video
03:28
into something that we could start to see,
03:30
we use motion analysis to pull out,
03:32
as we move through space and through time,
03:34
what we call space-time worms.
03:37
And this has become part of our toolkit
03:40
for being able to look and see
03:43
where the activities are in the data,
03:45
and with it, trace the pattern of, in particular,
03:48
where my son moved throughout the home,
03:50
so that we could focus our transcription efforts,
03:53
all of the speech environment around my son --
03:56
all of the words that he heard from myself, my wife, our nanny,
03:59
and over time, the words he began to produce.
04:02
So with that technology and that data
04:05
and the ability to, with machine assistance,
04:07
transcribe speech,
04:09
we've now transcribed
04:11
well over seven million words of our home transcripts.
04:14
And with that, let me take you now
04:16
for a first tour into the data.
04:19
So you've all, I'm sure,
04:21
seen time-lapse videos
04:23
where a flower will blossom as you accelerate time.
04:26
I'd like you to now experience
04:28
the blossoming of a speech form.
04:30
My son, soon after his first birthday,
04:32
would say "gaga" to mean water.
04:35
And over the course of the next half-year,
04:38
he slowly learned to approximate
04:40
the proper adult form, "water."
04:43
So we're going to cruise through half a year
04:45
in about 40 seconds.
04:47
No video here,
04:49
so you can focus on the sound, the acoustics,
04:52
of a new kind of trajectory:
04:54
gaga to water.
04:56
(Audio) Baby: Gagagagagaga
05:08
Gaga gaga gaga
05:12
guga guga guga
05:17
wada gaga gaga guga gaga
05:22
wader guga guga
05:26
water water water
05:29
water water water
05:35
water water
05:39
water.
05:41
DR: He sure nailed it, didn't he?
05:43
(Applause)
05:50
So he didn't just learn water.
05:52
Over the course of the 24 months,
05:54
the first two years that we really focused on,
05:57
this is a map of every word he learned in chronological order.
06:01
And because we have full transcripts,
06:04
we've identified each of the 503 words
06:06
that he learned to produce by his second birthday.
06:08
He was an early talker.
06:10
And so we started to analyze why.
06:13
Why were certain words born before others?
06:16
This is one of the first results
06:18
that came out of our study a little over a year ago
06:20
that really surprised us.
06:22
The way to interpret this apparently simple graph
06:25
is, on the vertical is an indication
06:27
of how complex caregiver utterances are
06:30
based on the length of utterances.
06:32
And the [horizontal] axis is time.
06:35
And all of the data,
06:37
we aligned based on the following idea:
06:40
Every time my son would learn a word,
06:43
we would trace back and look at all of the language he heard
06:46
that contained that word.
06:48
And we would plot the relative length of the utterances.
06:52
And what we found was this curious phenomena,
06:55
that caregiver speech would systematically dip to a minimum,
06:58
making language as simple as possible,
07:01
and then slowly ascend back up in complexity.
07:04
And the amazing thing was
07:06
that bounce, that dip,
07:08
lined up almost precisely
07:10
with when each word was born --
07:12
word after word, systematically.
07:14
So it appears that all three primary caregivers --
07:16
myself, my wife and our nanny --
07:19
were systematically and, I would think, subconsciously
07:22
restructuring our language
07:24
to meet him at the birth of a word
07:27
and bring him gently into more complex language.
07:31
And the implications of this -- there are many,
07:33
but one I just want to point out,
07:35
is that there must be amazing feedback loops.
07:38
Of course, my son is learning
07:40
from his linguistic environment,
07:42
but the environment is learning from him.
07:45
That environment, people, are in these tight feedback loops
07:48
and creating a kind of scaffolding
07:50
that has not been noticed until now.
07:54
But that's looking at the speech context.
07:56
What about the visual context?
07:58
We're not looking at --
08:00
think of this as a dollhouse cutaway of our house.
08:02
We've taken those circular fish-eye lens cameras,
08:05
and we've done some optical correction,
08:07
and then we can bring it into three-dimensional life.
08:11
So welcome to my home.
08:13
This is a moment,
08:15
one moment captured across multiple cameras.
08:18
The reason we did this is to create the ultimate memory machine,
08:21
where you can go back and interactively fly around
08:24
and then breathe video-life into this system.
08:27
What I'm going to do
08:29
is give you an accelerated view of 30 minutes,
08:32
again, of just life in the living room.
08:34
That's me and my son on the floor.
08:37
And there's video analytics
08:39
that are tracking our movements.
08:41
My son is leaving red ink. I am leaving green ink.
08:44
We're now on the couch,
08:46
looking out through the window at cars passing by.
08:49
And finally, my son playing in a walking toy by himself.
08:52
Now we freeze the action, 30 minutes,
08:55
we turn time into the vertical axis,
08:57
and we open up for a view
08:59
of these interaction traces we've just left behind.
09:02
And we see these amazing structures --
09:05
these little knots of two colors of thread
09:08
we call "social hot spots."
09:10
The spiral thread
09:12
we call a "solo hot spot."
09:14
And we think that these affect the way language is learned.
09:17
What we'd like to do
09:19
is start understanding
09:21
the interaction between these patterns
09:23
and the language that my son is exposed to
09:25
to see if we can predict
09:27
how the structure of when words are heard
09:29
affects when they're learned --
09:31
so in other words, the relationship
09:33
between words and what they're about in the world.
09:37
So here's how we're approaching this.
09:39
In this video,
09:41
again, my son is being traced out.
09:43
He's leaving red ink behind.
09:45
And there's our nanny by the door.
09:47
(Video) Nanny: You want water? (Baby: Aaaa.)
09:50
Nanny: All right. (Baby: Aaaa.)
09:53
DR: She offers water,
09:55
and off go the two worms
09:57
over to the kitchen to get water.
09:59
And what we've done is use the word "water"
10:01
to tag that moment, that bit of activity.
10:03
And now we take the power of data
10:05
and take every time my son
10:08
ever heard the word water
10:10
and the context he saw it in,
10:12
and we use it to penetrate through the video
10:15
and find every activity trace
10:18
that co-occurred with an instance of water.
10:21
And what this data leaves in its wake
10:23
is a landscape.
10:25
We call these wordscapes.
10:27
This is the wordscape for the word water,
10:29
and you can see most of the action is in the kitchen.
10:31
That's where those big peaks are over to the left.
10:34
And just for contrast, we can do this with any word.
10:37
We can take the word "bye"
10:39
as in "goodbye."
10:41
And we're now zoomed in over the entrance to the house.
10:43
And we look, and we find, as you would expect,
10:46
a contrast in the landscape
10:48
where the word "bye" occurs much more in a structured way.
10:51
So we're using these structures
10:53
to start predicting
10:55
the order of language acquisition,
10:58
and that's ongoing work now.
11:00
In my lab, which we're peering into now, at MIT --
11:03
this is at the Media Lab.
11:05
This has become my favorite way
11:07
of videographing just about any space.
11:09
Three of the key people in this project,
11:11
Philip DeCamp, Rony Kubat and Brandon Roy are pictured here.
11:14
Philip has been a close collaborator
11:16
on all the visualizations you're seeing.
11:18
And Michael Fleischman
11:21
was another Ph.D. student in my lab
11:23
who worked with me on this home video analysis,
11:26
and he made the following observation:
11:29
that "just the way that we're analyzing
11:31
how language connects to events
11:34
which provide common ground for language,
11:36
that same idea we can take out of your home, Deb,
11:40
and we can apply it to the world of public media."
11:43
And so our effort took an unexpected turn.
11:46
Think of mass media
11:48
as providing common ground
11:50
and you have the recipe
11:52
for taking this idea to a whole new place.
11:55
We've started analyzing television content
11:58
using the same principles --
12:00
analyzing event structure of a TV signal --
12:03
episodes of shows,
12:05
commercials,
12:07
all of the components that make up the event structure.
12:10
And we're now, with satellite dishes, pulling and analyzing
12:13
a good part of all the TV being watched in the United States.
12:16
And you don't have to now go and instrument living rooms with microphones
12:19
to get people's conversations,
12:21
you just tune into publicly available social media feeds.
12:24
So we're pulling in
12:26
about three billion comments a month,
12:28
and then the magic happens.
12:30
You have the event structure,
12:32
the common ground that the words are about,
12:34
coming out of the television feeds;
12:37
you've got the conversations
12:39
that are about those topics;
12:41
and through semantic analysis --
12:44
and this is actually real data you're looking at
12:46
from our data processing --
12:48
each yellow line is showing a link being made
12:51
between a comment in the wild
12:54
and a piece of event structure coming out of the television signal.
12:57
And the same idea now
12:59
can be built up.
13:01
And we get this wordscape,
13:03
except now words are not assembled in my living room.
13:06
Instead, the context, the common ground activities,
13:10
are the content on television that's driving the conversations.
13:13
And what we're seeing here, these skyscrapers now,
13:16
are commentary
13:18
that are linked to content on television.
13:20
Same concept,
13:22
but looking at communication dynamics
13:24
in a very different sphere.
13:26
And so fundamentally, rather than, for example,
13:28
measuring content based on how many people are watching,
13:31
this gives us the basic data
13:33
for looking at engagement properties of content.
13:36
And just like we can look at feedback cycles
13:39
and dynamics in a family,
13:42
we can now open up the same concepts
13:45
and look at much larger groups of people.
13:48
This is a subset of data from our database --
13:51
just 50,000 out of several million --
13:54
and the social graph that connects them
13:56
through publicly available sources.
13:59
And if you put them on one plane,
14:01
a second plane is where the content lives.
14:04
So we have the programs
14:07
and the sporting events
14:09
and the commercials,
14:11
and all of the link structures that tie them together
14:13
make a content graph.
14:15
And then the important third dimension.
14:19
Each of the links that you're seeing rendered here
14:21
is an actual connection made
14:23
between something someone said
14:26
and a piece of content.
14:28
And there are, again, now tens of millions of these links
14:31
that give us the connective tissue of social graphs
14:34
and how they relate to content.
14:37
And we can now start to probe the structure
14:39
in interesting ways.
14:41
So if we, for example, trace the path
14:44
of one piece of content
14:46
that drives someone to comment on it,
14:48
and then we follow where that comment goes,
14:51
and then look at the entire social graph that becomes activated
14:54
and then trace back to see the relationship
14:57
between that social graph and content,
14:59
a very interesting structure becomes visible.
15:01
We call this a co-viewing clique,
15:03
a virtual living room if you will.
15:06
And there are fascinating dynamics at play.
15:08
It's not one way.
15:10
A piece of content, an event, causes someone to talk.
15:13
They talk to other people.
15:15
That drives tune-in behavior back into mass media,
15:18
and you have these cycles
15:20
that drive the overall behavior.
15:22
Another example -- very different --
15:24
another actual person in our database --
15:27
and we're finding at least hundreds, if not thousands, of these.
15:30
We've given this person a name.
15:32
This is a pro-amateur, or pro-am media critic
15:35
who has this high fan-out rate.
15:38
So a lot of people are following this person -- very influential --
15:41
and they have a propensity to talk about what's on TV.
15:43
So this person is a key link
15:46
in connecting mass media and social media together.
15:49
One last example from this data:
15:52
Sometimes it's actually a piece of content that is special.
15:55
So if we go and look at this piece of content,
15:59
President Obama's State of the Union address
16:02
from just a few weeks ago,
16:04
and look at what we find in this same data set,
16:07
at the same scale,
16:10
the engagement properties of this piece of content
16:12
are truly remarkable.
16:14
A nation exploding in conversation
16:16
in real time
16:18
in response to what's on the broadcast.
16:21
And of course, through all of these lines
16:23
are flowing unstructured language.
16:25
We can X-ray
16:27
and get a real-time pulse of a nation,
16:29
real-time sense
16:31
of the social reactions in the different circuits in the social graph
16:34
being activated by content.
16:37
So, to summarize, the idea is this:
16:40
As our world becomes increasingly instrumented
16:43
and we have the capabilities
16:45
to collect and connect the dots
16:47
between what people are saying
16:49
and the context they're saying it in,
16:51
what's emerging is an ability
16:53
to see new social structures and dynamics
16:56
that have previously not been seen.
16:58
It's like building a microscope or telescope
17:00
and revealing new structures
17:02
about our own behavior around communication.
17:05
And I think the implications here are profound,
17:08
whether it's for science,
17:10
for commerce, for government,
17:12
or perhaps most of all,
17:14
for us as individuals.
17:17
And so just to return to my son,
17:20
when I was preparing this talk, he was looking over my shoulder,
17:23
and I showed him the clips I was going to show to you today,
17:25
and I asked him for permission -- granted.
17:28
And then I went on to reflect,
17:30
"Isn't it amazing,
17:33
this entire database, all these recordings,
17:36
I'm going to hand off to you and to your sister" --
17:38
who arrived two years later --
17:41
"and you guys are going to be able to go back and re-experience moments
17:44
that you could never, with your biological memory,
17:47
possibly remember the way you can now?"
17:49
And he was quiet for a moment.
17:51
And I thought, "What am I thinking?
17:53
He's five years old. He's not going to understand this."
17:55
And just as I was having that thought, he looked up at me and said,
17:58
"So that when I grow up,
18:00
I can show this to my kids?"
18:02
And I thought, "Wow, this is powerful stuff."
18:05
So I want to leave you
18:07
with one last memorable moment
18:09
from our family.
18:12
This is the first time our son
18:14
took more than two steps at once --
18:16
captured on film.
18:18
And I really want you to focus on something
18:21
as I take you through.
18:23
It's a cluttered environment; it's natural life.
18:25
My mother's in the kitchen, cooking,
18:27
and, of all places, in the hallway,
18:29
I realize he's about to do it, about to take more than two steps.
18:32
And so you hear me encouraging him,
18:34
realizing what's happening,
18:36
and then the magic happens.
18:38
Listen very carefully.
18:40
About three steps in,
18:42
he realizes something magic is happening,
18:44
and the most amazing feedback loop of all kicks in,
18:47
and he takes a breath in,
18:49
and he whispers "wow"
18:51
and instinctively I echo back the same.
18:56
And so let's fly back in time
18:59
to that memorable moment.
19:05
(Video) DR: Hey.
19:07
Come here.
19:09
Can you do it?
19:13
Oh, boy.
19:15
Can you do it?
19:18
Baby: Yeah.
19:20
DR: Ma, he's walking.
19:24
(Laughter)
19:26
(Applause)
19:28
DR: Thank you.
19:30
(Applause)

Data provided by TED.