ABOUT THE SPEAKER
Deb Roy - Cognitive scientist
Deb Roy studies how children learn language, and designs machines that learn to communicate in human-like ways. On sabbatical from MIT Media Lab, he's working with the AI company Bluefin Labs.

Why you should listen

Deb Roy directs the Cognitive Machines group at the MIT Media Lab, where he studies how children learn language, and designs machines that learn to communicate in human-like ways. To enable this work, he has pioneered new data-driven methods for analyzing and modeling human linguistic and social behavior. He has authored numerous scientific papers on artificial intelligence, cognitive modeling, human-machine interaction, data mining, and information visualization.

Deb Roy is the co-founder and CEO of Bluefin Labs, a venture-backed technology company. Built upon machine learning principles developed in his research over the past 15 years, Bluefin has created a technology platform that analyzes social media commentary to measure real-time audience response to TV ads and shows.

Follow Deb Roy on Twitter.

Roy adds some relevant papers:

Deb Roy. (2009). New Horizons in the Study of Child Language Acquisition. Proceedings of Interspeech 2009. Brighton, England. bit.ly/fSP4Qh

Brandon C. Roy, Michael C. Frank and Deb Roy. (2009). Exploring word learning in a high-density longitudinal corpus. Proceedings of the 31st Annual Meeting of the Cognitive Science Society. Amsterdam, Netherlands. bit.ly/e1qxej

Plenty more papers on our research, including technology and methodology, can be found here, together with other research from my lab at MIT: bit.ly/h3paSQ

The work that I mentioned on relationships between television content and the social graph is being done at Bluefin Labs (www.bluefinlabs.com). Details of this work have not been published. The social structures we are finding (and that I highlighted in my TED talk) are indeed new. The social media communication channels that are leading to their formation did not even exist a few years ago, and Bluefin's technology platform for discovering these kinds of structures is the first of its kind. We'll certainly have more to say about all this as we continue to dig into this fascinating new kind of data, and as new social structures continue to evolve!

TED2011

Deb Roy: The birth of a word


2,809,941 views

MIT researcher Deb Roy wanted to understand how his son learned to speak. So he wired his house with cameras to capture every moment (with exceptions) of his son's life, then analyzed 90,000 hours of home video to watch how "gaaa" slowly turned into "water." An astonishing piece of research, rich in data, with deep implications for how we learn.


00:15
Imagine if you could record your life --
00:19
everything you said, everything you did,
00:22
available in a perfect memory store at your fingertips,
00:25
so you could go back
00:27
and find memorable moments and relive them,
00:30
or sift through traces of time
00:33
and discover patterns in your own life
00:35
that previously had gone undiscovered.
00:38
Well that's exactly the journey
00:40
that my family began
00:42
five and a half years ago.
00:44
This is my wife and collaborator, Rupal.
00:47
And on this day, at this moment,
00:49
we walked into the house with our first child,
00:51
our beautiful baby boy.
00:53
And we walked into a house
00:56
with a very special home video recording system.
01:07
(Video) Man: Okay.
01:10
Deb Roy: This moment
01:11
and thousands of other moments special for us
01:14
were captured in our home
01:16
because in every room in the house,
01:18
if you looked up, you'd see a camera and a microphone,
01:21
and if you looked down,
01:23
you'd get this bird's-eye view of the room.
01:25
Here's our living room,
01:28
the baby bedroom,
01:31
kitchen, dining room
01:33
and the rest of the house.
01:35
And all of these fed into a disc array
01:38
that was designed for a continuous capture.
01:41
So here we are flying through a day in our home
01:44
as we move from sunlit morning
01:47
through incandescent evening
01:49
and, finally, lights out for the day.
01:53
Over the course of three years,
01:56
we recorded eight to 10 hours a day,
01:58
amassing roughly a quarter-million hours
02:01
of multi-track audio and video.
02:04
So you're looking at a piece of what is by far
02:06
the largest home video collection ever made.
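The "quarter-million hours" figure only works because the recording was multi-track. As a back-of-the-envelope check — the per-channel counts below are an assumption, since the talk only says "multi-track" (published descriptions of the project mention 11 cameras and 14 microphones):

```python
# Rough check of the "quarter-million hours" figure.
# Channel counts are an assumption, not stated in the talk itself.
hours_per_day = 9          # midpoint of "eight to 10 hours a day"
days = 3 * 365             # "over the course of three years"
channels = 11 + 14         # assumed cameras + microphones

total_channel_hours = hours_per_day * days * channels
print(total_channel_hours)  # ~246,000 channel-hours: roughly a quarter million
```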
02:08
(Laughter)
02:11
And what this data represents
02:13
for our family at a personal level,
02:17
the impact has already been immense,
02:19
and we're still learning its value.
02:22
Countless moments
02:24
of unsolicited natural moments, not posed moments,
02:27
are captured there,
02:29
and we're starting to learn how to discover them and find them.
02:32
But there's also a scientific reason that drove this project,
02:35
which was to use this natural longitudinal data
02:39
to understand the process
02:41
of how a child learns language --
02:43
that child being my son.
02:45
And so with many privacy provisions put in place
02:49
to protect everyone who was recorded in the data,
02:52
we made elements of the data available
02:55
to my trusted research team at MIT
02:58
so we could start teasing apart patterns
03:01
in this massive data set,
03:04
trying to understand the influence of social environments
03:07
on language acquisition.
03:09
So we're looking here
03:11
at one of the first things we started to do.
03:13
This is my wife and I cooking breakfast in the kitchen,
03:17
and as we move through space and through time,
03:20
a very everyday pattern of life in the kitchen.
03:23
In order to convert
03:25
this opaque, 90,000 hours of video
03:28
into something that we could start to see,
03:30
we use motion analysis to pull out,
03:32
as we move through space and through time,
03:34
what we call space-time worms.
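The core of a "space-time worm" is simple: track where motion happens in each frame and string the locations together over time. A minimal sketch of that idea, assuming nothing about the actual MIT pipeline (which is far more sophisticated) — here motion is found by differencing consecutive grayscale frames and taking the centroid of the changed pixels:

```python
import numpy as np

def space_time_worm(frames, threshold=30):
    """Trace motion through a video as an (x, y, t) curve.

    A toy sketch of the "space-time worm" idea: difference consecutive
    grayscale frames, threshold the change, and record the centroid of
    the moving pixels at each time step.
    """
    worm = []
    for t in range(1, len(frames)):
        diff = np.abs(frames[t].astype(int) - frames[t - 1].astype(int))
        ys, xs = np.nonzero(diff > threshold)
        if len(xs):                      # motion detected in this frame
            worm.append((xs.mean(), ys.mean(), t))
    return worm

# Toy video: a bright 2x2 blob moving left to right across a dark room.
frames = [np.zeros((32, 32), dtype=np.uint8) for _ in range(5)]
for t, f in enumerate(frames):
    f[15:17, 5 + 5 * t : 7 + 5 * t] = 255

worm = space_time_worm(frames)
print(worm)  # x-coordinates increase over time as the blob crosses the room
```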
03:37
And this has become part of our toolkit
03:40
for being able to look and see
03:43
where the activities are in the data,
03:45
and with it, trace the pattern of, in particular,
03:48
where my son moved throughout the home,
03:50
so that we could focus our transcription efforts,
03:53
all of the speech environment around my son --
03:56
all of the words that he heard from myself, my wife, our nanny,
03:59
and over time, the words he began to produce.
04:02
So with that technology and that data
04:05
and the ability to, with machine assistance,
04:07
transcribe speech,
04:09
we've now transcribed
04:11
well over seven million words of our home transcripts.
04:14
And with that, let me take you now
04:16
for a first tour into the data.
04:19
So you've all, I'm sure,
04:21
seen time-lapse videos
04:23
where a flower will blossom as you accelerate time.
04:26
I'd like you to now experience
04:28
the blossoming of a speech form.
04:30
My son, soon after his first birthday,
04:32
would say "gaga" to mean water.
04:35
And over the course of the next half-year,
04:38
he slowly learned to approximate
04:40
the proper adult form, "water."
04:43
So we're going to cruise through half a year
04:45
in about 40 seconds.
04:47
No video here,
04:49
so you can focus on the sound, the acoustics,
04:52
of a new kind of trajectory:
04:54
gaga to water.
04:56
(Audio) Baby: Gagagagagaga
05:08
Gaga gaga gaga
05:12
guga guga guga
05:17
wada gaga gaga guga gaga
05:22
wader guga guga
05:26
water water water
05:29
water water water
05:35
water water
05:39
water.
05:41
DR: He sure nailed it, didn't he?
05:43
(Applause)
05:50
So he didn't just learn water.
05:52
Over the course of the 24 months,
05:54
the first two years that we really focused on,
05:57
this is a map of every word he learned in chronological order.
06:01
And because we have full transcripts,
06:04
we've identified each of the 503 words
06:06
that he learned to produce by his second birthday.
06:08
He was an early talker.
06:10
And so we started to analyze why.
06:13
Why were certain words born before others?
06:16
This is one of the first results
06:18
that came out of our study a little over a year ago
06:20
that really surprised us.
06:22
The way to interpret this apparently simple graph
06:25
is, on the vertical is an indication
06:27
of how complex caregiver utterances are
06:30
based on the length of utterances.
06:32
And the [horizontal] axis is time.
06:35
And all of the data,
06:37
we aligned based on the following idea:
06:40
Every time my son would learn a word,
06:43
we would trace back and look at all of the language he heard
06:46
that contained that word.
06:48
And we would plot the relative length of the utterances.
06:52
And what we found was this curious phenomena,
06:55
that caregiver speech would systematically dip to a minimum,
06:58
making language as simple as possible,
07:01
and then slowly ascend back up in complexity.
07:04
And the amazing thing was
07:06
that bounce, that dip,
07:08
lined up almost precisely
07:10
with when each word was born --
07:12
word after word, systematically.
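The alignment trick described above — re-indexing every utterance-length series so that time zero is the moment the word was "born", then averaging — can be sketched in a few lines. This is a toy illustration with a hypothetical data format, not the actual analysis code:

```python
from collections import defaultdict

def align_at_word_birth(utterances, birth_times, window=3):
    """Average caregiver utterance length relative to each word's birth.

    Toy sketch: `utterances` is a list of (month, words) transcript
    entries; `birth_times` maps each word to the month the child first
    produced it. Utterance lengths are re-indexed so 0 = the birth
    month, then averaged across words.
    """
    buckets = defaultdict(list)          # relative month -> lengths seen
    for word, born in birth_times.items():
        for month, words in utterances:
            if word in words and abs(month - born) <= window:
                buckets[month - born].append(len(words))
    return {k: sum(v) / len(v) for k, v in sorted(buckets.items())}

# Toy data: utterances containing "water" get shorter toward its birth
# at month 13, then longer again afterward.
lengths = {10: 6, 11: 5, 12: 4, 13: 2, 14: 4, 15: 5, 16: 6}
utterances = [(m, ["water"] + ["x"] * (n - 1)) for m, n in lengths.items()]
curve = align_at_word_birth(utterances, {"water": 13})
print(curve)  # dips to its minimum at relative month 0, the word's birth
```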
07:14
So it appears that all three primary caregivers --
07:16
myself, my wife and our nanny --
07:19
were systematically and, I would think, subconsciously
07:22
restructuring our language
07:24
to meet him at the birth of a word
07:27
and bring him gently into more complex language.
07:31
And the implications of this -- there are many,
07:33
but one I just want to point out,
07:35
is that there must be amazing feedback loops.
07:38
Of course, my son is learning
07:40
from his linguistic environment,
07:42
but the environment is learning from him.
07:45
That environment, people, are in these tight feedback loops
07:48
and creating a kind of scaffolding
07:50
that has not been noticed until now.
07:54
But that's looking at the speech context.
07:56
What about the visual context?
07:58
We're not looking at --
08:00
think of this as a dollhouse cutaway of our house.
08:02
We've taken those circular fish-eye lens cameras,
08:05
and we've done some optical correction,
08:07
and then we can bring it into three-dimensional life.
08:11
So welcome to my home.
08:13
This is a moment,
08:15
one moment captured across multiple cameras.
08:18
The reason we did this is to create the ultimate memory machine,
08:21
where you can go back and interactively fly around
08:24
and then breathe video-life into this system.
08:27
What I'm going to do
08:29
is give you an accelerated view of 30 minutes,
08:32
again, of just life in the living room.
08:34
That's me and my son on the floor.
08:37
And there's video analytics
08:39
that are tracking our movements.
08:41
My son is leaving red ink. I am leaving green ink.
08:44
We're now on the couch,
08:46
looking out through the window at cars passing by.
08:49
And finally, my son playing in a walking toy by himself.
08:52
Now we freeze the action, 30 minutes,
08:55
we turn time into the vertical axis,
08:57
and we open up for a view
08:59
of these interaction traces we've just left behind.
09:02
And we see these amazing structures --
09:05
these little knots of two colors of thread
09:08
we call "social hot spots."
09:10
The spiral thread
09:12
we call a "solo hot spot."
09:14
And we think that these affect the way language is learned.
09:17
What we'd like to do
09:19
is start understanding
09:21
the interaction between these patterns
09:23
and the language that my son is exposed to
09:25
to see if we can predict
09:27
how the structure of when words are heard
09:29
affects when they're learned --
09:31
so in other words, the relationship
09:33
between words and what they're about in the world.
09:37
So here's how we're approaching this.
09:39
In this video,
09:41
again, my son is being traced out.
09:43
He's leaving red ink behind.
09:45
And there's our nanny by the door.
09:47
(Video) Nanny: You want water? (Baby: Aaaa.)
09:50
Nanny: All right. (Baby: Aaaa.)
09:53
DR: She offers water,
09:55
and off go the two worms
09:57
over to the kitchen to get water.
09:59
And what we've done is use the word "water"
10:01
to tag that moment, that bit of activity.
10:03
And now we take the power of data
10:05
and take every time my son
10:08
ever heard the word water
10:10
and the context he saw it in,
10:12
and we use it to penetrate through the video
10:15
and find every activity trace
10:18
that co-occurred with an instance of water.
10:21
And what this data leaves in its wake
10:23
is a landscape.
10:25
We call these wordscapes.
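Stripped of the visualization, a wordscape is essentially a spatial histogram: count how often a word co-occurs with activity at each location in the house. A minimal sketch under that reading, with a hypothetical data format (the real system works from video traces, not pre-tagged records):

```python
import numpy as np

def wordscape(events, word, grid=(8, 8)):
    """Count how often `word` co-occurs with activity at each grid cell.

    Toy sketch: `events` is a list of (word, x, y) records, each a
    spoken word tagged with the floor position of the activity it
    co-occurred with. Peaks in the returned histogram are the places
    most associated with that word.
    """
    scape = np.zeros(grid)
    for w, x, y in events:
        if w == word:
            scape[y, x] += 1
    return scape

# Toy data: "water" clusters at the kitchen cell, "bye" at the front door.
events = [("water", 1, 1)] * 5 + [("water", 2, 1)] * 3 + [("bye", 7, 6)] * 4
scape = wordscape(events, "water")
print(np.unravel_index(scape.argmax(), scape.shape))  # peak at (1, 1)
```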
10:27
This is the wordscape for the word water,
10:29
and you can see most of the action is in the kitchen.
10:31
That's where those big peaks are over to the left.
10:34
And just for contrast, we can do this with any word.
10:37
We can take the word "bye"
10:39
as in "good bye."
10:41
And we're now zoomed in over the entrance to the house.
10:43
And we look, and we find, as you would expect,
10:46
a contrast in the landscape
10:48
where the word "bye" occurs much more in a structured way.
10:51
So we're using these structures
10:53
to start predicting
10:55
the order of language acquisition,
10:58
and that's ongoing work now.
11:00
In my lab, which we're peering into now, at MIT --
11:03
this is at the media lab.
11:05
This has become my favorite way
11:07
of videographing just about any space.
11:09
Three of the key people in this project,
11:11
Philip DeCamp, Rony Kubat and Brandon Roy are pictured here.
11:14
Philip has been a close collaborator
11:16
on all the visualizations you're seeing.
11:18
And Michael Fleischman
11:21
was another Ph.D. student in my lab
11:23
who worked with me on this home video analysis,
11:26
and he made the following observation:
11:29
that "just the way that we're analyzing
11:31
how language connects to events
11:34
which provide common ground for language,
11:36
that same idea we can take out of your home, Deb,
11:40
and we can apply it to the world of public media."
11:43
And so our effort took an unexpected turn.
11:46
Think of mass media
11:48
as providing common ground
11:50
and you have the recipe
11:52
for taking this idea to a whole new place.
11:55
We've started analyzing television content
11:58
using the same principles --
12:00
analyzing event structure of a TV signal --
12:03
episodes of shows,
12:05
commercials,
12:07
all of the components that make up the event structure.
12:10
And we're now, with satellite dishes, pulling and analyzing
12:13
a good part of all the TV being watched in the United States.
12:16
And you don't have to now go and instrument living rooms with microphones
12:19
to get people's conversations,
12:21
you just tune into publicly available social media feeds.
12:24
So we're pulling in
12:26
about three billion comments a month,
12:28
and then the magic happens.
12:30
You have the event structure,
12:32
the common ground that the words are about,
12:34
coming out of the television feeds;
12:37
you've got the conversations
12:39
that are about those topics;
12:41
and through semantic analysis --
12:44
and this is actually real data you're looking at
12:46
from our data processing --
12:48
each yellow line is showing a link being made
12:51
between a comment in the wild
12:54
and a piece of event structure coming out of the television signal.
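The linking step — each yellow line tying a comment to a piece of TV event structure — can be illustrated with a deliberately crude stand-in. Bluefin's semantic-analysis pipeline is unpublished and certainly far richer; the sketch below just links a comment to an event when it falls inside a time window after airtime and shares a keyword:

```python
import re

def link_comments_to_events(comments, events, window_minutes=30):
    """Link comments to TV events by airtime window + keyword overlap.

    Toy stand-in for semantic linking: `comments` are (minute, text)
    pairs; `events` are (start, end, title, keywords). A comment links
    to an event when it arrives during or shortly after the event and
    shares at least one keyword with it.
    """
    links = []
    for t, text in comments:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        for start, end, title, keywords in events:
            if start <= t <= end + window_minutes and tokens & keywords:
                links.append((text, title))
    return links

# Hypothetical broadcast schedule and comment stream.
events = [(0, 60, "The Big Game", {"game", "touchdown", "halftime"}),
          (60, 61, "Soda Ad", {"soda", "polar", "bears"})]
comments = [(45, "what a touchdown!"),
            (62, "those polar bears got me again"),
            (200, "unrelated late-night thought about soda")]
print(link_comments_to_events(comments, events))
```

The late comment stays unlinked even though it mentions "soda": the time window is what keeps vocabulary overlap from linking everything to everything.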
12:57
And the same idea now
12:59
can be built up.
13:01
And we get this wordscape,
13:03
except now words are not assembled in my living room.
13:06
Instead, the context, the common ground activities,
13:10
are the content on television that's driving the conversations.
13:13
And what we're seeing here, these skyscrapers now,
13:16
are commentary
13:18
that are linked to content on television.
13:20
Same concept,
13:22
but looking at communication dynamics
13:24
in a very different sphere.
13:26
And so fundamentally, rather than, for example,
13:28
measuring content based on how many people are watching,
13:31
this gives us the basic data
13:33
for looking at engagement properties of content.
13:36
And just like we can look at feedback cycles
306
801000
3000
E da mesma forma que podemos ver os ciclos de resposta
13:39
and dynamics in a family,
307
804000
3000
e dinámicas nunha familia,
13:42
we can now open up the same concepts
308
807000
3000
podemos expandir os mesmos conceptos
13:45
and look at much larger groups of people.
309
810000
3000
e observar grupos de xente moito máis grandes.
13:48
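The linking step described here -- matching a "comment in the wild" to a piece of event structure from the television feed -- can be sketched as a toy similarity match. This is only a minimal illustration, not Bluefin's actual semantic-analysis pipeline; the segment data and the word-overlap scoring rule are invented for the example.

```python
# Toy linker: match free-text comments to TV event segments by
# word overlap (a crude stand-in for real semantic analysis).

def tokenize(text):
    return set(text.lower().split())

def link_comment(comment, segments, min_overlap=2):
    """Return the id of the best-matching segment, or None."""
    words = tokenize(comment)
    best_id, best_score = None, 0
    for seg_id, seg_text in segments.items():
        score = len(words & tokenize(seg_text))
        if score >= min_overlap and score > best_score:
            best_id, best_score = seg_id, score
    return best_id

# Invented event structure pulled from a television feed.
segments = {
    "sotu:jobs": "president speech jobs economy recovery",
    "game:td": "touchdown pass fourth quarter score",
}

print(link_comment("what a touchdown pass in the fourth quarter", segments))
```

Each successful match corresponds to one of the yellow lines in the visualization: a comment tied to a specific piece of broadcast event structure.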
13:51
This is a subset of data from our database -- just 50,000 out of several million -- and the social graph that connects them through publicly available sources. And if you put them on one plane, a second plane is where the content lives. So we have the programs and the sporting events and the commercials, and all of the link structures that tie them together make a content graph. And then the important third dimension. Each of the links that you're seeing rendered here is an actual connection made between something someone said and a piece of content. And there are, again, now tens of millions of these links that give us the connective tissue of social graphs and how they relate to content. And we can now start to probe the structure in interesting ways.
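The three-part structure described here -- a plane of people, a plane of content, and comment links tying the two together -- is essentially a social graph plus a bipartite people-to-content graph. A minimal sketch, with all names and edges invented:

```python
# Two planes plus the links between them, as plain adjacency data.
social_graph = {            # person -> people they follow / talk to
    "ann": {"bob"},
    "bob": {"ann", "cat"},
    "cat": set(),
}
content_graph = {           # content -> related content (shows, ads...)
    "show1": {"ad1"},
    "ad1": set(),
}
comment_links = [           # (person, content): "something someone
    ("ann", "show1"),       # said" tied to a piece of content
    ("cat", "show1"),
]

def audience(content_id):
    """People whose comments link to a given piece of content."""
    return {p for p, c in comment_links if c == content_id}

print(sorted(audience("show1")))   # ['ann', 'cat']
```

The tens of millions of real links play the role of `comment_links` here: the connective tissue between the two planes.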
14:44
So if we, for example, trace the path of one piece of content that drives someone to comment on it, and then we follow where that comment goes, and then look at the entire social graph that becomes activated and then trace back to see the relationship between that social graph and content, a very interesting structure becomes visible. We call this a co-viewing clique, a virtual living room if you will. And there are fascinating dynamics at play. It's not one way. A piece of content, an event, causes someone to talk. They talk to other people. That drives tune-in behavior back into mass media, and you have these cycles that drive the overall behavior.
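The trace described here -- content drives a comment, the comment propagates through the social graph, and the activated group links back to content -- amounts to a graph traversal seeded at the commenters. A sketch under invented data, using simple one-hop propagation as the activation rule (the real dynamics are of course richer):

```python
from collections import deque

social_graph = {"ann": {"bob"}, "bob": {"cat"}, "cat": set(), "dan": set()}
comment_links = [("ann", "show1"), ("cat", "show1"), ("dan", "show2")]

def co_viewing_clique(content_id):
    """People reachable through the social graph from anyone who
    commented on content_id -- a 'virtual living room'."""
    seeds = {p for p, c in comment_links if c == content_id}
    seen, queue = set(seeds), deque(seeds)
    while queue:                         # breadth-first propagation
        person = queue.popleft()
        for friend in social_graph.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return seen

print(sorted(co_viewing_clique("show1")))
```

Tracing back from the resulting group to the content they comment on is what exposes the cycle between mass media and social media.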
15:24
Another example -- very different -- another actual person in our database -- and we're finding at least hundreds, if not thousands, of these. We've given this person a name. This is a pro-amateur, or pro-am media critic who has this high fan-out rate. So a lot of people are following this person -- very influential -- and they have a propensity to talk about what's on TV. So this person is a key link in connecting mass media and social media together.
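The pro-am critic is characterized by two measurable properties: high fan-out (many followers) and a propensity to talk about what's on TV. A toy score combining the two could flag such key links; the metric and all numbers below are invented for illustration:

```python
# Invented per-person stats: follower count and the fraction of
# their posts that are about television.
people = {
    "critic42": {"followers": 12000, "tv_share": 0.6},
    "ann":      {"followers": 80,    "tv_share": 0.1},
}

def key_link_score(stats):
    """Crude influence proxy: reach weighted by TV-talk propensity."""
    return stats["followers"] * stats["tv_share"]

top = max(people, key=lambda p: key_link_score(people[p]))
print(top)   # the strongest mass-media-to-social-media bridge
```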
15:52
One last example from this data: Sometimes it's actually a piece of content that is special. So if we go and look at this piece of content, President Obama's State of the Union address from just a few weeks ago, and look at what we find in this same data set, at the same scale, the engagement properties of this piece of content are truly remarkable. A nation exploding in conversation in real time in response to what's on the broadcast. And of course, through all of these lines are flowing unstructured language. We can X-ray and get a real-time pulse of a nation, real-time sense of the social reactions in the different circuits in the social graph being activated by content.
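At its simplest, a "real-time pulse" is comment volume bucketed against the broadcast timeline, so spikes line up with moments on screen. A minimal sketch with invented timestamps (seconds into the broadcast):

```python
from collections import Counter

# Invented comment timestamps, in seconds into the broadcast.
timestamps = [3, 65, 67, 68, 70, 71, 130]

def pulse(timestamps, bucket=60):
    """Comment volume per bucket-second window of the broadcast."""
    return Counter(t // bucket for t in timestamps)

volume = pulse(timestamps)
print(volume.most_common(1))   # the minute with the biggest spike
```

The real system layers semantic analysis of that unstructured language on top of raw volume, broken out by the circuits of the social graph being activated.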
16:40
So, to summarize, the idea is this: As our world becomes increasingly instrumented and we have the capabilities to collect and connect the dots between what people are saying and the context they're saying it in, what's emerging is an ability to see new social structures and dynamics that have previously not been seen. It's like building a microscope or telescope and revealing new structures about our own behavior around communication. And I think the implications here are profound, whether it's for science, for commerce, for government, or perhaps most of all, for us as individuals.
17:20
And so just to return to my son, when I was preparing this talk, he was looking over my shoulder, and I showed him the clips I was going to show to you today, and I asked him for permission -- granted. And then I went on to reflect, "Isn't it amazing, this entire database, all these recordings, I'm going to hand off to you and to your sister" -- who arrived two years later -- "and you guys are going to be able to go back and re-experience moments that you could never, with your biological memory, possibly remember the way you can now?" And he was quiet for a moment. And I thought, "What am I thinking? He's five years old. He's not going to understand this." And just as I was having that thought, he looked up at me and said, "So that when I grow up, I can show this to my kids?" And I thought, "Wow, this is powerful stuff."
18:07
So I want to leave you with one last memorable moment from our family. This is the first time our son took more than two steps at once -- captured on film. And I really want you to focus on something as I take you through. It's a cluttered environment; it's natural life. My mother's in the kitchen, cooking, and, of all places, in the hallway, I realize he's about to do it, about to take more than two steps. And so you hear me encouraging him, realizing what's happening, and then the magic happens. Listen very carefully. About three steps in, he realizes something magic is happening, and the most amazing feedback loop of all kicks in, and he takes a breath in, and he whispers "wow" and instinctively I echo back the same. And so let's fly back in time to that memorable moment.
19:05
(Video) DR: Hey. Come here. Can you do it?
Oh, boy. Can you do it?
Baby: Yeah.
DR: Ma, he's walking.
(Laughter)
(Applause)
DR: Thank you.
(Applause)
Translated by Marcos Lomba
Reviewed by Raquel Uzal
