ABOUT THE SPEAKER
Rupal Patel - Speech scientist
People relying on synthetic speech use the voice they’re given, not their own. Rupal Patel created the VocaliD project to change that.

Why you should listen

Northeastern University computer science professor Rupal Patel looks for ways to give voice to the voiceless. As founder and director of the Communication Analysis and Design Laboratory (CadLab), she developed a technology that combines real human voices with the characteristics of individual speech patterns. The result is VocaliD, an innovation that gives people who can't speak the ability to communicate in a voice all their own.

"There's nothing better than seeing the person who's actually going to use it, seeing their reaction, seeing their smile," says Patel.

TEDWomen 2013

Rupal Patel: Synthetic voices, as unique as fingerprints

944,754 views

Many of those with severe speech disorders use a computerized device to communicate. Yet they choose between only a few voice options. That's why Stephen Hawking has an American accent, and why many people end up with the same voice, often to incongruous effect. Speech scientist Rupal Patel wanted to do something about this, and in this wonderful talk she shares her work to engineer unique voices for the voiceless.


00:12
I'd like to talk today
00:14
about a powerful and fundamental aspect
00:17
of who we are: our voice.
00:20
Each one of us has a unique voiceprint
00:23
that reflects our age, our size,
00:25
even our lifestyle and personality.
00:29
In the words of the poet Longfellow,
00:31
"the human voice is the organ of the soul."
00:35
As a speech scientist, I'm fascinated
00:37
by how the voice is produced,
00:39
and I have an idea for how it can be engineered.
00:43
That's what I'd like to share with you.
00:45
I'm going to start by playing you a sample
00:47
of a voice that you may recognize.
00:49
(Recording) Stephen Hawking: "I would have thought
00:50
it was fairly obvious what I meant."
00:53
Rupal Patel: That was the voice
00:54
of Professor Stephen Hawking.
00:56
What you may not know is that same voice
01:00
may also be used by this little girl
01:02
who is unable to speak
01:04
because of a neurological condition.
01:07
In fact, all of these individuals
01:09
may be using the same voice,
01:11
and that's because there are
only a few options available.
01:14
In the U.S. alone, there are 2.5 million Americans
01:19
who are unable to speak,
01:20
many of whom use computerized devices
01:23
to communicate.
01:24
Now that's millions of people worldwide
01:28
who are using generic voices,
01:30
including Professor Hawking,
01:31
who uses an American-accented voice.
01:36
This lack of individuation of the synthetic voice
01:39
really hit home
01:41
when I was at an assistive technology conference
01:43
a few years ago,
01:45
and I recall walking into an exhibit hall
01:48
and seeing a little girl and a grown man
01:52
having a conversation using their devices,
01:54
different devices, but the same voice.
01:59
And I looked around and I saw this happening
02:01
all around me, literally hundreds of individuals
02:05
using a handful of voices,
02:08
voices that didn't fit their bodies
02:11
or their personalities.
02:13
We wouldn't dream of fitting a little girl
02:15
with the prosthetic limb of a grown man.
02:19
So why then the same prosthetic voice?
02:22
It really struck me,
02:23
and I wanted to do something about this.
02:27
I'm going to play you now a sample
02:29
of someone who has, two people actually,
02:32
who have severe speech disorders.
02:34
I want you to take a listen to how they sound.
02:37
They're saying the same utterance.
02:39
(First voice)
02:42
(Second voice)
02:45
You probably didn't understand what they said,
02:48
but I hope that you heard
02:50
their unique vocal identities.
02:54
So what I wanted to do next is,
02:57
I wanted to find out how we could harness
02:59
these residual vocal abilities
03:01
and build a technology
03:03
that could be customized for them,
03:05
voices that could be customized for them.
03:07
So I reached out to my collaborator, Tim Bunnell.
03:10
Dr. Bunnell is an expert in speech synthesis,
03:13
and what he'd been doing is building
03:15
personalized voices for people
03:17
by putting together
03:19
pre-recorded samples of their voice
03:21
and reconstructing a voice for them.
03:24
These are people who had lost their voice
03:26
later in life.
03:28
We didn't have the luxury
03:29
of pre-recorded samples of speech
03:31
for those born with speech disorder.
03:33
But I thought, there had to be a way
03:36
to reverse engineer a voice
03:38
from whatever little is left over.
03:40
So we decided to do exactly that.
03:43
We set out with a little bit of funding
from the National Science Foundation,
03:46
to create custom-crafted voices that captured
03:50
their unique vocal identities.
03:51
We call this project VocaliD, or vocal I.D.,
03:54
for vocal identity.
03:56
Now before I get into the details of how
03:59
the voice is made and let you listen to it,
04:01
I need to give you a real quick
speech science lesson. Okay?
04:05
So first, we know that the voice is changing
04:08
dramatically over the course of development.
04:11
Children sound different from teens
04:13
who sound different from adults.
04:14
We've all experienced this.
04:17
Fact number two is that speech
04:20
is a combination of the source,
04:23
which is the vibrations generated by your voice box,
04:26
which are then pushed through
04:28
the rest of the vocal tract.
04:31
These are the chambers of your head and neck
04:33
that vibrate,
04:34
and they actually filter that source sound
04:36
to produce consonants and vowels.
04:39
So the combination of source and filter
04:43
is how we produce speech.
04:45
And that happens in one individual.
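The source-filter account of speech described above can be made concrete with a toy synthesizer. In this minimal sketch (not the VocaliD implementation), an impulse train at a chosen pitch stands in for the voice-box source, and a cascade of two-pole resonators stands in for the vocal-tract filter. The sample rate, the 120 Hz pitch, the bandwidths and all function names are illustrative assumptions; the formant frequencies are approximate textbook values for an adult /a/-like vowel.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed)


def glottal_source(f0_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Source: an impulse train at the fundamental frequency, a crude stand-in for voice-box vibration."""
    n = int(duration_s * sample_rate)
    period = sample_rate / f0_hz  # samples between glottal pulses
    out = [0.0] * n
    t = 0.0
    while t < n:
        out[int(t)] = 1.0
        t += period
    return out


def resonator(signal, freq_hz, bandwidth_hz, sample_rate=SAMPLE_RATE):
    """Filter: a two-pole resonator that boosts energy around one vocal-tract resonance (formant)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius sets the bandwidth
    theta = 2 * math.pi * freq_hz / sample_rate            # pole angle sets the centre frequency
    a1, a2 = 2 * r * math.cos(theta), -r * r
    gain = 1 - r  # rough level normalization, not exact unity gain
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = gain * x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize_vowel(f0_hz, formants, duration_s=0.5):
    """Speech = source pushed through the filter (a cascade of formant resonators)."""
    signal = glottal_source(f0_hz, duration_s)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz)
    return signal


# Approximate formants of an adult /a/-like vowel; the bandwidths are assumed.
AH_FORMANTS = [(730, 90), (1090, 110), (2440, 170)]
vowel = synthesize_vowel(f0_hz=120, formants=AH_FORMANTS)
```

Changing `f0_hz` changes the source (the pitch you hear) while the vowel stays the same; swapping in another vowel's formants changes the filter while the pitch stays the same. That separability is what makes it possible to take the two parts from different people.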
04:48
Now I told you earlier that I'd spent
04:51
a good part of my career
04:53
understanding and studying
04:56
the source characteristics of people
04:57
with severe speech disorder,
05:00
and what I've found
05:01
is that even though their filters were impaired,
05:05
they were able to modulate their source:
05:08
the pitch, the loudness, the tempo of their voice.
05:11
These are called prosody, and
I've been documenting for years
05:14
that the prosodic abilities of these individuals
05:16
are preserved.
05:18
So when I realized that those same cues
05:22
are also important for speaker identity,
05:25
I had this idea.
05:27
Why don't we take the source
05:29
from the person we want the voice to sound like,
05:32
because it's preserved,
05:33
and borrow the filter
05:35
from someone about the same age and size,
05:39
because they can articulate speech,
05:41
and then mix them?
05:43
Because when we mix them,
05:44
we can get a voice that's as clear
05:46
as our surrogate talker --
05:48
that's the person we borrowed the filter from --
05:51
and is similar in identity to our target talker.
05:55
It's that simple.
05:57
That's the science behind what we're doing.
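The talk describes the mixing of a target talker's source with a surrogate's filter only at a high level. One simple way to picture "taking the source from the target" is log-F0 mean/variance mapping, a common baseline in voice conversion: keep the surrogate's pitch movements, but shift and stretch them in the log-frequency domain so they sit at the target's own pitch level and range. This minimal sketch handles only that prosodic part, on made-up contours; the function name and the choice of this particular mapping are assumptions, not a description of the actual VocaliD pipeline.

```python
import math
from statistics import mean, pstdev


def transfer_pitch(surrogate_f0, target_f0):
    """Map a surrogate's F0 contour (Hz) onto a target talker's pitch level and range.

    Works in log-frequency so that equal ratios (musical intervals) are treated equally.
    Values of 0 mark unvoiced frames and are passed through unchanged.
    """
    src = [math.log(f) for f in surrogate_f0 if f > 0]
    tgt = [math.log(f) for f in target_f0 if f > 0]
    src_mu, src_sd = mean(src), pstdev(src)
    tgt_mu, tgt_sd = mean(tgt), pstdev(tgt)
    scale = tgt_sd / src_sd if src_sd > 0 else 1.0

    mapped = []
    for f in surrogate_f0:
        if f <= 0:
            mapped.append(0.0)  # keep unvoiced frames unvoiced
        else:
            z = (math.log(f) - src_mu) * scale  # position within the surrogate's range, rescaled
            mapped.append(math.exp(tgt_mu + z))
    return mapped


# Hypothetical contours: an adult surrogate around 200 Hz, a child target around 300 Hz.
surrogate = [190.0, 200.0, 210.0, 0.0, 205.0, 195.0]
target = [280.0, 300.0, 320.0, 310.0]
print([round(f) for f in transfer_pitch(surrogate, target)])
```

The surrogate still supplies the timing and the clear articulation; only the pitch statistics come from the target. A real system would also carry over loudness and voice quality, which this sketch leaves out.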
06:00
So once you have that in mind,
06:03
how do you go about building this voice?
06:05
Well, you have to find someone
06:07
who is willing to be a surrogate.
06:09
It's not such an ominous thing.
06:11
Being a surrogate donor
06:13
only requires you to say a few hundred
06:16
to a few thousand utterances.
06:18
The process goes something like this.
06:20
(Video) Voice: Things happen in pairs.
06:22
I love to sleep.
06:24
The sky is blue without clouds.
06:28
RP: Now she's going to go on like this
06:30
for about three to four hours,
06:32
and the idea is not for her to say everything
06:35
that the target is going to want to say,
06:37
but the idea is to cover all the different combinations
06:40
of the sounds that occur in the language.
06:44
The more speech you have,
06:45
the better sounding voice you're going to have.
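The aim of covering "all the different combinations of the sounds" is often approached by choosing recording prompts greedily for diphone (adjacent sound-pair) coverage. Below is a minimal sketch of that idea; the `greedy_script` helper and the hand-written ARPAbet-style transcriptions are hypothetical, and a real script would be selected from a large candidate pool transcribed with a pronunciation lexicon.

```python
def diphones(phones):
    """Adjacent sound pairs in a phone sequence, with '#' marking silence at the edges."""
    padded = ["#"] + phones + ["#"]
    return {(a, b) for a, b in zip(padded, padded[1:])}


def greedy_script(candidates, max_prompts):
    """Pick prompts one at a time, each time taking the one that adds the most unseen diphones."""
    covered = set()
    chosen = []
    remaining = dict(candidates)
    while remaining and len(chosen) < max_prompts:
        best = max(remaining, key=lambda s: len(diphones(remaining[s]) - covered))
        gain = diphones(remaining[best]) - covered
        if not gain:
            break  # nothing new left to cover
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered


# Toy prompts with hand-written (hypothetical) phone transcriptions.
prompts = {
    "I love to sleep.": ["ay", "l", "ah", "v", "t", "uw", "s", "l", "iy", "p"],
    "Things happen in pairs.": ["th", "ih", "ng", "z", "hh", "ae", "p", "ah", "n", "ih", "n", "p", "eh", "r", "z"],
    "I love chocolate.": ["ay", "l", "ah", "v", "ch", "aa", "k", "l", "ah", "t"],
}
script, covered = greedy_script(prompts, max_prompts=2)
```

Greedy selection is not guaranteed to find the smallest covering script, but it is simple and tends to front-load the prompts that add the most new material, which matters when a donor's recording time is limited.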
06:48
Once you have those recordings,
06:49
what we need to do
06:51
is we have to parse these recordings
06:53
into little snippets of speech,
06:56
one- or two-sound combinations,
06:58
sometimes even whole words
07:00
that start populating a dataset or a database.
07:05
We're going to call this database a voice bank.
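Once recordings have been time-aligned to their individual sounds (typically with a forced aligner, not shown here), parsing them into snippets and populating a voice bank amounts to building a lookup table from each sound pair to every place it was recorded. This minimal sketch assumes a hypothetical alignment format of `(phone, start_s, end_s)` tuples and cuts each diphone from the middle of one sound to the middle of the next, the usual convention for diphone units; the `Unit` record and `build_voice_bank` name are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One snippet in the voice bank: which recording it comes from and where."""
    recording: str
    start_s: float
    end_s: float


def build_voice_bank(alignments):
    """Index every diphone in every aligned recording.

    `alignments` maps a recording name to a list of (phone, start_s, end_s) tuples.
    Each unit runs from the midpoint of its first phone to the midpoint of the second,
    so later joins fall in the steady middle of a sound rather than at a transition.
    """
    bank = defaultdict(list)
    for recording, phones in alignments.items():
        for (p1, s1, e1), (p2, s2, e2) in zip(phones, phones[1:]):
            bank[(p1, p2)].append(Unit(recording, (s1 + e1) / 2, (s2 + e2) / 2))
    return bank


# Hypothetical alignment of the start of one surrogate prompt, "I love ..." (times in seconds).
alignments = {
    "prompt_001.wav": [("ay", 0.10, 0.30), ("l", 0.30, 0.38), ("ah", 0.38, 0.50), ("v", 0.50, 0.60)],
}
bank = build_voice_bank(alignments)
```

Because the same sound pair usually occurs in many prompts, each key ends up holding several candidate snippets, which is what gives the synthesizer choices later.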
07:08
Now the power of the voice bank
07:10
is that from this voice bank,
07:12
we can now say any new utterance,
07:14
like, "I love chocolate" --
07:16
everyone needs to be able to say that --
07:18
fish through that database
07:19
and find all the segments necessary
07:21
to say that utterance.
07:23
(Video) Voice: I love chocolate.
07:25
RP: So that's speech synthesis.
07:26
It's called concatenative synthesis,
and that's what we're using.
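The step of fishing through the database for the segments an utterance needs can be sketched as a search. In the unit-selection flavour of concatenative synthesis, each possible sequence of snippets is scored by how badly neighbouring snippets join, and dynamic programming (Viterbi search) finds the cheapest sequence. The talk does not say which search VocaliD uses; this minimal sketch assumes a tiny hypothetical bank in which each snippet carries only its pitch at each edge, and it uses the pitch jump at each join as the only cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    name: str
    f0_start: float  # pitch (Hz) at the start of the snippet
    f0_end: float    # pitch (Hz) at the end of the snippet


def join_cost(left, right):
    """Penalize audible pitch jumps where two snippets are glued together."""
    return abs(left.f0_end - right.f0_start)


def select_units(targets, bank):
    """Viterbi search: one candidate per target diphone, minimizing total join cost.

    Assumes every target diphone has at least one candidate in `bank`.
    """
    # best[s] = (cost of the cheapest path ending in snippet s, that path)
    best = {s: (0.0, [s]) for s in bank[targets[0]]}
    for diphone in targets[1:]:
        new_best = {}
        for s in bank[diphone]:
            cost, path = min(
                ((c + join_cost(p[-1], s), p) for c, p in best.values()),
                key=lambda item: item[0],
            )
            new_best[s] = (cost, path + [s])
        best = new_best
    cost, path = min(best.values(), key=lambda item: item[0])
    return [s.name for s in path], cost


# Hypothetical bank with two recorded copies of some sound pairs.
bank = {
    ("ay", "l"): [Snippet("ay-l#1", 210, 200), Snippet("ay-l#2", 180, 170)],
    ("l", "ah"): [Snippet("l-ah#1", 198, 190), Snippet("l-ah#2", 150, 140)],
    ("ah", "v"): [Snippet("ah-v#1", 188, 180)],
}
names, cost = select_units([("ay", "l"), ("l", "ah"), ("ah", "v")], bank)
```

Real systems add a target cost (how well each snippet matches the desired context and prosody) and several join features such as spectral continuity, but the dynamic-programming structure stays the same.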
07:29
That's not the novel part.
07:31
What's novel is how we make it sound
07:33
like this young woman.
07:34
This is Samantha.
07:36
I met her when she was nine,
07:38
and since then, my team and I
07:40
have been trying to build her a personalized voice.
07:43
We first had to find a surrogate donor,
07:46
and then we had to have Samantha
07:48
produce some utterances.
07:50
What she can produce are mostly vowel-like sounds,
07:52
but that's enough for us to extract
07:54
her source characteristics.
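A sustained vowel-like sound is enough to measure some source characteristics. This minimal sketch, assuming a clean and steady vowel, estimates loudness as RMS amplitude and pitch from the autocorrelation of the signal. To avoid the classic octave error (picking two periods instead of one), it takes the first lag whose autocorrelation comes within a threshold of the best score and climbs to that local peak; the 0.9 threshold, the 75 to 500 Hz search range and the function names are assumptions. The test signal is synthetic (a 220 Hz tone with two weaker harmonics), not a real recording.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed)


def rms(signal):
    """Loudness proxy: root-mean-square amplitude."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def estimate_f0(signal, sample_rate=SAMPLE_RATE, fmin=75.0, fmax=500.0, threshold=0.9):
    """Pitch estimate (Hz) from the autocorrelation of a steady voiced segment."""
    n = len(signal)
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), n - 1)
    scores = {}
    for lag in range(min_lag, max_lag + 1):
        # Average product of the signal with a copy of itself delayed by `lag` samples.
        scores[lag] = sum(signal[i] * signal[i + lag] for i in range(n - lag)) / (n - lag)
    peak = max(scores.values())
    # Prefer the shortest period that scores nearly as well as the best one,
    # then climb to the top of that local peak.
    lag = next(
        (k for k in range(min_lag, max_lag + 1) if scores[k] >= threshold * peak),
        max(scores, key=scores.get),
    )
    while lag + 1 <= max_lag and scores[lag + 1] > scores[lag]:
        lag += 1
    return sample_rate / lag


# Synthetic vowel-like signal: 220 Hz fundamental plus two weaker harmonics, 0.1 s long.
vowel = [
    math.sin(2 * math.pi * 220 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
    + 0.25 * math.sin(2 * math.pi * 660 * t / SAMPLE_RATE)
    for t in range(SAMPLE_RATE // 10)
]
f0 = estimate_f0(vowel)
```

The estimate is limited to whole-sample lags, so it lands within a few hertz of 220 Hz rather than exactly on it; production pitch trackers interpolate between lags and add voicing decisions.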
07:57
What happens next is best described
08:00
by my daughter's analogy. She's six.
08:03
She calls it mixing colors to paint voices.
08:08
It's beautiful. It's exactly that.
08:11
Samantha's voice is like a concentrated sample
08:14
of red food dye which we can infuse
08:16
into the recordings of her surrogate
08:19
to get a pink voice just like this.
08:23
(Video) Samantha: Aaaaaah.
08:28
RP: So now, Samantha can say this.
08:30
(Video) Samantha: This voice is only for me.
08:34
I can't wait to use my new voice with my friends.
08:40
RP: Thank you. (Applause)
08:46
I'll never forget the gentle smile
08:49
that spread across her face
08:50
when she heard that voice for the first time.
08:54
Now there are millions of people
08:56
around the world like Samantha, millions,
08:59
and we've only begun to scratch the surface.
09:02
What we've done so far is we have
09:04
a few surrogate talkers from around the U.S.
09:08
who have donated their voices,
09:09
and we have been using those
09:11
to build our first few personalized voices.
09:16
But there's so much more work to be done.
09:17
For Samantha, her surrogate
09:20
came from somewhere in the Midwest, a stranger
09:23
who gave her the gift of voice.
09:27
And as a scientist, I'm so excited
09:29
to take this work out of the laboratory
09:31
and finally into the real world
09:32
so it can have real-world impact.
09:36
What I want to share with you next
09:37
is how I envision taking this work
09:39
to that next level.
09:42
I imagine a whole world of surrogate donors
09:46
from all walks of life, different sizes, different ages,
09:49
coming together in this voice drive
09:52
to give people voices
09:55
that are as colorful as their personalities.
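Because the filter is borrowed "from someone about the same age and size" as the target talker, a voice drive with many donors needs some way to match each target to suitable donors. Nothing in the talk specifies how VocaliD does this; the following is a minimal nearest-neighbour sketch that assumes a hypothetical donor registry recording only age and height, with hand-picked scale factors deciding how many years or centimetres count as one unit of mismatch.

```python
from dataclasses import dataclass


@dataclass
class Talker:
    name: str
    age_years: float
    height_cm: float


# Hand-picked (assumed) scales: how many years / centimetres count as one unit of mismatch.
AGE_SCALE = 5.0
HEIGHT_SCALE = 10.0


def mismatch(target, donor):
    """Euclidean distance between two talkers in normalized age and height."""
    d_age = (target.age_years - donor.age_years) / AGE_SCALE
    d_height = (target.height_cm - donor.height_cm) / HEIGHT_SCALE
    return (d_age ** 2 + d_height ** 2) ** 0.5


def best_donors(target, donors, k=3):
    """Return the k donors whose age and size best match the target talker."""
    return sorted(donors, key=lambda d: mismatch(target, d))[:k]


# Hypothetical registry entries and target profile.
donors = [
    Talker("donor_a", 35, 180),
    Talker("donor_b", 11, 145),
    Talker("donor_c", 9, 135),
    Talker("donor_d", 14, 160),
]
child_target = Talker("target", 10, 138)
print([d.name for d in best_donors(child_target, donors, k=2)])  # prints ['donor_c', 'donor_b']
```

Returning a short ranked list rather than a single donor leaves room to choose among close matches by ear, since age and height are only rough proxies for vocal-tract size.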
09:58
To do that as a first step,
10:01
we've put together this website, VocaliD.org,
10:04
as a way to bring together those
10:06
who want to join us as voice donors,
10:08
as expertise donors,
10:10
in whatever way to make this vision a reality.
10:15
They say that giving blood can save lives.
10:19
Well, giving your voice can change lives.
10:24
All we need is a few hours of speech
10:27
from our surrogate talker,
10:29
and as little as a vowel from our target talker,
10:34
to create a unique vocal identity.
10:37
So that's the science behind what we're doing.
10:40
I want to end by circling back to the human side
10:45
that is really the inspiration for this work.
10:49
About five years ago, we built our very first voice
10:52
for a little boy named William.
10:55
When his mom first heard this voice,
10:57
she said, "This is what William
11:00
would have sounded like
11:01
had he been able to speak."
11:04
And then I saw William typing a message
11:06
on his device.
11:07
I wondered, what was he thinking?
11:11
Imagine carrying around someone else's voice
11:14
for nine years
11:16
and finally finding your own voice.
11:21
Imagine that.
11:23
This is what William said:
11:25
"Never heard me before."
11:32
Thank you.
11:34
(Applause)
