TEDWomen 2013

Rupal Patel: Synthetic voices, as unique as fingerprints

December 5, 2013

Many of those with severe speech disorders use a computerized device to communicate. Yet they choose between only a few voice options. That's why Stephen Hawking has an American accent, and why many people end up with the same voice, often to incongruous effect. Speech scientist Rupal Patel wanted to do something about this, and in this wonderful talk she shares her work to engineer unique voices for the voiceless.

Rupal Patel - Speech scientist
People relying on synthetic speech use the voice they’re given, not their own. Rupal Patel created the vocaliD project to change that.

I'd like to talk today
00:12
about a powerful and fundamental aspect
00:14
of who we are: our voice.
00:16
Each one of us has a unique voiceprint
00:20
that reflects our age, our size,
00:23
even our lifestyle and personality.
00:25
In the words of the poet Longfellow,
00:28
"the human voice is the organ of the soul."
00:30
As a speech scientist, I'm fascinated
00:34
by how the voice is produced,
00:37
and I have an idea for how it can be engineered.
00:39
That's what I'd like to share with you.
00:43
I'm going to start by playing you a sample
00:45
of a voice that you may recognize.
00:47
(Recording) Stephen Hawking: "I would have thought
00:48
it was fairly obvious what I meant."
00:50
Rupal Patel: That was the voice
00:53
of Professor Stephen Hawking.
00:54
What you may not know is that same voice
00:56
may also be used by this little girl
01:00
who is unable to speak
01:02
because of a neurological condition.
01:04
In fact, all of these individuals
01:07
may be using the same voice,
01:09
and that's because there's only a few options available.
01:11
In the U.S. alone, there are 2.5 million Americans
01:14
who are unable to speak,
01:18
and many of whom use computerized devices
01:20
to communicate.
01:23
Now that's millions of people worldwide
01:24
who are using generic voices,
01:28
including Professor Hawking,
01:29
who uses an American-accented voice.
01:31
This lack of individuation of the synthetic voice
01:36
really hit home
01:39
when I was at an assistive technology conference
01:40
a few years ago,
01:43
and I recall walking into an exhibit hall
01:45
and seeing a little girl and a grown man
01:48
having a conversation using their devices,
01:51
different devices, but the same voice.
01:54
And I looked around and I saw this happening
01:59
all around me, literally hundreds of individuals
02:00
using a handful of voices,
02:05
voices that didn't fit their bodies
02:07
or their personalities.
02:10
We wouldn't dream of fitting a little girl
02:13
with the prosthetic limb of a grown man.
02:15
So why then the same prosthetic voice?
02:19
It really struck me,
02:22
and I wanted to do something about this.
02:23
I'm going to play you now a sample
02:26
of someone who has, two people actually,
02:28
who have severe speech disorders.
02:32
I want you to take a listen to how they sound.
02:33
They're saying the same utterance.
02:37
(First voice)
02:39
(Second voice)
02:41
You probably didn't understand what they said,
02:45
but I hope that you heard
02:47
their unique vocal identities.
02:49
So what I wanted to do next is,
02:54
I wanted to find out how we could harness
02:56
these residual vocal abilities
02:59
and build a technology
03:01
that could be customized for them,
03:03
voices that could be customized for them.
03:05
So I reached out to my collaborator, Tim Bunnell.
03:07
Dr. Bunnell is an expert in speech synthesis,
03:10
and what he'd been doing is building
03:13
personalized voices for people
03:15
by putting together
03:17
pre-recorded samples of their voice
03:19
and reconstructing a voice for them.
03:21
These are people who had lost their voice
03:24
later in life.
03:26
We didn't have the luxury
03:28
of pre-recorded samples of speech
03:29
for those born with speech disorder.
03:31
But I thought, there had to be a way
03:33
to reverse engineer a voice
03:36
from whatever little is left over.
03:38
So we decided to do exactly that.
03:40
We set out with a little bit of funding from the National Science Foundation,
03:43
to create custom-crafted voices that captured
03:46
their unique vocal identities.
03:50
We call this project VocaliD, or vocal I.D.,
03:51
for vocal identity.
03:54
Now before I get into the details of how
03:56
the voice is made and let you listen to it,
03:59
I need to give you a real quick speech science lesson. Okay?
04:01
So first, we know that the voice is changing
04:04
dramatically over the course of development.
04:08
Children sound different from teens
04:10
who sound different from adults.
04:12
We've all experienced this.
04:14
Fact number two is that speech
04:17
is a combination of the source,
04:20
which is the vibrations generated by your voice box,
04:23
which are then pushed through
04:26
the rest of the vocal tract.
04:28
These are the chambers of your head and neck
04:30
that vibrate,
04:33
and they actually filter that source sound
04:34
to produce consonants and vowels.
04:36
So the combination of source and filter
04:39
is how we produce speech.
04:43
And that happens in one individual.
04:45
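The source-filter idea described here can be sketched in a few lines of code. This is a toy illustration only, not the VocaliD implementation: the sample rate, the impulse-train source, the two-pole resonator, and the formant values are all assumptions chosen for simplicity.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed for this sketch)


def glottal_source(pitch_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Impulse train standing in for the vibrations of the voice box."""
    period = round(sample_rate / pitch_hz)  # samples between pulses
    n_samples = int(duration_s * sample_rate)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(signal, center_hz, bandwidth_hz, sample_rate=SAMPLE_RATE):
    """Two-pole resonator standing in for one vocal-tract resonance."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius < 1
    a1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0  # the two previous output samples
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def vowel(pitch_hz, formants, duration_s=0.05):
    """Source pushed through the filter: one resonator per formant."""
    signal = glottal_source(pitch_hz, duration_s)
    for center_hz, bandwidth_hz in formants:
        signal = resonator(signal, center_hz, bandwidth_hz)
    return signal


# Rough, illustrative "ah"-like resonances: (center Hz, bandwidth Hz).
AH_FORMANTS = [(700.0, 100.0), (1200.0, 120.0)]

low_voice = vowel(110.0, AH_FORMANTS)   # same filter, lower-pitched source
high_voice = vowel(220.0, AH_FORMANTS)  # same filter, higher-pitched source
```

Changing only `pitch_hz` changes the source while the filter stays fixed, which is the separation the talk relies on.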
Now I told you earlier that I'd spent
04:48
a good part of my career
04:51
understanding and studying
04:53
the source characteristics of people
04:55
with severe speech disorder,
04:57
and what I've found
05:00
is that even though their filters were impaired,
05:01
they were able to modulate their source:
05:04
the pitch, the loudness, the tempo of their voice.
05:07
These are called prosody, and I've been documenting for years
05:11
that the prosodic abilities of these individuals
05:14
are preserved.
05:16
So when I realized that those same cues
05:18
are also important for speaker identity,
05:22
I had this idea.
05:25
Why don't we take the source
05:27
from the person we want the voice to sound like,
05:29
because it's preserved,
05:31
and borrow the filter
05:33
from someone about the same age and size,
05:35
because they can articulate speech,
05:38
and then mix them?
05:41
Because when we mix them,
05:43
we can get a voice that's as clear
05:44
as our surrogate talker --
05:46
that's the person we borrowed the filter from --
05:48
and is similar in identity to our target talker.
05:50
It's that simple.
05:55
That's the science behind what we're doing.
05:56
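The mixing step can be pictured as a simple recombination of two talkers' parameters. The sketch below is a hypothetical data model, not the project's actual code: the class names, field names, and all numeric values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Prosodic cues the talk attributes to the source (hypothetical fields)."""
    pitch_hz: float
    loudness_db: float
    tempo_syllables_per_s: float


@dataclass(frozen=True)
class VocalFilter:
    """Vocal-tract resonances that shape consonants and vowels."""
    formants_hz: tuple


@dataclass(frozen=True)
class Talker:
    name: str
    source: Source
    vocal_filter: VocalFilter


def mix(target: Talker, surrogate: Talker) -> Talker:
    """Keep the target's source and borrow the surrogate's filter."""
    return Talker(
        name=f"{target.name} (personalized)",
        source=target.source,
        vocal_filter=surrogate.vocal_filter,
    )


# Invented example values; a real system would measure these from recordings.
target = Talker(
    name="target",
    source=Source(pitch_hz=250.0, loudness_db=60.0, tempo_syllables_per_s=3.0),
    vocal_filter=VocalFilter(formants_hz=()),  # articulation is impaired
)
surrogate = Talker(
    name="surrogate",
    source=Source(pitch_hz=230.0, loudness_db=65.0, tempo_syllables_per_s=4.5),
    vocal_filter=VocalFilter(formants_hz=(850.0, 1600.0, 2900.0)),
)

blended = mix(target, surrogate)
```

The blended talker carries the target's pitch, loudness, and tempo together with the surrogate's resonances, matching the "clear as the surrogate, similar in identity to the target" description.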
So once you have that in mind,
05:59
how do you go about building this voice?
06:03
Well, you have to find someone
06:05
who is willing to be a surrogate.
06:07
It's not such an ominous thing.
06:09
Being a surrogate donor
06:11
only requires you to say a few hundred
06:13
to a few thousand utterances.
06:16
The process goes something like this.
06:18
(Video) Voice: Things happen in pairs.
06:20
I love to sleep.
06:22
The sky is blue without clouds.
06:24
RP: Now she's going to go on like this
06:28
for about three to four hours,
06:30
and the idea is not for her to say everything
06:32
that the target is going to want to say,
06:35
but the idea is to cover all the different combinations
06:37
of the sounds that occur in the language.
06:40
The more speech you have,
06:43
the better sounding voice you're going to have.
06:45
Once you have those recordings,
06:47
what we need to do
06:49
is we have to parse these recordings
06:51
into little snippets of speech,
06:53
one- or two-sound combinations,
06:56
sometimes even whole words
06:58
that start populating a dataset or a database.
07:00
We're going to call this database a voice bank.
07:04
Now the power of the voice bank
07:08
is that from this voice bank,
07:10
we can now say any new utterance,
07:12
like, "I love chocolate" --
07:14
everyone needs to be able to say that --
07:16
fish through that database
07:17
and find all the segments necessary
07:19
to say that utterance.
07:21
(Video) Voice: I love chocolate.
07:23
RP: So that's speech synthesis.
07:25
It's called concatenative synthesis, and that's what we're using.
07:26
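The voice bank lookup can be sketched as follows. This is a simplified illustration under stated assumptions: letters stand in for speech sounds, the bank stores text labels instead of audio, and a greedy longest-match search stands in for whatever unit selection the real system uses. All function names are hypothetical.

```python
def parse_recording(sentence, max_unit=2):
    """Cut one recorded sentence into whole words plus 1..max_unit letter snippets."""
    units = set()
    for raw_word in sentence.lower().split():
        word = raw_word.strip(".,!?")
        if not word:
            continue
        units.add(word)  # sometimes even whole words
        for size in range(1, max_unit + 1):  # one- or two-sound combinations
            for start in range(len(word) - size + 1):
                units.add(word[start:start + size])
    return units


def build_voice_bank(sentences):
    """Collect the snippets from every recorded sentence into one bank."""
    bank = set()
    for sentence in sentences:
        bank |= parse_recording(sentence)
    return bank


def select_units(word, bank):
    """Cover one word with banked snippets, preferring the longest match."""
    units = []
    position = 0
    while position < len(word):
        for size in range(len(word) - position, 0, -1):
            piece = word[position:position + size]
            if piece in bank:
                units.append(piece)
                position += size
                break
        else:
            raise KeyError(f"no recorded snippet covers {word[position]!r}")
    return units


def synthesize(utterance, bank):
    """Return, for each word, the sequence of snippets to concatenate."""
    words = [w.strip(".,!?") for w in utterance.lower().split()]
    return [select_units(w, bank) for w in words if w]


# The three sentences heard in the recording session above.
recorded = [
    "Things happen in pairs.",
    "I love to sleep.",
    "The sky is blue without clouds.",
]
bank = build_voice_bank(recorded)
plan = synthesize("I love chocolate", bank)
```

Here "I" and "love" are found as whole words, while "chocolate", which was never recorded, is assembled from smaller snippets. With more recordings the bank holds longer matching pieces, which is one way to read the remark that more speech gives a better-sounding voice.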
That's not the novel part.
07:29
What's novel is how we make it sound
07:30
like this young woman.
07:33
This is Samantha.
07:34
I met her when she was nine,
07:36
and since then, my team and I
07:38
have been trying to build her a personalized voice.
07:40
We first had to find a surrogate donor,
07:43
and then we had to have Samantha
07:46
produce some utterances.
07:47
What she can produce are mostly vowel-like sounds,
07:49
but that's enough for us to extract
07:52
her source characteristics.
07:54
What happens next is best described
07:57
by my daughter's analogy. She's six.
08:00
She calls it mixing colors to paint voices.
08:03
It's beautiful. It's exactly that.
08:08
Samantha's voice is like a concentrated sample
08:11
of red food dye which we can infuse
08:13
into the recordings of her surrogate
08:16
to get a pink voice just like this.
08:19
(Video) Samantha: Aaaaaah.
08:23
RP: So now, Samantha can say this.
08:27
(Video) Samantha: This voice is only for me.
08:30
I can't wait to use my new voice with my friends.
08:33
RP: Thank you. (Applause)
08:40
I'll never forget the gentle smile
08:46
that spread across her face
08:48
when she heard that voice for the first time.
08:50
Now there's millions of people
08:54
around the world like Samantha, millions,
08:56
and we've only begun to scratch the surface.
08:59
What we've done so far is we have
09:02
a few surrogate talkers from around the U.S.
09:04
who have donated their voices,
09:08
and we have been using those
09:09
to build our first few personalized voices.
09:11
But there's so much more work to be done.
09:16
For Samantha, her surrogate
09:17
came from somewhere in the Midwest, a stranger
09:19
who gave her the gift of voice.
09:23
And as a scientist, I'm so excited
09:26
to take this work out of the laboratory
09:28
and finally into the real world
09:30
so it can have real-world impact.
09:32
What I want to share with you next
09:35
is how I envision taking this work
09:37
to that next level.
09:39
I imagine a whole world of surrogate donors
09:42
from all walks of life, different sizes, different ages,
09:46
coming together in this voice drive
09:49
to give people voices
09:52
that are as colorful as their personalities.
09:54
To do that as a first step,
09:58
we've put together this website, VocaliD.org,
10:00
as a way to bring together those
10:04
who want to join us as voice donors,
10:05
as expertise donors,
10:08
in whatever way to make this vision a reality.
10:10
They say that giving blood can save lives.
10:15
Well, giving your voice can change lives.
10:19
All we need is a few hours of speech
10:24
from our surrogate talker,
10:27
and as little as a vowel from our target talker,
10:29
to create a unique vocal identity.
10:34
So that's the science behind what we're doing.
10:37
I want to end by circling back to the human side
10:40
that is really the inspiration for this work.
10:44
About five years ago, we built our very first voice
10:48
for a little boy named William.
10:52
When his mom first heard this voice,
10:55
she said, "This is what William
10:57
would have sounded like
10:59
had he been able to speak."
11:01
And then I saw William typing a message
11:03
on his device.
11:06
I wondered, what was he thinking?
11:07
Imagine carrying around someone else's voice
11:10
for nine years
11:14
and finally finding your own voice.
11:16
Imagine that.
11:21
This is what William said:
11:22
"Never heard me before."
11:25
Thank you.
11:32
(Applause)
11:33

Why you should listen

Northeastern University computer science professor Rupal Patel looks for ways to give voice to the voiceless. As founder and director of the Communication Analysis and Design Laboratory (CadLab), she developed a technology that combines real human voices with the characteristics of individual speech patterns. The result is VocaliD, an innovation that gives people who can't speak the ability to communicate in a voice all their own.

"There's nothing better than seeing the person who's actually going to use it, seeing their reaction, seeing their smile," says Patel.

Data provided by TED.