ABOUT THE SPEAKER

Rupal Patel - Speech scientist
People relying on synthetic speech use the voice they’re given, not their own. Rupal Patel created the vocaliD project to change that.

Why you should listen

Northeastern University computer science professor Rupal Patel looks for ways to give voice to the voiceless. As founder and director of the Communication Analysis and Design Laboratory (CadLab), she developed a technology that combines real human voices with the characteristics of individual speech patterns. The result is VocaliD, an innovation that gives people who can't speak the ability to communicate in a voice all their own.

"There's nothing better than seeing the person who's actually going to use it, seeing their reaction, seeing their smile," says Patel.

TEDWomen 2013

Rupal Patel: Synthetic voices, as unique as fingerprints

Filmed: 2013-12-05

Readability: 3.9

944,754 views

Many of those with severe speech disorders use a computerized device to communicate. Yet they choose between only a few voice options. That's why Stephen Hawking has an American accent, and why many people end up with the same voice, often to incongruous effect. Speech scientist Rupal Patel wanted to do something about this, and in this wonderful talk she shares her work to engineer unique voices for the voiceless.

00:12

I'd like to talk today about a powerful and fundamental aspect of who we are: our voice. Each one of us has a unique voiceprint that reflects our age, our size, even our lifestyle and personality. In the words of the poet Longfellow, "the human voice is the organ of the soul." As a speech scientist, I'm fascinated by how the voice is produced, and I have an idea for how it can be engineered. That's what I'd like to share with you.

00:45

I'm going to start by playing you a sample of a voice that you may recognize.

00:49

(Recording) Stephen Hawking: "I would have thought it was fairly obvious what I meant."

00:53

Rupal Patel: That was the voice of Professor Stephen Hawking. What you may not know is that same voice may also be used by this little girl who is unable to speak because of a neurological condition. In fact, all of these individuals may be using the same voice, and that's because there's only a few options available.

01:14

In the U.S. alone, there are 2.5 million Americans who are unable to speak, and many of whom use computerized devices to communicate. Now that's millions of people worldwide who are using generic voices, including Professor Hawking, who uses an American-accented voice.

01:36

This lack of individuation of the synthetic voice really hit home when I was at an assistive technology conference a few years ago, and I recall walking into an exhibit hall and seeing a little girl and a grown man having a conversation using their devices, different devices, but the same voice. And I looked around and I saw this happening all around me, literally hundreds of individuals using a handful of voices, voices that didn't fit their bodies or their personalities.

02:13

We wouldn't dream of fitting a little girl with the prosthetic limb of a grown man. So why then the same prosthetic voice? It really struck me, and I wanted to do something about this.

02:27

I'm going to play you now a sample of someone who has, two people actually, who have severe speech disorders. I want you to take a listen to how they sound. They're saying the same utterance.

02:39

(First voice)

02:42

(Second voice)

02:45

You probably didn't understand what they said, but I hope that you heard their unique vocal identities.

02:54

So what I wanted to do next is, I wanted to find out how we could harness these residual vocal abilities and build a technology that could be customized for them, voices that could be customized for them.

03:07

So I reached out to my collaborator, Tim Bunnell. Dr. Bunnell is an expert in speech synthesis, and what he'd been doing is building personalized voices for people by putting together pre-recorded samples of their voice and reconstructing a voice for them. These are people who had lost their voice later in life.

03:28

We didn't have the luxury of pre-recorded samples of speech for those born with speech disorder. But I thought, there had to be a way to reverse engineer a voice from whatever little is left over. So we decided to do exactly that.

03:43

We set out with a little bit of funding from the National Science Foundation to create custom-crafted voices that captured their unique vocal identities. We call this project VocaliD, or vocal I.D., for vocal identity.

03:56

Now before I get into the details of how the voice is made and let you listen to it, I need to give you a real quick speech science lesson. Okay?

04:05

So first, we know that the voice is changing dramatically over the course of development. Children sound different from teens who sound different from adults. We've all experienced this.

04:17

Fact number two is that speech is a combination of the source, which is the vibrations generated by your voice box, which are then pushed through the rest of the vocal tract. These are the chambers of your head and neck that vibrate, and they actually filter that source sound to produce consonants and vowels. So the combination of source and filter is how we produce speech. And that happens in one individual.
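Editor's sketch: the source-and-filter description above can be illustrated with a toy signal model. This is a minimal sketch, not the method used in the talk. It assumes the source can be approximated by a pulse train at the speaker's pitch and each vocal-tract resonance by a two-pole resonator. The function names, the 16 kHz sample rate, and the resonance values (700 Hz and 1200 Hz) are illustrative assumptions, not figures from the talk.

```python
import math


def impulse_train(f0_hz, sample_rate, n_samples):
    """Crude stand-in for the voice-box source: one pulse per pitch period."""
    period = round(sample_rate / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator standing in for one vocal-tract resonance."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2.0 * math.pi * center_hz / sample_rate
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        # Each output sample is the input plus feedback from the two previous outputs.
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(f0_hz, resonances, sample_rate=16000, n_samples=1600):
    """Push the source through each resonance in turn (source, then filter)."""
    source = impulse_train(f0_hz, sample_rate, n_samples)
    shaped = source
    for center_hz, bandwidth_hz in resonances:
        shaped = resonator(shaped, center_hz, bandwidth_hz, sample_rate)
    return source, shaped


# Hypothetical values: a 200 Hz pitch and two made-up resonances.
source, vowel = synthesize_vowel(200.0, [(700.0, 100.0), (1200.0, 120.0)])
```

In this toy model, changing `f0_hz` alters only the source, while changing the resonance list alters only the filter, which is the separation the talk relies on.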

04:48

Now I told you earlier that I'd spent a good part of my career understanding and studying the source characteristics of people with severe speech disorder, and what I've found is that even though their filters were impaired, they were able to modulate their source: the pitch, the loudness, the tempo of their voice. These are called prosody, and I've been documenting for years that the prosodic abilities of these individuals are preserved.

05:18

So when I realized that those same cues are also important for speaker identity, I had this idea. Why don't we take the source from the person we want the voice to sound like, because it's preserved, and borrow the filter from someone about the same age and size, because they can articulate speech, and then mix them? Because when we mix them, we can get a voice that's as clear as our surrogate talker (that's the person we borrowed the filter from) and is similar in identity to our target talker.

05:55

It's that simple. That's the science behind what we're doing.
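Editor's sketch: the mixing idea above (source from the target talker, filter from a surrogate of similar age and size) can be written down as a small data model. This is only an illustration of the bookkeeping, assuming that "about the same age and size" can be approximated by nearest age and then nearest height. All class names, fields, and numbers are hypothetical and do not come from the VocaliD project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prosody:
    """Source cues named in the talk: pitch, loudness, tempo."""
    pitch_hz: float
    loudness_db: float
    tempo_syllables_per_s: float


@dataclass(frozen=True)
class Talker:
    name: str
    age_years: int
    height_cm: float
    prosody: Prosody
    articulates_clearly: bool


@dataclass(frozen=True)
class BlendedVoice:
    source_from: str   # who supplies the identity cues
    filter_from: str   # who supplies the clear articulation
    prosody: Prosody


def pick_surrogate(target, donors):
    """Choose the clearly articulating donor closest in age, then in height."""
    candidates = [d for d in donors if d.articulates_clearly]
    if not candidates:
        raise ValueError("no donor can supply a filter")
    return min(
        candidates,
        key=lambda d: (abs(d.age_years - target.age_years),
                       abs(d.height_cm - target.height_cm)),
    )


def blend(target, surrogate):
    """Keep the target's source characteristics; borrow the surrogate's filter."""
    return BlendedVoice(
        source_from=target.name,
        filter_from=surrogate.name,
        prosody=target.prosody,
    )


# Hypothetical talkers for illustration only.
target = Talker("target", 9, 130.0, Prosody(260.0, 60.0, 3.0), False)
donors = [
    Talker("adult donor", 40, 180.0, Prosody(110.0, 65.0, 4.5), True),
    Talker("child donor", 10, 135.0, Prosody(250.0, 62.0, 4.0), True),
]
voice = blend(target, pick_surrogate(target, donors))
```

The real system works on audio rather than on summary numbers; the sketch only shows which talker contributes which half of the voice.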

06:00

So once you have that in mind, how do you go about building this voice? Well, you have to find someone who is willing to be a surrogate. It's not such an ominous thing. Being a surrogate donor only requires you to say a few hundred to a few thousand utterances. The process goes something like this.

06:20

(Video) Voice: Things happen in pairs. I love to sleep. The sky is blue without clouds.

06:28

RP: Now she's going to go on like this for about three to four hours, and the idea is not for her to say everything that the target is going to want to say, but the idea is to cover all the different combinations of the sounds that occur in the language. The more speech you have, the better sounding voice you're going to have.

06:48

Once you have those recordings, what we need to do is we have to parse these recordings into little snippets of speech, one- or two-sound combinations, sometimes even whole words that start populating a dataset or a database. We're going to call this database a voice bank.

07:08

Now the power of the voice bank is that from this voice bank, we can now say any new utterance, like, "I love chocolate" (everyone needs to be able to say that), fish through that database and find all the segments necessary to say that utterance.

07:23

(Video) Voice: I love chocolate.

07:25

RP: So that's speech synthesis. It's called concatenative synthesis, and that's what we're using.
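Editor's sketch: the voice bank and the lookup step described above can be illustrated with a toy index. This is a minimal sketch under stated assumptions, not the actual VocaliD pipeline: recordings are represented as lists of sound labels rather than audio, the labels themselves are made up, and segments are chosen greedily, preferring two-sound units over single sounds. The function names `build_voice_bank` and `plan_utterance` are hypothetical.

```python
def build_voice_bank(recordings):
    """Index every one-sound and two-sound snippet found in the recordings.

    recordings: list of utterances, each a list of sound labels.
    Returns a dict mapping a tuple of labels to (utterance index, start, end).
    """
    bank = {}
    for utt_index, sounds in enumerate(recordings):
        for i in range(len(sounds)):
            # Keep the first snippet seen for each unit.
            bank.setdefault((sounds[i],), (utt_index, i, i + 1))
            if i + 1 < len(sounds):
                bank.setdefault((sounds[i], sounds[i + 1]), (utt_index, i, i + 2))
    return bank


def plan_utterance(sounds, bank):
    """List the snippets to concatenate for a new utterance.

    Greedy choice: use a two-sound unit when the bank has one,
    otherwise fall back to a single sound.
    """
    plan = []
    i = 0
    while i < len(sounds):
        pair = tuple(sounds[i:i + 2])
        if len(pair) == 2 and pair in bank:
            plan.append(bank[pair])
            i += 2
        elif (sounds[i],) in bank:
            plan.append(bank[(sounds[i],)])
            i += 1
        else:
            raise KeyError(f"sound {sounds[i]!r} is not covered by the voice bank")
    return plan


# Made-up sound labels standing in for two recorded utterances.
recordings = [["AY", "L", "AH", "V"], ["S", "L", "IY", "P"]]
bank = build_voice_bank(recordings)

# A new utterance that was never recorded as a whole.
plan = plan_utterance(["S", "L", "AH", "V"], bank)
```

The `KeyError` branch shows why coverage matters: a sound that never occurs in the donor's recordings cannot be synthesized, which matches the talk's point about covering the sound combinations of the language.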

07:29

That's not the novel part. What's novel is how we make it sound like this young woman. This is Samantha. I met her when she was nine, and since then, my team and I have been trying to build her a personalized voice.

07:43

We first had to find a surrogate donor, and then we had to have Samantha produce some utterances. What she can produce are mostly vowel-like sounds, but that's enough for us to extract her source characteristics.

07:57

What happens next is best described by my daughter's analogy. She's six. She calls it mixing colors to paint voices. It's beautiful. It's exactly that. Samantha's voice is like a concentrated sample of red food dye which we can infuse into the recordings of her surrogate to get a pink voice just like this.

08:23

(Video) Samantha: Aaaaaah.

08:28

RP: So now, Samantha can say this.

08:30

(Video) Samantha: This voice is only for me. I can't wait to use my new voice with my friends.

08:40

RP: Thank you. (Applause)

08:46

I'll never forget the gentle smile that spread across her face when she heard that voice for the first time. Now there's millions of people around the world like Samantha, millions, and we've only begun to scratch the surface.

09:02

What we've done so far is we have a few surrogate talkers from around the U.S. who have donated their voices, and we have been using those to build our first few personalized voices. But there's so much more work to be done. For Samantha, her surrogate came from somewhere in the Midwest, a stranger who gave her the gift of voice.

09:27

And as a scientist, I'm so excited to take this work out of the laboratory and finally into the real world so it can have real-world impact. What I want to share with you next is how I envision taking this work to that next level.

09:42

I imagine a whole world of surrogate donors from all walks of life, different sizes, different ages, coming together in this voice drive to give people voices that are as colorful as their personalities. To do that as a first step, we've put together this website, VocaliD.org, as a way to bring together those who want to join us as voice donors, as expertise donors, in whatever way to make this vision a reality.

10:15

They say that giving blood can save lives. Well, giving your voice can change lives. All we need is a few hours of speech from our surrogate talker, and as little as a vowel from our target talker, to create a unique vocal identity. So that's the science behind what we're doing.

10:40

I want to end by circling back to the human side that is really the inspiration for this work. About five years ago, we built our very first voice for a little boy named William. When his mom first heard this voice, she said, "This is what William would have sounded like had he been able to speak." And then I saw William typing a message on his device. I wondered, what was he thinking?

11:11

Imagine carrying around someone else's voice for nine years and finally finding your own voice. Imagine that. This is what William said: "Never heard me before."

11:32

Thank you.

11:34

(Applause)

THE ORIGINAL VIDEO ON TED.COM

Rupal Patel: Synthetic voices, as unique as fingerprints | TED Talk | TED.com