ABOUT THE SPEAKER

Peter Donnelly - Mathematician; statistician
Peter Donnelly is an expert in probability theory who applies statistical methods to genetic data -- spurring advances in disease treatment and insights into our evolution. He's also an expert on DNA analysis and an advocate for sensible statistical analysis in the courtroom.

Why you should listen

Peter Donnelly applies statistical methods to real-world problems, ranging from DNA analysis (for criminal trials), to the treatment of genetic disorders. A mathematician who collaborates with biologists, he specializes in applying probability and statistics to the field of genetics, in hopes of shedding light on evolutionary history and the structure of the human genome.

The Australian-born, Oxford-based mathematician is best known for his work in molecular evolution (tracing the roots of human existence to their earliest origins using the mutation rates of mitochondrial DNA). He studies genetic distributions in living populations to trace human evolutionary history -- an approach that informs research in evolutionary biology, as well as medical treatment for genetic disorders. Donnelly is a key player in the International HapMap Project, an ongoing international effort to model human genetic variation and pinpoint the genes responsible for specific aspects of health and disease; its implications for disease prevention and treatment are vast.

He's also a leading expert on DNA analysis and the use of forensic science in criminal trials; he's an outspoken advocate for bringing sensible statistical analysis into the courtroom. Donnelly leads Oxford University's Mathematical Genetics Group, which conducts research in genetic modeling, human evolutionary history, and forensic DNA profiling. He also serves as Director of the Wellcome Trust Centre for Human Genetics at Oxford University, which studies how genetics relates to disease and illness.

TEDGlobal 2005

Peter Donnelly: How juries are fooled by statistics

Filmed: 2005-07-14

Oxford mathematician Peter Donnelly reveals the common mistakes humans make in interpreting statistics -- and the devastating impact these errors can have on the outcome of criminal trials.

00:25

As other speakers have said, it's a rather daunting experience --

00:27

a particularly daunting experience -- to be speaking in front of this audience.

00:30

But unlike the other speakers, I'm not going to tell you about

00:33

the mysteries of the universe, or the wonders of evolution,

00:35

or the really clever, innovative ways people are attacking

00:39

the major inequalities in our world.

00:41

Or even the challenges of nation-states in the modern global economy.

00:46

My brief, as you've just heard, is to tell you about statistics --

00:50

and, to be more precise, to tell you some exciting things about statistics.

00:53

And that's --

00:54

(Laughter)

00:55

-- that's rather more challenging

00:57

than all the speakers before me and all the ones coming after me.

00:59

(Laughter)

01:01

One of my senior colleagues told me, when I was a youngster in this profession,

01:06

rather proudly, that statisticians were people who liked figures

01:10

but didn't have the personality skills to become accountants.

01:13

(Laughter)

01:15

And there's another in-joke among statisticians, and that's,

01:18

"How do you tell the introverted statistician from the extroverted statistician?"

01:21

To which the answer is,

01:23

"The extroverted statistician's the one who looks at the other person's shoes."

01:28

(Laughter)

01:31

But I want to tell you something useful -- and here it is, so concentrate now.

01:36

This evening, there's a reception in the University's Museum of Natural History.

01:39

And it's a wonderful setting, as I hope you'll find,

01:41

and a great icon to the best of the Victorian tradition.

01:46

It's very unlikely -- in this special setting, and this collection of people --

01:51

but you might just find yourself talking to someone you'd rather wish that you weren't.

01:54

So here's what you do.

01:56

When they say to you, "What do you do?" -- you say, "I'm a statistician."

02:00

(Laughter)

02:01

Well, except they've been pre-warned now, and they'll know you're making it up.

02:05

And then one of two things will happen.

02:07

They'll either discover their long-lost cousin in the other corner of the room

02:09

and run over and talk to them.

02:11

Or they'll suddenly become parched and/or hungry -- and often both --

02:14

and sprint off for a drink and some food.

02:16

And you'll be left in peace to talk to the person you really want to talk to.

02:20

It's one of the challenges in our profession to try and explain what we do.

02:23

We're not top on people's lists for dinner party guests and conversations and so on.

02:28

And it's something I've never really found a good way of doing.

02:30

But my wife -- who was then my girlfriend --

02:33

managed it much better than I've ever been able to.

02:36

Many years ago, when we first started going out, she was working for the BBC in Britain,

02:39

and I was, at that stage, working in America.

02:41

I was coming back to visit her.

02:43

She told this to one of her colleagues, who said, "Well, what does your boyfriend do?"

02:49

Sarah thought quite hard about the things I'd explained --

02:51

and she concentrated, in those days, on listening.

02:55

(Laughter)

02:58

Don't tell her I said that.

03:00

And she was thinking about the work I did developing mathematical models

03:04

for understanding evolution and modern genetics.

03:07

So when her colleague said, "What does he do?"

03:10

She paused and said, "He models things."

03:14

(Laughter)

03:15

Well, her colleague suddenly got much more interested than I had any right to expect

03:19

and went on and said, "What does he model?"

03:22

Well, Sarah thought a little bit more about my work and said, "Genes."

03:25

(Laughter)

03:29

"He models genes."

03:31

That is my first love, and that's what I'll tell you a little bit about.

03:35

What I want to do more generally is to get you thinking about

03:39

the place of uncertainty and randomness and chance in our world,

03:42

and how we react to that, and how well we do or don't think about it.

03:47

So you've had a pretty easy time up till now --

03:49

a few laughs, and all that kind of thing -- in the talks to date.

03:51

You've got to think, and I'm going to ask you some questions.

03:54

So here's the scene for the first question I'm going to ask you.

03:56

Can you imagine tossing a coin successively?

03:59

And for some reason -- which shall remain rather vague --

04:02

we're interested in a particular pattern.

04:04

Here's one -- a head, followed by a tail, followed by a tail.

04:07

So suppose we toss a coin repeatedly.

04:10

Then the pattern, head-tail-tail, that we've suddenly become fixated with happens here.

04:15

And you can count: one, two, three, four, five, six, seven, eight, nine, 10 --

04:19

it happens after the 10th toss.

04:21

So you might think there are more interesting things to do, but humor me for the moment.

04:24

Imagine this half of the audience each get out coins, and they toss them

04:28

until they first see the pattern head-tail-tail.

04:31

The first time they do it, maybe it happens after the 10th toss, as here.

04:33

The second time, maybe it's after the fourth toss.

04:35

The next time, after the 15th toss.

04:37

So you do that lots and lots of times, and you average those numbers.

04:40

That's what I want this side to think about.

04:43

The other half of the audience doesn't like head-tail-tail --

04:45

they think, for deep cultural reasons, that's boring --

04:48

and they're much more interested in a different pattern -- head-tail-head.

04:51

So, on this side, you get out your coins, and you toss and toss and toss.

04:54

And you count the number of times until the pattern head-tail-head appears

04:57

and you average them. OK?

05:00

So on this side, you've got a number --

05:02

you've done it lots of times, so you get it accurately --

05:04

which is the average number of tosses until head-tail-tail.

05:07

On this side, you've got a number -- the average number of tosses until head-tail-head.

05:11

So here's a deep mathematical fact --

05:13

if you've got two numbers, one of three things must be true.

05:16

Either they're the same, or this one's bigger than this one,

05:19

or this one's bigger than that one.

05:20

So what's going on here?

05:23

So you've all got to think about this, and you've all got to vote --

05:25

and we're not moving on.

05:26

And I don't want to end up in the two-minute silence

05:28

to give you more time to think about it, until everyone's expressed a view. OK.

05:32

So what you want to do is compare the average number of tosses until we first see

05:36

head-tail-head with the average number of tosses until we first see head-tail-tail.

05:41

Who thinks that A is true --

05:43

that, on average, it'll take longer to see head-tail-head than head-tail-tail?

05:47

Who thinks that B is true -- that on average, they're the same?

05:51

Who thinks that C is true -- that, on average, it'll take less time

05:53

to see head-tail-head than head-tail-tail?

05:57

OK, who hasn't voted yet? Because that's really naughty -- I said you had to.

06:00

(Laughter)

06:02

OK. So most people think B is true.

06:05

And you might be relieved to know even rather distinguished mathematicians think that.

06:08

It's not. A is true here.

06:12

It takes longer, on average.

06:14

In fact, the average number of tosses till head-tail-head is 10

06:16

and the average number of tosses until head-tail-tail is eight.
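
A quick way to check the figures of 10 and 8 is to let a computer play both halves of the audience. This is a minimal Monte Carlo sketch in Python, assuming a fair coin simulated with the standard `random` module; the function names, seed and trial count are illustrative choices, not anything from the talk.

```python
import random


def tosses_until(pattern, rng):
    """Toss a simulated fair coin until `pattern` (e.g. "HTT") first appears.

    Returns the number of tosses needed, counting the toss that completes it.
    """
    window = ""
    tosses = 0
    while True:
        # Keep only the last len(pattern) outcomes; that is all we need to compare.
        window = (window + rng.choice("HT"))[-len(pattern):]
        tosses += 1
        if window == pattern:
            return tosses


def average_tosses(pattern, trials=50_000, seed=2005):
    """Repeat the experiment many times and average the waiting times."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return sum(tosses_until(pattern, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for pattern in ("HTT", "HTH"):
        print(pattern, round(average_tosses(pattern), 2))
```

With this many repetitions the two averages should land close to 8 and 10 respectively; the exact digits depend on the seed.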

06:21

How could that be?

06:24

Anything different about the two patterns?

06:30

There is. Head-tail-head overlaps itself.

06:35

If you went head-tail-head-tail-head, you can cunningly get two occurrences

06:39

of the pattern in only five tosses.

06:42

You can't do that with head-tail-tail.

06:44

That turns out to be important.
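
The overlap can be turned into an exact answer. For a fair coin there is a classical result, usually presented alongside John Conway's analysis of Penney's game: the expected waiting time is the sum of 2^k over every length k at which the pattern's first k symbols equal its last k symbols (k equal to the full length always counts). The Python sketch below just applies that rule; the `alphabet_size` parameter is an added generalization that assumes all symbols are equally likely and independent.

```python
def expected_wait(pattern, alphabet_size=2):
    """Expected number of i.i.d. uniform draws until `pattern` first appears.

    Sums alphabet_size**k over every k for which the length-k prefix of the
    pattern equals its length-k suffix.
    """
    n = len(pattern)
    return sum(
        alphabet_size ** k
        for k in range(1, n + 1)
        if pattern[:k] == pattern[n - k:]
    )


def self_overlaps(pattern):
    """Lengths k < len(pattern) at which the pattern overlaps a shifted copy of itself."""
    n = len(pattern)
    return [k for k in range(1, n) if pattern[:k] == pattern[n - k:]]


for pattern in ("HTT", "HTH"):
    print(pattern, "overlaps:", self_overlaps(pattern), "expected tosses:", expected_wait(pattern))
```

This prints `HTT overlaps: [] expected tosses: 8` and `HTH overlaps: [1] expected tosses: 10`. The single overlap of length 1 (the final head of one head-tail-head can start the next) is what adds the extra 2^1 = 2 tosses.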

06:46

There are two ways of thinking about this.

06:48

I'll give you one of them.

06:50

So imagine -- let's suppose we're doing it.

06:52

On this side -- remember, you're excited about head-tail-tail;

06:54

you're excited about head-tail-head.

06:56

We start tossing a coin, and we get a head --

06:59

and you start sitting on the edge of your seat

07:00

because something great and wonderful, or awesome, might be about to happen.

07:05

The next toss is a tail -- you get really excited.

07:07

The champagne's on ice just next to you; you've got the glasses chilled to celebrate.

07:11

You're waiting with bated breath for the final toss.

07:13

And if it comes down a head, that's great.

07:15

You're done, and you celebrate.

07:17

If it's a tail -- well, rather disappointedly, you put the glasses away

07:19

and put the champagne back.

07:21

And you keep tossing, to wait for the next head, to get excited.

07:25

On this side, there's a different experience.

07:27

It's the same for the first two parts of the sequence.

07:30

You're a little bit excited with the first head --

07:32

you get rather more excited with the next tail.

07:34

Then you toss the coin.

07:36

If it's a tail, you crack open the champagne.

07:39

If it's a head you're disappointed,

07:41

but you're still a third of the way to your pattern again.

07:44

And that's an informal way of presenting it -- that's why there's a difference.
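
The champagne story is a description of states: how much of the pattern has been matched so far, and where a disappointing toss sends you back to. Writing one equation per state ("expected tosses from here = 1 + the average over the two possible next states") and solving them gives the exact averages. This is a standard first-step analysis, sketched below in Python with exact fractions; the helper names are hypothetical.

```python
from fractions import Fraction


def next_state(pattern, state, toss):
    """Matched-prefix length after seeing `toss` when `state` symbols are matched.

    The new state is the longest prefix of `pattern` that is a suffix of
    (already matched prefix + toss); 0 means back to square one.
    """
    text = pattern[:state] + toss
    for k in range(min(len(text), len(pattern)), 0, -1):
        if text.endswith(pattern[:k]):
            return k
    return 0


def expected_wait_by_states(pattern):
    """Solve E[s] = 1 + (E[next(s, H)] + E[next(s, T)]) / 2, with E[len(pattern)] = 0."""
    n = len(pattern)
    # One row per unknown E[0..n-1]: coefficients followed by the constant term.
    rows = []
    for s in range(n):
        row = [Fraction(0)] * n + [Fraction(1)]
        row[s] += 1
        for toss in "HT":
            t = next_state(pattern, s, toss)
            if t < n:  # t == n means the pattern is complete, contributing E = 0
                row[t] -= Fraction(1, 2)
        rows.append(row)
    # Gauss-Jordan elimination in exact arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [x / scale for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return rows[0][n]  # expected tosses starting with nothing matched


print(next_state("HTH", 2, "T"))  # head-tail then tail: glasses away, back to 0
print(next_state("HTT", 2, "H"))  # head-tail then head: still 1 of 3 symbols matched
print(expected_wait_by_states("HTH"), expected_wait_by_states("HTT"))
```

The first two lines print `0` and `1`, which is the asymmetry in the story; the last prints `10 8`.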

07:48

Another way of thinking about it --

07:50

if we tossed a coin eight million times,

07:52

then we'd expect a million head-tail-heads

07:54

and a million head-tail-tails -- but the head-tail-heads could occur in clumps.

08:01

So if you want to put a million things down amongst eight million positions

08:03

and you can have some of them overlapping, the clumps will be further apart.

08:08

It's another way of getting the intuition.
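
The long-run argument can also be checked directly. This Python sketch scales the experiment down to 400,000 simulated fair tosses (an arbitrary size chosen so it runs in a second or two), finds every occurrence of each pattern, overlapping ones included, and reports how many there are, the smallest gap between neighbouring occurrences, the average gap, and how many separate clumps they form.

```python
import random


def occurrences(seq, pattern):
    """Start positions of every occurrence of `pattern` in `seq`, overlaps included."""
    positions = []
    i = seq.find(pattern)
    while i != -1:
        positions.append(i)
        i = seq.find(pattern, i + 1)  # step by one so overlapping matches are found
    return positions


def summarize(seq, pattern):
    pos = occurrences(seq, pattern)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    # A new clump starts whenever an occurrence does not overlap the previous one.
    clumps = 1 + sum(1 for g in gaps if g >= len(pattern))
    return {
        "count": len(pos),
        "min_gap": min(gaps),
        "mean_gap": sum(gaps) / len(gaps),
        "clumps": clumps,
    }


def simulate(n_tosses=400_000, seed=8):
    rng = random.Random(seed)
    seq = "".join(rng.choices("HT", k=n_tosses))
    return {p: summarize(seq, p) for p in ("HTH", "HTT")}


if __name__ == "__main__":
    for pattern, stats in simulate().items():
        print(pattern, stats)
```

Both patterns should turn up about once every 8 tosses (roughly 50,000 times each here). Head-tail-tail never overlaps itself, so every occurrence is its own clump; about a quarter of head-tail-head occurrences overlap the previous one, so its clumps are fewer and further apart, roughly 8 / 0.75, or about 10.7 tosses between clump starts.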

08:10

What's the point I want to make?

08:12

It's a very, very simple example, an easily stated question in probability,

08:16

which every -- you're in good company -- everybody gets wrong.

08:19

This is my little diversion into my real passion, which is genetics.

08:23

There's a connection between head-tail-heads and head-tail-tails in genetics,

08:26

and it's the following.

08:29

When you toss a coin, you get a sequence of heads and tails.

08:32

When you look at DNA, there's a sequence of not two things -- heads and tails --

08:35

but four letters -- As, Gs, Cs and Ts.

08:38

And there are little chemical scissors, called restriction enzymes

08:41

which cut DNA whenever they see particular patterns.

08:43

And they're an enormously useful tool in modern molecular biology.

08:48

And instead of asking the question, "How long until I see a head-tail-head?" --

08:51

you can ask, "How big will the chunks be when I use a restriction enzyme

08:54

which cuts whenever it sees G-A-A-G, for example?

08:58

How long will those chunks be?"
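
The restriction-enzyme question has the same structure with a four-letter alphabet. Below is a rough Python sketch under two simplifying assumptions that are not from the talk: the DNA is a random string in which A, C, G and T are equally likely and independent (real genomes are not like this), and the hypothetical enzyme cuts at the same offset within every occurrence of G-A-A-G, so fragment lengths equal the distances between successive sites.

```python
import random


def site_positions(dna, site):
    """Start index of every (possibly overlapping) occurrence of `site`."""
    positions = []
    i = dna.find(site)
    while i != -1:
        positions.append(i)
        i = dna.find(site, i + 1)
    return positions


def fragment_lengths(dna, site="GAAG"):
    """Lengths of the internal fragments between consecutive cut sites."""
    pos = site_positions(dna, site)
    return [b - a for a, b in zip(pos, pos[1:])]


rng = random.Random(0)  # fixed seed; the sequence is simulated, not real DNA
dna = "".join(rng.choices("ACGT", k=2_000_000))
lengths = fragment_lengths(dna)
print(len(lengths), min(lengths), sum(lengths) / len(lengths))
```

The average fragment should come out near 4^4 = 256 bases. Because G-A-A-G, like head-tail-head, overlaps itself (G-A-A-G-A-A-G), the shortest fragments are only three bases long.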

09:00

That's a rather trivial connection between probability and genetics.

09:05

There's a much deeper connection, which I don't have time to go into

09:08

and that is that modern genetics is a really exciting area of science.

09:11

And we'll hear some talks later in the conference specifically about that.

09:15

But it turns out that unlocking the secrets in the information generated by modern

09:19

experimental technologies, a key part of that has to do with fairly sophisticated --

09:24

you'll be relieved to know that I do something useful in my day job,

09:27

rather more sophisticated than the head-tail-head story --

09:29

but quite sophisticated computer modelings and mathematical modelings

09:33

and modern statistical techniques.

09:35

And I will give you two little snippets -- two examples --

09:38

of projects we're involved in in my group in Oxford,

09:41

both of which I think are rather exciting.

09:43

You know about the Human Genome Project.

09:45

That was a project which aimed to read one copy of the human genome.

09:51

The natural thing to do after you've done that --

09:53

and that's what this project, the International HapMap Project,

09:55

which is a collaboration between labs in five or six different countries.

10:00

Think of the Human Genome Project as learning what we've got in common,

10:04

and the HapMap Project is trying to understand

10:06

where there are differences between different people.

10:08

Why do we care about that?

10:10

Well, there are lots of reasons.

10:12

The most pressing one is that we want to understand how some differences

10:16

make some people susceptible to one disease -- type-2 diabetes, for example --

10:20

and other differences make people more susceptible to heart disease,

10:25

or stroke, or autism and so on.

10:27

That's one big project.

10:29

There's a second big project,

10:31

recently funded by the Wellcome Trust in this country,

10:33

involving very large studies --

10:35

thousands of individuals, with each of eight different diseases,

10:38

common diseases like type-1 and type-2 diabetes, and coronary heart disease,

10:42

bipolar disease and so on -- to try and understand the genetics.

10:46

To try and understand what it is about genetic differences that causes the diseases.

10:49

Why do we want to do that?

10:51

Because we understand very little about most human diseases.

10:54

We don't know what causes them.

10:56

And if we can get in at the bottom and understand the genetics,

10:58

we'll have a window on the way the disease works,

11:01

and a whole new way of thinking about disease therapies

11:03

and preventative treatment and so on.

11:06

So that's, as I said, the little diversion on my main love.

11:09

Back to some of the more mundane issues of thinking about uncertainty.

11:14

Here's another quiz for you --

11:16

now suppose we've got a test for a disease

11:18

which isn't infallible, but it's pretty good.

11:20

It gets it right 99 percent of the time.

11:23

And I take one of you, or I take someone off the street,

11:26

and I test them for the disease in question.

11:28

Let's suppose there's a test for HIV -- the virus that causes AIDS --

11:32

and the test says the person has the disease.

11:35

What's the chance that they do?

11:38

The test gets it right 99 percent of the time.

11:40

So a natural answer is 99 percent.

11:44

Who likes that answer?

11:46

Come on -- everyone's got to get involved.

11:47

Don't think you don't trust me anymore.

11:49

(Laughter)

11:50

Well, you're right to be a bit skeptical, because that's not the answer.

11:53

That's what you might think.

11:55

It's not the answer, and it's not because it's only part of the story.

11:58

It actually depends on how common or how rare the disease is.

12:01

So let me try and illustrate that.

12:03

Here's a little caricature of a million individuals.

12:07

So let's think about a disease that affects --

12:10

it's pretty rare, it affects one person in 10,000.

12:12

Amongst these million individuals, most of them are healthy

12:15

and some of them will have the disease.

12:17

And in fact, if this is the prevalence of the disease,

12:20

about 100 will have the disease and the rest won't.

12:23

So now suppose we test them all.

12:25

What happens?

12:27

Well, amongst the 100 who do have the disease,

12:29

the test will get it right 99 percent of the time, and 99 will test positive.

12:34

Amongst all these other people who don't have the disease,

12:36

the test will get it right 99 percent of the time.

12:39

It'll only get it wrong one percent of the time.

12:41

But there are so many of them that there'll be an enormous number of false positives.

12:45

Put that another way --

12:47

of all of them who test positive -- so here they are, the individuals involved --

12:52

less than one in 100 actually have the disease.
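
The million-person picture is simple arithmetic, and exact fractions make it easy to check. A minimal Python sketch, assuming (as the talk does) a single 99 percent figure for both kinds of error; the function name `screen` is illustrative.

```python
from fractions import Fraction


def screen(population, prevalence, accuracy):
    """Expected true and false positives when everyone in `population` is tested.

    Assumes the test is right with the same probability (`accuracy`) whether
    or not the person has the disease.
    """
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * accuracy
    false_positives = healthy * (1 - accuracy)
    share_sick = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, share_sick


tp, fp, share = screen(1_000_000, Fraction(1, 10_000), Fraction(99, 100))
print(tp, fp, share, float(share))
```

The 99 true positives are swamped by 9,999 false positives, so `share` is exactly 1/102, just under one percent.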

12:57

So even though we think the test is accurate, the important part of the story is

13:01

there's another bit of information we need.

13:04

Here's the key intuition.

13:07

What we have to do, once we know the test is positive,

13:10

is to weigh up the plausibility, or the likelihood, of two competing explanations.

13:16

Each of those explanations has a likely bit and an unlikely bit.

13:19

One explanation is that the person doesn't have the disease --

13:22

that's overwhelmingly likely, if you pick someone at random --

13:25

but the test gets it wrong, which is unlikely.

13:29

The other explanation is that the person does have the disease -- that's unlikely --

13:32

but the test gets it right, which is likely.

13:35

And the number we end up with --

13:37

that number which is a little bit less than one in 100 --

13:40

is to do with how likely one of those explanations is relative to the other.

13:46

Each of them taken together is unlikely.
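
Weighing the two explanations against each other is Bayes' theorem in odds form: the prior odds of disease, multiplied by how much more likely a positive result is for someone who is ill than for someone who is healthy. With the numbers from the talk (prevalence one in 10,000, a test that is right 99 percent of the time):

$$
\frac{P(\text{disease} \mid +)}{P(\text{healthy} \mid +)}
= \frac{P(\text{disease})}{P(\text{healthy})} \times \frac{P(+ \mid \text{disease})}{P(+ \mid \text{healthy})}
= \frac{1}{9999} \times \frac{0.99}{0.01}
= \frac{99}{9999}
= \frac{1}{101}
$$

Odds of 1 to 101 correspond to a probability of 1/102, a little under one in 100.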

13:49

Here's a more topical example of exactly the same thing.

13:52

Those of you in Britain will know about what's become rather a celebrated case

13:56

of a woman called Sally Clark, who had two babies who died suddenly.

14:01

And initially, it was thought that they died of what's known informally as "cot death,"

14:05

and more formally as "Sudden Infant Death Syndrome."

14:08

For various reasons, she was later charged with murder.

14:10

And at the trial, her trial, a very distinguished pediatrician gave evidence

14:14

that the chance of two cot deaths, innocent deaths, in a family like hers --

14:19

which was professional and non-smoking -- was one in 73 million.

14:26

To cut a long story short, she was convicted at the time.

14:29

Later, and fairly recently, acquitted on appeal -- in fact, on the second appeal.

14:34

And just to set it in context, you can imagine how awful it is for someone

14:38

to have lost one child, and then two, if they're innocent,

14:41

to be convicted of murdering them.

14:43

To be put through the stress of the trial, convicted of murdering them --

14:45

and to spend time in a women's prison, where all the other prisoners

14:48

think you killed your children -- is a really awful thing to happen to someone.

14:53

And it happened in large part here because the expert got the statistics

14:58

horribly wrong, in two different ways.

15:01

So where did he get the one in 73 million number?

15:05

He looked at some research, which said the chance of one cot death in a family

15:08

like Sally Clark's is about one in 8,500.

15:13

So he said, "I'll assume that if you have one cot death in a family,

15:17

the chance of a second child dying from cot death isn't changed."

15:21

So that's what statisticians would call an assumption of independence.

15:24

It's like saying, "If you toss a coin and get a head the first time,

15:26

that won't affect the chance of getting a head the second time."

15:29

So if you toss a coin twice, the chance of getting a head twice is a half --

15:34

that's the chance the first time -- times a half -- the chance a second time.

15:37

So he said, "Here,

15:39

I'll assume that these events are independent.

15:43

When you multiply 8,500 together twice,

15:45

you get about 73 million."

15:47

And none of this was stated to the court as an assumption

15:49

or presented to the jury that way.

15:52

Unfortunately here -- and, really, regrettably --

15:55

first of all, in a situation like this you'd have to verify it empirically.

15:59

And secondly, it's palpably false.

16:02

There are lots and lots of things that we don't know about sudden infant deaths.

16:07

It might well be that there are environmental factors that we're not aware of,

16:10

and it's pretty likely to be the case that there are

16:12

genetic factors we're not aware of.

16:14

So if a family suffers from one cot death, you'd put them in a high-risk group.

16:17

They've probably got these environmental risk factors

16:19

and/or genetic risk factors we don't know about.

16:22

And to argue, then, that the chance of a second death is as if you didn't know

16:25

that information is really silly.

16:28

It's worse than silly -- it's really bad science.

16:32

Nonetheless, that's how it was presented, and at trial nobody even argued it.

16:37

That's the first problem.
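
The arithmetic behind the one in 73 million figure, and why the independence assumption matters, fits in a few lines. In general the chance of two deaths is P(first) × P(second | first); independence replaces the second factor with P(first). The `risk_multiplier` values below are purely hypothetical, chosen only to show how sensitive the answer is to that assumption; they are not estimates from the talk or from any study.

```python
from fractions import Fraction

P_ONE = Fraction(1, 8_500)  # chance of one cot death in a family like Sally Clark's, as quoted


def one_in(p):
    """Format a probability as '1 in N' with thousands separators."""
    return f"1 in {round(1 / p):,}"


def chance_of_two(risk_multiplier):
    """P(first) * P(second | first), where P(second | first) = risk_multiplier * P(first).

    risk_multiplier = 1 is the independence assumption made at the trial.
    """
    return P_ONE * (risk_multiplier * P_ONE)


for multiplier in (1, 10, 100):  # hypothetical values, for illustration only
    print(multiplier, one_in(chance_of_two(multiplier)))
```

Squaring 8,500 gives 72,250,000, roughly the 73 million quoted (the talk's 8,500 is itself an approximate figure). If a first death made a second one, say, ten times more likely than for a family with no such history, the figure would drop to 1 in 7,225,000.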

16:39

The second problem is, what does the number of one in 73 million mean?

16:43

So after Sally Clark was convicted --

16:45

you can imagine, it made rather a splash in the press --

16:49

one of the journalists from one of Britain's more reputable newspapers wrote that

16:56

what the expert had said was,

16:58

"The chance that she was innocent was one in 73 million."

17:03

Now, that's a logical error.

17:05

It's exactly the same logical error as the logical error of thinking that

17:08

after the disease test, which is 99 percent accurate,

17:10

the chance of having the disease is 99 percent.

17:14

In the disease example, we had to bear in mind two things,

17:18

one of which was the possibility that the test got it right or not.

17:22

And the other one was the chance, a priori, that the person had the disease or not.

17:26

It's exactly the same in this context.

17:29

There are two things involved -- two parts to the explanation.

17:33

We want to know how likely, or relatively how likely, two different explanations are.

17:37

One of them is that Sally Clark was innocent --

17:40

which is, a priori, overwhelmingly likely --

17:42

most mothers don't kill their children.

17:45

And the second part of the explanation

17:47

is that she suffered an incredibly unlikely event.

17:50

Not as unlikely as one in 73 million, but nonetheless rather unlikely.

17:54

The other explanation is that she was guilty.

17:56

Now, we probably think a priori that's unlikely.

17:58

And we certainly should think in the context of a criminal trial

18:01

that that's unlikely, because of the presumption of innocence.

18:04

And then if she were trying to kill the children, she succeeded.

18:08

So the chance that she's innocent isn't one in 73 million.

18:12

We don't know what it is.

18:14

It has to do with weighing up the strength of the other evidence against her

18:18

and the statistical evidence.

18:20

We know the children died.

18:22

What matters is how likely or unlikely, relative to each other,

18:26

the two explanations are.

18:28

And they're both implausible.
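
The same odds form shows what the one in 73 million figure is and is not. Writing the two explanations side by side (a sketch that, like this part of the talk, leaves aside the other evidence in the case):

$$
\frac{P(\text{innocent} \mid \text{two deaths})}{P(\text{guilty} \mid \text{two deaths})}
= \frac{P(\text{innocent}) \times P(\text{two deaths} \mid \text{innocent})}{P(\text{guilty}) \times P(\text{two deaths} \mid \text{guilty})}
$$

The expert's figure, even if it had been correct, is only $P(\text{two deaths} \mid \text{innocent})$, one factor on the top line. Reading it as $P(\text{innocent} \mid \text{two deaths})$ swaps the conditioning, exactly as reading a 99 percent accurate test as a 99 percent chance of disease does. The top line (innocent, but a rare double cot death) and the bottom line (an a priori unlikely double murder which, if attempted, succeeded, so its last factor is 1) are both small; what matters is their ratio.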

18:31

There's a situation where errors in statistics had really profound

18:35

and really unfortunate consequences.

18:38

In fact, there are two other women who were convicted on the basis of the

18:40

evidence of this pediatrician, who have subsequently been released on appeal.

18:44

Many cases were reviewed.

18:46

And it's particularly topical because he's currently facing a disrepute charge

18:50

at Britain's General Medical Council.

18:53

So just to conclude -- what are the take-home messages from this?

18:57

Well, we know that randomness and uncertainty and chance

19:01

are very much a part of our everyday life.

19:04

It's also true -- and, although you, as a collective, are very special in many ways,

367

1119000

5000

19:09

you're completely typical in not getting the examples I gave right.

368

1124000

4000

19:13

It's very well documented that people get things wrong.

369

1128000

3000

19:16

They make errors of logic in reasoning with uncertainty.

370

1131000

3000

19:20

We can cope with the subtleties of language brilliantly --

371

1135000

2000

19:22

and there are interesting evolutionary questions about how we got here.

372

1137000

3000

19:25

We are not good at reasoning with uncertainty.

373

1140000

3000

19:28

That's an issue in our everyday lives.

374

1143000

2000

19:30

As you've heard from many of the talks, statistics underpins an enormous amount

375

1145000

3000

19:33

of research in science -- in social science, in medicine

376

1148000

3000

19:36

and indeed, quite a lot of industry.

377

1151000

2000

19:38

All of quality control, which has had a major impact on industrial processing,

378

1153000

4000

19:42

is underpinned by statistics.

379

1157000

2000

19:44

It's something we're bad at doing.

380

1159000

2000

19:46

At the very least, we should recognize that, and we tend not to.

381

1161000

3000

19:49

To go back to the legal context, at the Sally Clark trial

382

1164000

4000

19:53

all of the lawyers just accepted what the expert said.

383

1168000

4000

19:57

So if a pediatrician had come out and said to a jury,

384

1172000

2000

19:59

"I know how to build bridges. I've built one down the road.

385

1174000

3000

20:02

Please drive your car home over it,"

386

1177000

2000

20:04

they would have said, "Well, pediatricians don't know how to build bridges.

387

1179000

2000

20:06

That's what engineers do."

388

1181000

2000

20:08

On the other hand, he came out and effectively said, or implied,

389

1183000

3000

20:11

"I know how to reason with uncertainty. I know how to do statistics."

390

1186000

3000

20:14

And everyone said, "Well, that's fine. He's an expert."

391

1189000

3000

20:17

So we need to understand where our competence is and isn't.

392

1192000

3000

20:20

Exactly the same kinds of issues arose in the early days of DNA profiling,

393

1195000

4000

20:24

when scientists and lawyers, and in some cases judges,

394

1199000

4000

20:28

routinely misrepresented evidence.

395

1203000

3000

20:32

Usually -- one hopes -- innocently, but misrepresented evidence.

396

1207000

3000

20:35

Forensic scientists said, "The chance that this guy's innocent is one in three million."

397

1210000

5000

20:40

Even if you believe the number, just like the 73 million to one,

398

1215000

2000

20:42

that's not what it meant.

399

1217000

2000

20:44

And there have been celebrated appeal cases

400

1219000

2000

20:46

in Britain and elsewhere because of that.

401

1221000

2000

20:48
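The "one in three million" figure is the chance that someone who is not the source of the sample would match by coincidence. The speaker's point is that this is not the chance that the suspect is innocent. The minimal Python sketch below shows the gap under a toy model that is an assumption of this example: there is exactly one source among a pool of equally likely candidates, and there is no other evidence. The helper `prob_source_given_match` and the pool size of 6,000,001 people are hypothetical.

```python
from fractions import Fraction


def prob_source_given_match(match_prob, pool_size):
    """P(suspect is the source | suspect's profile matches), under a toy model.

    Toy model (an assumption for illustration, not a forensic standard):
      * exactly one of `pool_size` people left the crime-scene DNA,
      * before the DNA evidence, each of them is equally likely to be that person,
      * the true source always matches, and every other person matches
        independently with probability `match_prob`.
    Bayes' theorem then gives 1 / (1 + (pool_size - 1) * match_prob).
    """
    expected_innocent_matches = (pool_size - 1) * match_prob
    return 1 / (1 + expected_innocent_matches)


match_prob = Fraction(1, 3_000_000)  # the "one in three million" figure from the talk
pool_size = 6_000_001                # HYPOTHETICAL number of people who could have left the sample

p_source = prob_source_given_match(match_prob, pool_size)
print("P(match | not the source) =", match_prob)
print("P(source | match)         =", p_source)
print("P(not source | match)     =", 1 - p_source)
```

Under these assumptions, a one-in-three-million match probability still leaves a two-in-three chance that the matching suspect is not the source, because on average two other people in such a pool would also match. Real cases combine the DNA evidence with everything else that is known, so this toy result is not a general statement about how strong DNA evidence is.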

And just to finish, in the context of the legal system:

402

1223000

3000

20:51

It's all very well to say, "Let's do our best to present the evidence."

403

1226000

4000

20:55

But more and more, in cases of DNA profiling -- this is another one --

404

1230000

3000

20:58

we expect juries, who are ordinary people --

405

1233000

3000

21:01

and it's documented they're very bad at this --

406

1236000

2000

21:03

we expect juries to be able to cope with the sorts of reasoning that go on.

407

1238000

4000

21:07

In other spheres of life, if people argued -- well, except possibly for politics --

408

1242000

5000

21:12

but in other spheres of life, if people argued illogically,

409

1247000

2000

21:14

we'd say that's not a good thing.

410

1249000

2000

21:16

We sort of expect it of politicians and don't hope for much more.

411

1251000

4000

21:20

In the case of uncertainty, we get it wrong all the time --

412

1255000

3000

21:23

and at the very least, we should be aware of that,

413

1258000

2000

21:25

and ideally, we might try and do something about it.

414

1260000

2000

21:27

Thanks very much.

415

1262000

1000

THE ORIGINAL VIDEO ON TED.COM

Peter Donnelly: How juries are fooled by statistics | TED Talk | TED.com