ABOUT THE SPEAKER
Peter Donnelly - Mathematician; statistician
Peter Donnelly is an expert in probability theory who applies statistical methods to genetic data -- spurring advances in disease treatment and insight into our evolution. He's also an expert on DNA analysis, and an advocate for sensible statistical analysis in the courtroom.

Why you should listen

Peter Donnelly applies statistical methods to real-world problems, ranging from DNA analysis in criminal trials to the treatment of genetic disorders. A mathematician who collaborates with biologists, he specializes in applying probability and statistics to the field of genetics, in hopes of shedding light on evolutionary history and the structure of the human genome.

The Australian-born, Oxford-based mathematician is best known for his work in molecular evolution (tracing the roots of human existence to their earliest origins using the mutation rates of mitochondrial DNA). He studies genetic distributions in living populations to trace human evolutionary history -- an approach that informs research in evolutionary biology, as well as medical treatment for genetic disorders. Donnelly is a key player in the International HapMap Project, an ongoing international effort to model human genetic variation and pinpoint the genes responsible for specific aspects of health and disease; its implications for disease prevention and treatment are vast.

He's also a leading expert on DNA analysis and the use of forensic science in criminal trials, and an outspoken advocate for bringing sensible statistical analysis into the courtroom. Donnelly leads Oxford University's Mathematical Genetics Group, which conducts research in genetic modeling, human evolutionary history, and forensic DNA profiling. He also serves as Director of the Wellcome Trust Centre for Human Genetics at Oxford University, which explores the links between genetics and disease.

TEDGlobal 2005

Peter Donnelly: How juries are fooled by statistics

1,279,860 views

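Oxford mathematician Peter Donnelly reveals the common mistakes people make when interpreting statistics, and the devastating impact these errors can have on the outcome of criminal trials.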

00:25
As other speakers have said, it's a rather daunting experience --
00:27
a particularly daunting experience -- to be speaking in front of this audience.
00:30
But unlike the other speakers, I'm not going to tell you about
00:33
the mysteries of the universe, or the wonders of evolution,
00:35
or the really clever, innovative ways people are attacking
00:39
the major inequalities in our world.
00:41
Or even the challenges of nation-states in the modern global economy.
00:46
My brief, as you've just heard, is to tell you about statistics --
00:50
and, to be more precise, to tell you some exciting things about statistics.
00:53
And that's --
00:54
(Laughter)
00:55
-- that's rather more challenging
00:57
than all the speakers before me and all the ones coming after me.
00:59
(Laughter)
01:01
One of my senior colleagues told me, when I was a youngster in this profession,
01:06
rather proudly, that statisticians were people who liked figures
01:10
but didn't have the personality skills to become accountants.
01:13
(Laughter)
01:15
And there's another in-joke among statisticians, and that's,
01:18
"How do you tell the introverted statistician from the extroverted statistician?"
01:21
To which the answer is,
01:23
"The extroverted statistician's the one who looks at the other person's shoes."
01:28
(Laughter)
01:31
But I want to tell you something useful -- and here it is, so concentrate now.
01:36
This evening, there's a reception in the University's Museum of Natural History.
01:39
And it's a wonderful setting, as I hope you'll find,
01:41
and a great icon to the best of the Victorian tradition.
01:46
It's very unlikely -- in this special setting, and this collection of people --
01:51
but you might just find yourself talking to someone you'd rather wish that you weren't.
01:54
So here's what you do.
01:56
When they say to you, "What do you do?" -- you say, "I'm a statistician."
02:00
(Laughter)
02:01
Well, except they've been pre-warned now, and they'll know you're making it up.
02:05
And then one of two things will happen.
02:07
They'll either discover their long-lost cousin in the other corner of the room
02:09
and run over and talk to them.
02:11
Or they'll suddenly become parched and/or hungry -- and often both --
02:14
and sprint off for a drink and some food.
02:16
And you'll be left in peace to talk to the person you really want to talk to.
02:20
It's one of the challenges in our profession to try and explain what we do.
02:23
We're not top on people's lists for dinner party guests and conversations and so on.
02:28
And it's something I've never really found a good way of doing.
02:30
But my wife -- who was then my girlfriend --
02:33
managed it much better than I've ever been able to.
02:36
Many years ago, when we first started going out, she was working for the BBC in Britain,
02:39
and I was, at that stage, working in America.
02:41
I was coming back to visit her.
02:43
She told this to one of her colleagues, who said, "Well, what does your boyfriend do?"
02:49
Sarah thought quite hard about the things I'd explained --
02:51
and she concentrated, in those days, on listening.
02:55
(Laughter)
02:58
Don't tell her I said that.
03:00
And she was thinking about the work I did developing mathematical models
03:04
for understanding evolution and modern genetics.
03:07
So when her colleague said, "What does he do?"
03:10
She paused and said, "He models things."
03:14
(Laughter)
03:15
Well, her colleague suddenly got much more interested than I had any right to expect
03:19
and went on and said, "What does he model?"
03:22
Well, Sarah thought a little bit more about my work and said, "Genes."
03:25
(Laughter)
03:29
"He models genes."
03:31
That is my first love, and that's what I'll tell you a little bit about.
03:35
What I want to do more generally is to get you thinking about
03:39
the place of uncertainty and randomness and chance in our world,
03:42
and how we react to that, and how well we do or don't think about it.
03:47
So you've had a pretty easy time up till now --
03:49
a few laughs, and all that kind of thing -- in the talks to date.
03:51
You've got to think, and I'm going to ask you some questions.
03:54
So here's the scene for the first question I'm going to ask you.
03:56
Can you imagine tossing a coin successively?
03:59
And for some reason -- which shall remain rather vague --
04:02
we're interested in a particular pattern.
04:04
Here's one -- a head, followed by a tail, followed by a tail.
04:07
So suppose we toss a coin repeatedly.
04:10
Then the pattern, head-tail-tail, that we've suddenly become fixated with happens here.
04:15
And you can count: one, two, three, four, five, six, seven, eight, nine, 10 --
04:19
it happens after the 10th toss.
04:21
So you might think there are more interesting things to do, but humor me for the moment.
04:24
Imagine this half of the audience each get out coins, and they toss them
04:28
until they first see the pattern head-tail-tail.
04:31
The first time they do it, maybe it happens after the 10th toss, as here.
04:33
The second time, maybe it's after the fourth toss.
04:35
The next time, after the 15th toss.
04:37
So you do that lots and lots of times, and you average those numbers.
04:40
That's what I want this side to think about.
04:43
The other half of the audience doesn't like head-tail-tail --
04:45
they think, for deep cultural reasons, that's boring --
04:48
and they're much more interested in a different pattern -- head-tail-head.
04:51
So, on this side, you get out your coins, and you toss and toss and toss.
04:54
And you count the number of times until the pattern head-tail-head appears
04:57
and you average them. OK?
05:00
So on this side, you've got a number --
05:02
you've done it lots of times, so you get it accurately --
05:04
which is the average number of tosses until head-tail-tail.
05:07
On this side, you've got a number -- the average number of tosses until head-tail-head.
05:11
So here's a deep mathematical fact --
05:13
if you've got two numbers, one of three things must be true.
05:16
Either they're the same, or this one's bigger than this one,
05:19
or this one's bigger than that one.
05:20
So what's going on here?
05:23
So you've all got to think about this, and you've all got to vote --
05:25
and we're not moving on.
05:26
And I don't want to end up in the two-minute silence
05:28
to give you more time to think about it, until everyone's expressed a view. OK.
05:32
So what you want to do is compare the average number of tosses until we first see
05:36
head-tail-head with the average number of tosses until we first see head-tail-tail.
05:41
Who thinks that A is true --
05:43
that, on average, it'll take longer to see head-tail-head than head-tail-tail?
05:47
Who thinks that B is true -- that on average, they're the same?
05:51
Who thinks that C is true -- that, on average, it'll take less time
05:53
to see head-tail-head than head-tail-tail?
05:57
OK, who hasn't voted yet? Because that's really naughty -- I said you had to.
06:00
(Laughter)
06:02
OK. So most people think B is true.
06:05
And you might be relieved to know even rather distinguished mathematicians think that.
06:08
It's not. A is true here.
06:12
It takes longer, on average.
06:14
In fact, the average number of tosses till head-tail-head is 10
06:16
and the average number of tosses until head-tail-tail is eight.
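The experiment the audience is asked to imagine can be simulated directly. A minimal sketch in Python, assuming a fair coin whose tosses are independent; `tosses_until` and `average_tosses` are hypothetical helper names chosen for illustration. With enough repetitions the averages should settle near the figures quoted above: about 8 tosses for head-tail-tail and about 10 for head-tail-head.

```python
import random

def tosses_until(pattern: str, rng: random.Random) -> int:
    """Toss a fair coin ('H' or 'T') until `pattern` first appears; return the toss count."""
    recent = ""  # the most recent len(pattern) outcomes
    count = 0
    while True:
        recent = (recent + rng.choice("HT"))[-len(pattern):]
        count += 1
        if recent == pattern:
            return count

def average_tosses(pattern: str, trials: int = 100_000, seed: int = 0) -> float:
    """Repeat the experiment `trials` times and average the toss counts."""
    rng = random.Random(seed)
    return sum(tosses_until(pattern, rng) for _ in range(trials)) / trials

for pattern in ("HTT", "HTH"):
    print(pattern, round(average_tosses(pattern), 2))
```

The seed only makes a run repeatable; different seeds give slightly different averages, which is the point of averaging over many repetitions.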
06:21
How could that be?
06:24
Anything different about the two patterns?
06:30
There is. Head-tail-head overlaps itself.
06:35
If you went head-tail-head-tail-head, you can cunningly get two occurrences
06:39
of the pattern in only five tosses.
06:42
You can't do that with head-tail-tail.
06:44
That turns out to be important.
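The overlap point can be made exact. For independent, equally likely symbols there is a standard formula (the same one behind the classic "ABRACADABRA" martingale exercise): the expected waiting time for a pattern is the sum of m^k over every length k at which the pattern's first k symbols equal its last k symbols, where m is the alphabet size. A short Python sketch, assuming a fair coin (m = 2); `expected_wait` is a hypothetical name. Head-tail-head matches itself at lengths 1 and 3, giving 2 + 8 = 10; head-tail-tail matches only at its full length, giving 8.

```python
def expected_wait(pattern: str, alphabet_size: int) -> int:
    """Expected number of independent, uniformly random symbols drawn until
    `pattern` first appears.

    Adds alphabet_size**k for every k at which the pattern's first k symbols
    equal its last k symbols (k = len(pattern) always qualifies).
    """
    return sum(
        alphabet_size ** k
        for k in range(1, len(pattern) + 1)
        if pattern[:k] == pattern[-k:]
    )

print(expected_wait("HTT", 2))  # 8: matches itself only at full length
print(expected_wait("HTH", 2))  # 10: 2 (the shared "H") + 8
```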
06:46
There are two ways of thinking about this.
06:48
I'll give you one of them.
06:50
So imagine -- let's suppose we're doing it.
06:52
On this side -- remember, you're excited about head-tail-tail;
06:54
you're excited about head-tail-head.
06:56
We start tossing a coin, and we get a head --
06:59
and you start sitting on the edge of your seat
07:00
because something great and wonderful, or awesome, might be about to happen.
07:05
The next toss is a tail -- you get really excited.
07:07
The champagne's on ice just next to you; you've got the glasses chilled to celebrate.
07:11
You're waiting with bated breath for the final toss.
07:13
And if it comes down a head, that's great.
07:15
You're done, and you celebrate.
07:17
If it's a tail -- well, rather disappointedly, you put the glasses away
07:19
and put the champagne back.
07:21
And you keep tossing, to wait for the next head, to get excited.
07:25
On this side, there's a different experience.
07:27
It's the same for the first two parts of the sequence.
07:30
You're a little bit excited with the first head --
07:32
you get rather more excited with the next tail.
07:34
Then you toss the coin.
07:36
If it's a tail, you crack open the champagne.
07:39
If it's a head you're disappointed,
07:41
but you're still a third of the way to your pattern again.
07:44
And that's an informal way of presenting it -- that's why there's a difference.
07:48
Another way of thinking about it --
07:50
if we tossed a coin eight million times,
07:52
then we'd expect a million head-tail-heads
07:54
and a million head-tail-tails -- but the head-tail-heads could occur in clumps.
08:01
So if you want to put a million things down amongst eight million positions
08:03
and you can have some of them overlapping, the clumps will be further apart.
08:08
It's another way of getting the intuition.
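The "eight million tosses" argument can be checked by counting every occurrence of each pattern, overlaps included, in one long run of fair tosses. A rough Python sketch, scaled down to 800,000 tosses to keep it quick; `occurrence_starts` and `clump_summary` are hypothetical helpers, and a "clump" here is taken to mean a run of occurrences in which each one overlaps the previous. Both patterns should turn up at about one position in eight, but only head-tail-head can overlap itself, so its occurrences bunch together and the clumps end up further apart: roughly one every 8 tosses for head-tail-tail versus roughly one every 32/3 ≈ 10.7 tosses for head-tail-head.

```python
import random

def occurrence_starts(seq: str, pattern: str) -> list[int]:
    """Start index of every occurrence of `pattern` in `seq`, overlapping ones included."""
    return [i for i in range(len(seq) - len(pattern) + 1) if seq.startswith(pattern, i)]

def clump_summary(seq: str, pattern: str) -> tuple[float, int, float]:
    """Return (occurrences per position, overlapping occurrences, positions per clump)."""
    starts = occurrence_starts(seq, pattern)
    # An occurrence "overlaps" if it starts before the previous one has finished.
    overlapping = sum(1 for a, b in zip(starts, starts[1:]) if b - a < len(pattern))
    clumps = len(starts) - overlapping  # each clump begins with a non-overlapping occurrence
    return len(starts) / len(seq), overlapping, len(seq) / clumps

rng = random.Random(1)
tosses = "".join(rng.choices("HT", k=800_000))  # scaled down from eight million
for pattern in ("HTT", "HTH"):
    rate, overlapping, spacing = clump_summary(tosses, pattern)
    print(f"{pattern}: rate {rate:.4f}, overlapping {overlapping}, one clump every {spacing:.2f} tosses")
```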
08:10
What's the point I want to make?
08:12
It's a very, very simple example, an easily stated question in probability,
08:16
which every -- you're in good company -- everybody gets wrong.
08:19
This is my little diversion into my real passion, which is genetics.
08:23
There's a connection between head-tail-heads and head-tail-tails in genetics,
08:26
and it's the following.
08:29
When you toss a coin, you get a sequence of heads and tails.
08:32
When you look at DNA, there's a sequence of not two things -- heads and tails --
08:35
but four letters -- As, Gs, Cs and Ts.
08:38
And there are little chemical scissors, called restriction enzymes
08:41
which cut DNA whenever they see particular patterns.
08:43
And they're an enormously useful tool in modern molecular biology.
08:48
And instead of asking the question, "How long until I see a head-tail-head?" --
08:51
you can ask, "How big will the chunks be when I use a restriction enzyme
08:54
which cuts whenever it sees G-A-A-G, for example?
08:58
How long will those chunks be?"
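Under the simplest possible model, in which each base is A, C, G or T with equal probability and independently of its neighbours (real genomes are not like this, so treat it as an assumption), the chunk-length question can be answered by simulation. A minimal Python sketch; `fragment_lengths` is a hypothetical helper and GAAG is simply the example site from the talk. Because GAAG occupies any given position with probability 1/4**4 = 1/256, the average spacing between sites, and so the average chunk length, should come out close to 256 bases. The expected position of the very first GAAG, counted from the start of a sequence, is slightly longer, 4**4 + 4 = 260, because GAAG overlaps itself at the letter G: the same effect that makes head-tail-head slower to appear than head-tail-tail.

```python
import random
import re

def fragment_lengths(dna: str, site: str) -> list[int]:
    """Distances between the starts of consecutive occurrences of `site` in `dna`.

    Overlapping occurrences are found with a zero-width lookahead. Measuring from
    the start of each site is a simplification: if an enzyme cuts at a fixed offset
    relative to its site, every cut shifts by the same amount and these interior
    fragment lengths are unchanged.
    """
    starts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", dna)]
    return [b - a for a, b in zip(starts, starts[1:])]

# Hypothetical random "genome": each base drawn independently and uniformly.
rng = random.Random(2)
dna = "".join(rng.choices("ACGT", k=2_000_000))
lengths = fragment_lengths(dna, "GAAG")
print(len(lengths), round(sum(lengths) / len(lengths), 1))  # mean should be near 256
```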
09:00
That's a rather trivial connection between probability and genetics.
09:05
There's a much deeper connection, which I don't have time to go into
09:08
and that is that modern genetics is a really exciting area of science.
09:11
And we'll hear some talks later in the conference specifically about that.
09:15
But it turns out that unlocking the secrets in the information generated by modern
09:19
experimental technologies, a key part of that has to do with fairly sophisticated --
09:24
you'll be relieved to know that I do something useful in my day job,
09:27
rather more sophisticated than the head-tail-head story --
09:29
but quite sophisticated computer modelings and mathematical modelings
09:33
and modern statistical techniques.
09:35
And I will give you two little snippets -- two examples --
09:38
of projects we're involved in in my group in Oxford,
09:41
both of which I think are rather exciting.
09:43
You know about the Human Genome Project.
09:45
That was a project which aimed to read one copy of the human genome.
09:51
The natural thing to do after you've done that --
09:53
and that's what this project, the International HapMap Project,
09:55
which is a collaboration between labs in five or six different countries.
10:00
Think of the Human Genome Project as learning what we've got in common,
10:04
and the HapMap Project is trying to understand
10:06
where there are differences between different people.
10:08
Why do we care about that?
10:10
Well, there are lots of reasons.
10:12
The most pressing one is that we want to understand how some differences
10:16
make some people susceptible to one disease -- type-2 diabetes, for example --
10:20
and other differences make people more susceptible to heart disease,
10:25
or stroke, or autism and so on.
10:27
That's one big project.
10:29
There's a second big project,
10:31
recently funded by the Wellcome Trust in this country,
10:33
involving very large studies --
10:35
thousands of individuals, with each of eight different diseases,
10:38
common diseases like type-1 and type-2 diabetes, and coronary heart disease,
10:42
bipolar disease and so on -- to try and understand the genetics.
10:46
To try and understand what it is about genetic differences that causes the diseases.
10:49
Why do we want to do that?
10:51
Because we understand very little about most human diseases.
10:54
We don't know what causes them.
10:56
And if we can get in at the bottom and understand the genetics,
10:58
we'll have a window on the way the disease works,
11:01
and a whole new way about thinking about disease therapies
11:03
and preventative treatment and so on.
11:06
So that's, as I said, the little diversion on my main love.
11:09
Back to some of the more mundane issues of thinking about uncertainty.
11:14
Here's another quiz for you --
11:16
now suppose we've got a test for a disease
11:18
which isn't infallible, but it's pretty good.
11:20
It gets it right 99 percent of the time.
11:23
And I take one of you, or I take someone off the street,
11:26
and I test them for the disease in question.
11:28
Let's suppose there's a test for HIV -- the virus that causes AIDS --
11:32
and the test says the person has the disease.
11:35
What's the chance that they do?
11:38
The test gets it right 99 percent of the time.
11:40
So a natural answer is 99 percent.
11:44
Who likes that answer?
11:46
Come on -- everyone's got to get involved.
11:47
Don't think you don't trust me anymore.
11:49
(Laughter)
11:50
Well, you're right to be a bit skeptical, because that's not the answer.
11:53
That's what you might think.
11:55
It's not the answer, and it's not because it's only part of the story.
11:58
It actually depends on how common or how rare the disease is.
12:01
So let me try and illustrate that.
12:03
Here's a little caricature of a million individuals.
12:07
So let's think about a disease that affects --
12:10
it's pretty rare, it affects one person in 10,000.
12:12
Amongst these million individuals, most of them are healthy
12:15
and some of them will have the disease.
12:17
And in fact, if this is the prevalence of the disease,
12:20
about 100 will have the disease and the rest won't.
12:23
So now suppose we test them all.
12:25
What happens?
12:27
Well, amongst the 100 who do have the disease,
12:29
the test will get it right 99 percent of the time, and 99 will test positive.
12:34
Amongst all these other people who don't have the disease,
12:36
the test will get it right 99 percent of the time.
12:39
It'll only get it wrong one percent of the time.
12:41
But there are so many of them that there'll be an enormous number of false positives.
12:45
Put that another way --
12:47
of all of them who test positive -- so here they are, the individuals involved --
12:52
less than one in 100 actually have the disease.
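The head-count above can be reproduced in a few lines. A minimal Python sketch, assuming, as the talk does, that "gets it right 99 percent of the time" applies equally to people with and without the disease; `positive_test_breakdown` is a hypothetical helper name. Of roughly 10,098 positive results, about 99 are true positives and about 9,999 are false positives, so the chance that someone who tests positive actually has the disease is 99/10,098, or 1 in 102.

```python
def positive_test_breakdown(population: int, prevalence: float, accuracy: float):
    """Expected true positives, false positives and P(disease | positive test).

    Assumes the test is right with the same probability (`accuracy`) whether or
    not the person has the disease, as in the talk's example.
    """
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * accuracy              # sick people correctly flagged
    false_positives = healthy * (1 - accuracy)    # healthy people wrongly flagged
    p_sick_given_positive = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, p_sick_given_positive

tp, fp, p = positive_test_breakdown(1_000_000, 1 / 10_000, 0.99)
print(round(tp), round(fp), round(p, 4))  # 99 9999 0.0098
```

Changing `prevalence` while holding `accuracy` fixed shows the speaker's point directly: the same test gives very different answers for common and rare diseases.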
12:57
So even though we think the test is accurate, the important part of the story is
13:01
there's another bit of information we need.
13:04
Here's the key intuition.
13:07
What we have to do, once we know the test is positive,
13:10
is to weigh up the plausibility, or the likelihood, of two competing explanations.
13:16
Each of those explanations has a likely bit and an unlikely bit.
13:19
One explanation is that the person doesn't have the disease --
13:22
that's overwhelmingly likely, if you pick someone at random --
13:25
but the test gets it wrong, which is unlikely.
13:29
The other explanation is that the person does have the disease -- that's unlikely --
13:32
but the test gets it right, which is likely.
13:35
And the number we end up with --
13:37
that number which is a little bit less than one in 100 --
13:40
is to do with how likely one of those explanations is relative to the other.
13:46
Each of them taken together is unlikely.
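Weighing the two competing explanations against each other is Bayes' rule in odds form: the odds after a positive result equal the odds before the test times the ratio of how likely a positive result is under each explanation. Worked through with the talk's numbers (prevalence 1 in 10,000, and a test assumed to be right 99 percent of the time for sick and healthy people alike):

```latex
\frac{P(\text{disease} \mid +)}{P(\text{healthy} \mid +)}
  = \underbrace{\frac{1/10\,000}{9\,999/10\,000}}_{\text{prior odds}}
    \times \underbrace{\frac{0.99}{0.01}}_{\text{likelihood ratio}}
  = \frac{1}{9\,999} \times 99 = \frac{1}{101},
\qquad
P(\text{disease} \mid +) = \frac{1}{1 + 101} = \frac{1}{102} \approx 0.0098
```

Each explanation on its own is improbable (1/10,000 × 0.99 for "sick, test right" against 9,999/10,000 × 0.01 for "healthy, test wrong"); the answer depends only on how they compare.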
13:49
Here's这里的 a more topical局部的 example of exactly究竟 the same相同 thing.
272
804000
3000
这是另一个说明同样道理的例子 更加切题
13:52
Those of you in Britain英国 will know about what's become成为 rather a celebrated著名 case案件
273
807000
4000
在英国的听众知道 这是一个很有名的案子
13:56
of a woman女人 called Sally出击 Clark克拉克, who had two babies婴儿 who died死亡 suddenly突然.
274
811000
5000
一个女人叫做萨里•克拉克 她有两个孩子 都突然去世
14:01
And initially原来, it was thought that they died死亡 of what's known已知 informally非正式地 as "cot婴儿床 death死亡,"
275
816000
4000
很自然人们以为这属于婴儿猝死
14:05
and more formally正式地 as "Sudden突然 Infant婴儿 Death死亡 Syndrome综合征."
276
820000
3000
更正式的说法是婴儿猝死综合征
14:08
For various各个 reasons原因, she was later后来 charged带电 with murder谋杀.
277
823000
2000
由于多种原因 萨里后来以谋杀罪被逮捕
14:10
And at the trial审讯, her trial审讯, a very distinguished杰出的 pediatrician儿科医师 gave evidence证据
278
825000
4000
在法庭上 一个非常著名的小儿科医师作证
14:14
that the chance机会 of two cot婴儿床 deaths死亡, innocent无辜 deaths死亡, in a family家庭 like hers她的 --
279
829000
5000
两个婴儿猝死 在一个像萨里的家里--
14:19
which哪一个 was professional专业的 and non-smoking禁烟 -- was one in 73 million百万.
280
834000
6000
有经验并不吸烟的--概率为七千三百万分之一
14:26
To cut a long story故事 short, she was convicted被定罪 at the time.
281
841000
3000
长话短说 她最后被判有罪
14:29
Later后来, and fairly相当 recently最近, acquitted无罪释放 on appeal上诉 -- in fact事实, on the second第二 appeal上诉.
282
844000
5000
后来 最近 她在上诉中无罪释放了
14:34
And just to set it in context上下文, you can imagine想像 how awful可怕 it is for someone有人
283
849000
4000
当置于实际情境中 大家就能想象 一个人失去了一个孩子
14:38
to have lost丢失 one child儿童, and then two, if they're innocent无辜,
284
853000
3000
然后又失去了另一个 然后又被诬为凶手
14:41
to be convicted被定罪 of murdering谋杀 them.
285
856000
2000
这是多么可怕的事情
14:43
To be put through通过 the stress强调 of the trial审讯, convicted被定罪 of murdering谋杀 them --
286
858000
2000
要被迫承受审判的压力 并判有罪--
14:45
and to spend time in a women's女士的 prison监狱, where all the other prisoners囚犯
287
860000
3000
在女监里熬过一段日子 那里所有的囚犯
14:48
think you killed杀害 your children孩子 -- is a really awful可怕 thing to happen发生 to someone有人.
288
863000
5000
都认为是你杀了孩子--这件事发生在一个人身上真是太可怕了
14:53
And it happened发生 in large part部分 here because the expert专家 got the statistics统计
289
868000
5000
而这些事的发生 很大程度上是因为那个专家
14:58
horribly可怕 wrong错误, in two different不同 ways方法.
290
873000
3000
得出的数据是错误的 错误出在两方面
15:01
So where did he get the one in 73 million百万 number?
291
876000
4000
那么他是怎样得出七千三百万分之一这个数字的呢
15:05
He looked看着 at some research研究, which哪一个 said the chance机会 of one cot婴儿床 death死亡 in a family家庭
292
880000
3000
他看了一些研究 那些研究上说一个家庭里一个婴儿猝死的概率
15:08
like Sally Clark's is about one in 8,500.
293
883000
5000
就像萨里•克拉克家 这概率是八千五百分之一
15:13
So he said, "I'll assume that if you have one cot death in a family,
294
888000
4000
所以他说:“我假设如果一个家庭中出现了一个婴儿猝死
15:17
the chance of a second child dying from cot death isn't changed."
295
892000
4000
那么第二个婴儿发生猝死的概率也不会变。”
15:21
So that's what statisticians would call an assumption of independence.
296
896000
3000
这被统计学家们称为独立事件
15:24
It's like saying, "If you toss a coin and get a head the first time,
297
899000
2000
这就像是在说:“如果你掷硬币第一次是正
15:26
that won't affect the chance of getting a head the second time."
298
901000
3000
这并不会影响第二次投掷得到正的概率。”
15:29
So if you toss a coin twice, the chance of getting a head twice is a half --
299
904000
5000
所以如果你扔两次硬币 第一次正的几率是二分之一
15:34
that's the chance机会 the first time -- times a half -- the chance机会 a second第二 time.
300
909000
3000
第二次正的几率也是二分之一
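Here is a minimal Python sketch of that multiplication rule for two fair coin tosses. The exact value comes from multiplying the two one-half chances; the simulation, whose seed and number of trials are arbitrary choices, simply shows the observed frequency of two heads landing near one quarter when the tosses really are independent.

```python
import random
from fractions import Fraction

# Exact answer under independence: P(two heads) = P(head) * P(head).
p_head = Fraction(1, 2)
p_two_heads = p_head * p_head
print(p_two_heads)  # 1/4

# Empirical check: simulate many pairs of fair, independent tosses.
rng = random.Random(2024)   # arbitrary seed so the run is reproducible
trials = 100_000            # arbitrary number of simulated pairs
hits = 0
for _ in range(trials):
    first_is_head = rng.random() < 0.5
    second_is_head = rng.random() < 0.5   # drawn without looking at the first toss
    if first_is_head and second_is_head:
        hits += 1

freq = hits / trials
print(f"simulated frequency of two heads: {freq:.4f}")  # close to 0.25
```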
15:37
So he said, "Here,
301
912000
2000
所以他说:“我们来假设
15:39
I'll assume that these events are independent.
302
914000
4000
假设这些事件是独立的
15:43
When you multiply 8,500 together twice,
303
918000
2000
当你将八千五百分之一相乘
15:45
you get about 73 million."
304
920000
2000
你就会得到七千三百万分之一
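The expert's number comes from the same multiplication rule as the coin example. This sketch squares the "about one in 8,500" figure quoted in the talk, which is all the independence assumption does; running the arithmetic says nothing about whether that assumption holds.

```python
from fractions import Fraction

# Chance of one cot death in a family like Sally Clark's, as quoted in the talk.
p_one = Fraction(1, 8_500)

# Under the independence assumption, the chance of two cot deaths is the
# single-death chance multiplied by itself.
p_two_if_independent = p_one * p_one

print(p_two_if_independent)                                # 1/72250000
print(f"about 1 in {p_two_if_independent.denominator:,}")  # about 1 in 72,250,000
```

The result, about 72 million, is in line with the roughly 73 million given in court, since 8,500 is itself an approximate figure.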
15:47
And none of this was stated to the court as an assumption
305
922000
2000
而上面这些并没有在法庭上向陪审团
15:49
or presented to the jury that way.
306
924000
2000
展示作为前提
15:52
Unfortunately here -- and, really, regrettably --
307
927000
3000
不幸的是--确实很令人遗憾--
15:55
first of all, in a situation like this you'd have to verify it empirically.
308
930000
4000
首先 在这种情况下要先以经验判断
15:59
And secondly, it's palpably false.
309
934000
2000
第二 这显然是错的
16:02
There are lots and lots of things that we don't know about sudden infant deaths.
310
937000
5000
我们对婴儿猝死综合症有太多不了解
16:07
It might well be that there are environmental factors that we're not aware of,
311
942000
3000
很可能有一些我们并不知道的环境因素
16:10
and it's pretty likely to be the case that there are
312
945000
2000
也很可能是有一些
16:12
genetic factors we're not aware of.
313
947000
2000
我们并不了解的基因因素
16:14
So if a family suffers from one cot death, you'd put them in a high-risk group.
314
949000
3000
所以如果一个家庭出现一个婴儿猝死 你就要把他们放到高概率组
16:17
They've probably got these environmental risk factors
315
952000
2000
他们很可能有这些环境因素
16:19
and/or genetic risk factors we don't know about.
316
954000
3000
和/或基因因素 而我们对这些并不知情
16:22
And to argue, then, that the chance of a second death is as if you didn't know
317
957000
3000
而就像不知道上面得出的信息一样 确定第二个死亡的概率
16:25
that information is really silly.
318
960000
3000
是非常愚蠢的
16:28
It's worse than silly -- it's really bad science.
319
963000
4000
这比愚蠢还糟--这是坏科学
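Why the independence assumption fails can be shown with a deliberately simple model. In this sketch every number is hypothetical: families are split into a low-risk and a high-risk group (standing in for the unknown environmental and genetic factors the speaker mentions), and deaths within a family are assumed independent only once the family's risk group is fixed. Averaging over the groups is what makes the two deaths dependent.

```python
# HYPOTHETICAL risk groups: these shares and per-child chances are
# invented for illustration and are not real cot-death statistics.
groups = [
    # (share of families, per-child chance of cot death)
    (0.9, 1 / 10_000),   # low-risk families
    (0.1, 1 / 1_000),    # high-risk families
]

# Chance that a given child dies, averaged over all families.
p_one = sum(share * p for share, p in groups)

# Chance that both of two children die: independent within a group,
# then averaged over the groups.
p_two = sum(share * p * p for share, p in groups)

# Chance of a second death once a first has already happened.
p_second_given_first = p_two / p_one

print(f"P(one death)      = {p_one:.3e}")
print(f"P(one death)^2    = {p_one ** 2:.3e}")
print(f"P(two deaths)     = {p_two:.3e}")
print(f"P(second | first) = {p_second_given_first:.3e}")
```

With these invented numbers, a family that has already had one death faces about three times the baseline chance of a second, so squaring the single-death chance understates the chance of two deaths by roughly a factor of three. The direction of the error, not its exact size, is what the model illustrates.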
16:32
Nonetheless, that's how it was presented, and at trial nobody even argued it.
320
967000
5000
但是 这推论就这样呈现在法庭上 而几乎没有人质疑
16:37
That's the first problem.
321
972000
2000
这是第一个问题
16:39
The second problem is, what does the number of one in 73 million mean?
322
974000
4000
第二个问题是 七千三百万分之一这个数字意味着什么
16:43
So after Sally Clark was convicted --
323
978000
2000
在萨里•克拉克被定罪后--
16:45
you can imagine, it made rather a splash in the press --
324
980000
4000
可以想象 这在媒体中引起轩然大波--
16:49
one of the journalists from one of Britain's more reputable newspapers wrote that
325
984000
7000
一个英国相当有名望的报社记者写到
16:56
what the expert had said was,
326
991000
2000
这个专家说
16:58
"The chance that she was innocent was one in 73 million."
327
993000
5000
“她无罪的几率是七千三百万分之一”
17:03
Now, that's a logical error.
328
998000
2000
这是一个逻辑上的错误
17:05
It's exactly the same logical error as the logical error of thinking that
329
1000000
3000
这个错误相当于认为
17:08
after the disease test, which is 99 percent accurate,
330
1003000
2000
在准确率99%的疾病测试后
17:10
the chance of having the disease is 99 percent.
331
1005000
4000
患病的几率是99%
17:14
In the disease example, we had to bear in mind two things,
332
1009000
4000
在疾病的例子中 我们要注意两点
17:18
one of which was the possibility that the test got it right or not.
333
1013000
4000
一个是这个测试得出的可能性是否正确
17:22
And the other one was the chance, a priori, that the person had the disease or not.
334
1017000
4000
另一个就是这个人本身是否患病
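The disease-test version of the error can be worked through with Bayes' theorem. This sketch reads "99 percent accurate" as a test that is right 99% of the time both for people with the disease and for people without it, and uses a hypothetical prevalence of 1 in 10,000; both are assumptions for illustration.

```python
def prob_disease_given_positive(prevalence: float,
                                sensitivity: float,
                                specificity: float) -> float:
    """Bayes' theorem: chance of having the disease after a positive test."""
    true_pos = sensitivity * prevalence               # ill and flagged
    false_pos = (1 - specificity) * (1 - prevalence)  # healthy but flagged
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs: a rare disease and a "99 percent accurate" test.
p = prob_disease_given_positive(prevalence=1 / 10_000,
                                sensitivity=0.99,
                                specificity=0.99)
print(f"P(disease | positive) = {p:.4f}")  # 0.0098, i.e. about 1 in 102
```

Because the disease is rare, false positives from the large healthy population swamp the true positives, so a positive result still leaves the chance of disease below 1%. The prior chance matters as much as the accuracy of the test.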
17:26
It's exactly the same in this context.
335
1021000
3000
这个情形是完全相同的
17:29
There are two things involved -- two parts to the explanation.
336
1024000
4000
这个解释包括两个部分
17:33
We want to know how likely, or relatively how likely, two different explanations are.
337
1028000
4000
我们想知道这两种不同解释发生的可能性 或相对的可能性
17:37
One of them is that Sally Clark was innocent --
338
1032000
3000
一个是 萨里•克拉克是清白的--
17:40
which is, a priori, overwhelmingly likely --
339
1035000
2000
也就是 一个先验 极为可能--
17:42
most mothers don't kill their children.
340
1037000
3000
大多母亲不会杀自己的孩子
17:45
And the second part of the explanation
341
1040000
2000
这个解释的第二部分
17:47
is that she suffered an incredibly unlikely event.
342
1042000
3000
就是她遭遇了一个可能性极小的事件
17:50
Not as unlikely as one in 73 million, but nonetheless, rather unlikely.
343
1045000
4000
不像七千三百万分之一那样小 但也相当不可能
17:54
The other explanation is that she was guilty.
344
1049000
2000
另一个解释就是她有罪
17:56
Now, we probably think a priori that's unlikely.
345
1051000
2000
我们可能认为一个先验是 不大可能
17:58
And we certainly should think in the context of a criminal trial
346
1053000
3000
然后我们当然应该认为在刑事审判的情形下
18:01
that that's unlikely, because of the presumption of innocence.
347
1056000
3000
这是不大可能的 因为我们以无罪为前提
18:04
And then if she were trying to kill the children, she succeeded.
348
1059000
4000
如果她那时试着杀害孩子 那么她成功了
18:08
So the chance that she's innocent isn't one in 73 million.
349
1063000
4000
所以她无罪的机率并不是七千三百万分之一
18:12
We don't know what it is.
350
1067000
2000
我们不知道这个机率是多少
18:14
It has to do with weighing up the strength of the other evidence against her
351
1069000
4000
这同衡量其它对她不利的证据
18:18
and the statistical evidence.
352
1073000
2000
和数据型证据有关
18:20
We know the children died.
353
1075000
2000
我们知道 孩子死了
18:22
What matters is how likely or unlikely, relative to each other,
354
1077000
4000
重要的是这两种解释
18:26
the two explanations are.
355
1081000
2000
相对发生的机率
18:28
And they're both implausible.
356
1083000
2000
他们都令人难以置信
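As the speaker says, the courtroom case has the same structure as the disease test, and the comparison of the two explanations can be sketched the same way. This is only an illustration: it takes the court's 1-in-73-million figure at face value for the innocent explanation, pairs it with an invented prior for a double infant murder, sets the chance of both deaths under the guilty explanation to one (if she were trying, she succeeded), and ignores all the non-statistical evidence a real assessment would weigh.

```python
def prob_innocent_given_deaths(p_innocent_route: float,
                               p_guilty_route: float) -> float:
    """Share of the total probability taken by the innocent explanation.

    Each argument is the prior chance of that explanation multiplied by the
    chance it produces two deaths; the two explanations are assumed to be
    the only ones.
    """
    return p_innocent_route / (p_innocent_route + p_guilty_route)


# Innocent route: two cot deaths, using the court's figure at face value.
p_double_cot_death = 1 / 73_000_000
# Guilty route: HYPOTHETICAL prior for a double infant murder, times a
# chance of 1 that both deaths follow if she did it.
p_double_murder = (1 / 200_000_000) * 1.0

p = prob_innocent_given_deaths(p_double_cot_death, p_double_murder)
print(f"P(innocent | two deaths) = {p:.2f}")  # 0.73 with these inputs
```

Both routes are tiny, yet with these invented inputs innocence comes out more likely than guilt. Changing the hypothetical prior changes the answer, which is the point: the one-in-73-million figure on its own does not give the chance that she is innocent.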
18:31
There's a situation where errors in statistics had really profound
357
1086000
4000
在这种情形下 错误的数据
18:35
and really unfortunate consequences.
358
1090000
3000
产生了很重大而且不幸的结果
18:38
In fact, there are two other women who were convicted on the basis of the
359
1093000
2000
事实上 还有其他两个女人因这个小儿科医师的作证
18:40
evidence of this pediatrician, who have subsequently been released on appeal.
360
1095000
4000
而被定罪 而她们在上诉中都被无罪释放了
18:44
Many cases were reviewed.
361
1099000
2000
很多案子都因此而重审
18:46
And it's particularly topical because he's currently facing a disrepute charge
362
1101000
4000
这引起了很高的关注 因为他正面临着
18:50
at Britain's General Medical Council.
363
1105000
3000
英国综合医学委员会的名誉调查
18:53
So just to conclude -- what are the take-home messages from this?
364
1108000
4000
总结一下 我们应该得到什么警示呢
18:57
Well, we know that randomness and uncertainty and chance
365
1112000
4000
我们知道 随机性、不确定性和概率
19:01
are very much a part of our everyday life.
366
1116000
3000
在生活中影响重大
19:04
It's also true -- and although you, as a collective, are very special in many ways,
367
1119000
5000
并且大家作为一个集体 在很多方面都很特别
19:09
you're completely typical in not getting the examples I gave right.
368
1124000
4000
大家没有回答正确我给出的例子 是完全正常并具有代表性的
19:13
It's very well documented that people get things wrong.
369
1128000
3000
有很多人们理解错误的记录
19:16
They make errors of logic in reasoning with uncertainty.
370
1131000
3000
他们在不确定性方面犯逻辑错误
19:20
We can cope with the subtleties of language brilliantly --
371
1135000
2000
我们可以很好地解决语言的细微差别
19:22
and there are interesting evolutionary questions about how we got here.
372
1137000
3000
还有有趣的进化方面的问题 如我们是怎么来到这里的
19:25
We are not good at reasoning with uncertainty.
373
1140000
3000
我们并不擅长不确定性
19:28
That's an issue in our everyday lives.
374
1143000
2000
这是我们生活中的一个问题
19:30
As you've heard from many of the talks, statistics underpins an enormous amount
375
1145000
3000
像你们听过的很多演讲 数据是很多科学研究中
19:33
of research in science -- in social science, in medicine
376
1148000
3000
的基础--社会科学 医学
19:36
and indeed, quite a lot of industry.
377
1151000
2000
确实 很多行业
19:38
All of quality control, which has had a major impact on industrial processing,
378
1153000
4000
所有的质量控制 这些对工业过程的影响极其重要
19:42
is underpinned by statistics.
379
1157000
2000
这些都以数据为基础
19:44
It's something we're bad at doing.
380
1159000
2000
而这方面我们并不擅长
19:46
At the very least, we should recognize that, and we tend not to.
381
1161000
3000
至少我们应该意识到这一点 而我们往往没有做到
19:49
To go back to the legal context, at the Sally Clark trial
382
1164000
4000
回到法律方面 在萨里•克拉克的案子中
19:53
all of the lawyers just accepted what the expert said.
383
1168000
4000
所有律师都接受了专家的证词
19:57
So if a pediatrician had come out and said to a jury,
384
1172000
2000
如果一个小儿科医师出来对陪审团作证
19:59
"I know how to build bridges. I've built one down the road.
385
1174000
3000
我知道怎样建造桥梁 我在路那边建了一个
20:02
Please drive your car home over it,"
386
1177000
2000
开车回家的时候请放心过桥
20:04
they would have said, "Well, pediatricians don't know how to build bridges.
387
1179000
2000
他们会说 小儿科医师不懂怎样建造桥梁
20:06
That's what engineers do."
388
1181000
2000
那是工程师的工作
20:08
On the other hand, he came out and effectively said, or implied,
389
1183000
3000
而另一方面 他站出来说 或暗示
20:11
"I know how to reason with uncertainty. I know how to do statistics."
390
1186000
3000
我知道怎样运用不确定性 我知道怎样处理数据
20:14
And everyone said, "Well, that's fine. He's an expert."
391
1189000
3000
然后大家都说 这没问题 他是专家
20:17
So we need to understand where our competence is and isn't.
392
1192000
3000
所以我们应该明白什么是我们的强项 什么不是
20:20
Exactly the same kinds of issues arose in the early days of DNA profiling,
393
1195000
4000
完全相同类型的问题也出现在DNA测绘的早期
20:24
when scientists, and lawyers and in some cases judges,
394
1199000
4000
科学家 律师 有些情况下甚至法官
20:28
routinely misrepresented evidence.
395
1203000
3000
都会错误地解释证据
20:32
Usually -- one hopes -- innocently, but misrepresented evidence.
396
1207000
3000
通常--但愿如此--是无心的 但仍是错误地解释了证据
20:35
Forensic scientists said, "The chance that this guy's innocent is one in three million."
397
1210000
5000
法庭上的科学家说 这个人无罪的机率是三百万分之一
20:40
Even if you believe the number, just like the 73 million to one,
398
1215000
2000
即使你相信这个数据 就像七千三百万分之一
20:42
that's not what it meant.
399
1217000
2000
这也并不是它真正的含义
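The DNA version of the mistake can be illustrated with a simple, commonly used model. Every element beyond the talk's one-in-three-million figure is an assumption: a hypothetical pool of about 6 million possible sources, equal prior chances for everyone in it, independent chance matches, and no other evidence.

```python
def prob_source_given_match(match_prob: float, pool_size: int) -> float:
    """Chance the matching suspect is the true source.

    Assumes the true source is one of `pool_size` people, each equally
    likely a priori; every other person matches by chance, independently,
    with probability `match_prob`; and there is no other evidence.
    """
    # Expected number of innocent people in the pool who would also match.
    expected_innocent_matches = (pool_size - 1) * match_prob
    return 1 / (1 + expected_innocent_matches)


# 1 in 3 million is the talk's figure; the pool size is hypothetical.
p = prob_source_given_match(match_prob=1 / 3_000_000, pool_size=6_000_001)
print(f"P(suspect is the source | match) = {p:.2f}")  # 0.33
```

With about two innocent people expected to match by chance, the suspect is one of roughly three matching people, so even an accurate one-in-three-million match probability is far from a one-in-three-million chance of innocence.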
20:44
And there have been celebrated appeal cases
400
1219000
2000
因为这个在英国和其他地方
20:46
in Britain and elsewhere because of that.
401
1221000
2000
有很多上诉案件
20:48
And just to finish in the context of the legal system.
402
1223000
3000
这就是在法律层面上我们要考虑的问题
20:51
It's all very well to say, "Let's do our best to present the evidence."
403
1226000
4000
说“我们尽量给予证据更好的解释”固然很好
20:55
But more and more, in cases of DNA profiling -- this is another one --
404
1230000
3000
但越来越多地 在DNA测绘中--这也很重要--
20:58
we expect juries, who are ordinary people --
405
1233000
3000
我们希望陪审团 那些普通人--
21:01
and it's documented they're very bad at this --
406
1236000
2000
记录表明他们非常不擅此类--
21:03
we expect juries to be able to cope with the sorts of reasoning that goes on.
407
1238000
4000
我们希望陪审团能够处理好这些推理
21:07
In other spheres of life, if people argued -- well, except possibly for politics --
408
1242000
5000
在生活的其它方面 如果人们在争辩的时候--当然 也许不包括政治
21:12
but in other spheres of life, if people argued illogically,
409
1247000
2000
但是在生活的其他方面 如果人们争辩得并不合逻辑
21:14
we'd say that's not a good thing.
410
1249000
2000
我们认为这不是好现象
21:16
We sort of expect it of politicians and don't hope for much more.
411
1251000
4000
在不确定性方面 我们也从某种程度上对政客抱有希望
21:20
In the case of uncertainty, we get it wrong all the time --
412
1255000
3000
但并不奢求什么 我们一直都没对过
21:23
and at the very least, we should be aware of that,
413
1258000
2000
至少 我们应该认识到这一点
21:25
and ideally, we might try and do something about it.
414
1260000
2000
并且 希望我们能试着做什么去改变这一点
21:27
Thanks very much.
415
1262000
1000
谢谢大家
Translated by Xiaofei Zhang
Reviewed by Zhu Jie
