ABOUT THE SPEAKER
Peter Donnelly - Mathematician; statistician
Peter Donnelly is an expert in probability theory who applies statistical methods to genetic data -- spurring advances in disease treatment and insight on our evolution. He's also an expert on DNA analysis, and an advocate for sensible statistical analysis in the courtroom.

Why you should listen

Peter Donnelly applies statistical methods to real-world problems, ranging from DNA analysis (for criminal trials), to the treatment of genetic disorders. A mathematician who collaborates with biologists, he specializes in applying probability and statistics to the field of genetics, in hopes of shedding light on evolutionary history and the structure of the human genome.

The Australian-born, Oxford-based mathematician is best known for his work in molecular evolution (tracing the roots of human existence to their earliest origins using the mutation rates of mitochondrial DNA). He studies genetic distributions in living populations to trace human evolutionary history -- an approach that informs research in evolutionary biology, as well as medical treatment for genetic disorders. Donnelly is a key player in the International HapMap Project, an ongoing international effort to model human genetic variation and pinpoint the genes responsible for specific aspects of health and disease; its implications for disease prevention and treatment are vast.

He's also a leading expert on DNA analysis and the use of forensic science in criminal trials; he's an outspoken advocate for bringing sensible statistical analysis into the courtroom. Donnelly leads Oxford University's Mathematical Genetics Group, which conducts research in genetic modeling, human evolutionary history, and forensic DNA profiling. He also serves as Director of the Wellcome Trust Centre for Human Genetics at Oxford University, which explores the genetic basis of disease and illness.

TEDGlobal 2005

Peter Donnelly: How juries are fooled by statistics


1,279,860 views

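Oxford mathematician Peter Donnelly reveals the common mistakes people make when interpreting statistics, and the devastating impact these errors can have on the outcome of criminal trials.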


00:25
As other speakers have said, it's a rather daunting experience --
00:27
a particularly daunting experience -- to be speaking in front of this audience.
00:30
But unlike the other speakers, I'm not going to tell you about
00:33
the mysteries of the universe, or the wonders of evolution,
00:35
or the really clever, innovative ways people are attacking
00:39
the major inequalities in our world.
00:41
Or even the challenges of nation-states in the modern global economy.
00:46
My brief, as you've just heard, is to tell you about statistics --
00:50
and, to be more precise, to tell you some exciting things about statistics.
00:53
And that's --
00:54
(Laughter)
00:55
-- that's rather more challenging
00:57
than all the speakers before me and all the ones coming after me.
00:59
(Laughter)
01:01
One of my senior colleagues told me, when I was a youngster in this profession,
01:06
rather proudly, that statisticians were people who liked figures
01:10
but didn't have the personality skills to become accountants.
01:13
(Laughter)
01:15
And there's another in-joke among statisticians, and that's,
01:18
"How do you tell the introverted statistician from the extroverted statistician?"
01:21
To which the answer is,
01:23
"The extroverted statistician's the one who looks at the other person's shoes."
01:28
(Laughter)
01:31
But I want to tell you something useful -- and here it is, so concentrate now.
01:36
This evening, there's a reception in the University's Museum of Natural History.
01:39
And it's a wonderful setting, as I hope you'll find,
01:41
and a great icon to the best of the Victorian tradition.
01:46
It's very unlikely -- in this special setting, and this collection of people --
01:51
but you might just find yourself talking to someone you'd rather wish that you weren't.
01:54
So here's what you do.
01:56
When they say to you, "What do you do?" -- you say, "I'm a statistician."
02:00
(Laughter)
02:01
Well, except they've been pre-warned now, and they'll know you're making it up.
02:05
And then one of two things will happen.
02:07
They'll either discover their long-lost cousin in the other corner of the room
02:09
and run over and talk to them.
02:11
Or they'll suddenly become parched and/or hungry -- and often both --
02:14
and sprint off for a drink and some food.
02:16
And you'll be left in peace to talk to the person you really want to talk to.
02:20
It's one of the challenges in our profession to try and explain what we do.
02:23
We're not top on people's lists for dinner party guests and conversations and so on.
02:28
And it's something I've never really found a good way of doing.
02:30
But my wife -- who was then my girlfriend --
02:33
managed it much better than I've ever been able to.
02:36
Many years ago, when we first started going out, she was working for the BBC in Britain,
02:39
and I was, at that stage, working in America.
02:41
I was coming back to visit her.
02:43
She told this to one of her colleagues, who said, "Well, what does your boyfriend do?"
02:49
Sarah thought quite hard about the things I'd explained --
02:51
and she concentrated, in those days, on listening.
02:55
(Laughter)
02:58
Don't tell her I said that.
03:00
And she was thinking about the work I did developing mathematical models
03:04
for understanding evolution and modern genetics.
03:07
So when her colleague said, "What does he do?"
03:10
She paused and said, "He models things."
03:14
(Laughter)
03:15
Well, her colleague suddenly got much more interested than I had any right to expect
03:19
and went on and said, "What does he model?"
03:22
Well, Sarah thought a little bit more about my work and said, "Genes."
03:25
(Laughter)
03:29
"He models genes."
03:31
That is my first love, and that's what I'll tell you a little bit about.
03:35
What I want to do more generally is to get you thinking about
03:39
the place of uncertainty and randomness and chance in our world,
03:42
and how we react to that, and how well we do or don't think about it.
03:47
So you've had a pretty easy time up till now --
03:49
a few laughs, and all that kind of thing -- in the talks to date.
03:51
You've got to think, and I'm going to ask you some questions.
03:54
So here's the scene for the first question I'm going to ask you.
03:56
Can you imagine tossing a coin successively?
03:59
And for some reason -- which shall remain rather vague --
04:02
we're interested in a particular pattern.
04:04
Here's one -- a head, followed by a tail, followed by a tail.
04:07
So suppose we toss a coin repeatedly.
04:10
Then the pattern, head-tail-tail, that we've suddenly become fixated with happens here.
04:15
And you can count: one, two, three, four, five, six, seven, eight, nine, 10 --
04:19
it happens after the 10th toss.
04:21
So you might think there are more interesting things to do, but humor me for the moment.
04:24
Imagine this half of the audience each get out coins, and they toss them
04:28
until they first see the pattern head-tail-tail.
04:31
The first time they do it, maybe it happens after the 10th toss, as here.
04:33
The second time, maybe it's after the fourth toss.
04:35
The next time, after the 15th toss.
04:37
So you do that lots and lots of times, and you average those numbers.
04:40
That's what I want this side to think about.
04:43
The other half of the audience doesn't like head-tail-tail --
04:45
they think, for deep cultural reasons, that's boring --
04:48
and they're much more interested in a different pattern -- head-tail-head.
04:51
So, on this side, you get out your coins, and you toss and toss and toss.
04:54
And you count the number of times until the pattern head-tail-head appears
04:57
and you average them. OK?
05:00
So on this side, you've got a number --
05:02
you've done it lots of times, so you get it accurately --
05:04
which is the average number of tosses until head-tail-tail.
05:07
On this side, you've got a number -- the average number of tosses until head-tail-head.
05:11
So here's a deep mathematical fact --
05:13
if you've got two numbers, one of three things must be true.
05:16
Either they're the same, or this one's bigger than this one,
05:19
or this one's bigger than that one.
05:20
So what's going on here?
05:23
So you've all got to think about this, and you've all got to vote --
05:25
and we're not moving on.
05:26
And I don't want to end up in the two-minute silence
05:28
to give you more time to think about it, until everyone's expressed a view. OK.
05:32
So what you want to do is compare the average number of tosses until we first see
05:36
head-tail-head with the average number of tosses until we first see head-tail-tail.
05:41
Who thinks that A is true --
05:43
that, on average, it'll take longer to see head-tail-head than head-tail-tail?
05:47
Who thinks that B is true -- that on average, they're the same?
05:51
Who thinks that C is true -- that, on average, it'll take less time
05:53
to see head-tail-head than head-tail-tail?
05:57
OK, who hasn't voted yet? Because that's really naughty -- I said you had to.
06:00
(Laughter)
06:02
OK. So most people think B is true.
06:05
And you might be relieved to know even rather distinguished mathematicians think that.
06:08
It's not. A is true here.
06:12
It takes longer, on average.
06:14
In fact, the average number of tosses till head-tail-head is 10
06:16
and the average number of tosses until head-tail-tail is eight.
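[Editorial sketch, not part of the talk] The two averages quoted above (10 tosses for head-tail-head, eight for head-tail-tail) can be checked by simulation. The following is a minimal sketch that assumes a fair coin; the function names `tosses_until` and `average_wait`, the number of trials, and the seed are illustrative choices, not anything the speaker specified.

```python
import random


def tosses_until(pattern, rng):
    """Toss a fair coin until `pattern` (e.g. "HTH") first appears; return the toss count."""
    recent = ""  # only the last len(pattern) tosses need to be remembered
    count = 0
    while not recent.endswith(pattern):
        recent = (recent + rng.choice("HT"))[-len(pattern):]
        count += 1
    return count


def average_wait(pattern, trials=50_000, seed=0):
    """Average the waiting time over many independent repetitions."""
    rng = random.Random(seed)
    return sum(tosses_until(pattern, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print("head-tail-head:", average_wait("HTH"))  # should land close to 10
    print("head-tail-tail:", average_wait("HTT"))  # should land close to 8
```

Because this is a Monte Carlo estimate, the printed values vary slightly with the seed and the number of trials; only the gap of roughly two tosses between the patterns is the point.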
06:21
How could that be?
06:24
Anything different about the two patterns?
06:30
There is. Head-tail-head overlaps itself.
06:35
If you went head-tail-head-tail-head, you can cunningly get two occurrences
06:39
of the pattern in only five tosses.
06:42
You can't do that with head-tail-tail.
06:44
That turns out to be important.
06:46
There are two ways of thinking about this.
06:48
I'll give you one of them.
06:50
So imagine -- let's suppose we're doing it.
06:52
On this side -- remember, you're excited about head-tail-tail;
06:54
you're excited about head-tail-head.
06:56
We start tossing a coin, and we get a head --
06:59
and you start sitting on the edge of your seat
07:00
because something great and wonderful, or awesome, might be about to happen.
07:05
The next toss is a tail -- you get really excited.
07:07
The champagne's on ice just next to you; you've got the glasses chilled to celebrate.
07:11
You're waiting with bated breath for the final toss.
07:13
And if it comes down a head, that's great.
07:15
You're done, and you celebrate.
07:17
If it's a tail -- well, rather disappointedly, you put the glasses away
07:19
and put the champagne back.
07:21
And you keep tossing, to wait for the next head, to get excited.
07:25
On this side, there's a different experience.
07:27
It's the same for the first two parts of the sequence.
07:30
You're a little bit excited with the first head --
07:32
you get rather more excited with the next tail.
07:34
Then you toss the coin.
07:36
If it's a tail, you crack open the champagne.
07:39
If it's a head you're disappointed,
07:41
but you're still a third of the way to your pattern again.
07:44
And that's an informal way of presenting it -- that's why there's a difference.
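[Editorial sketch, not part of the talk] The champagne story describes a small state machine: the state is how much of the pattern has been matched so far, and a disappointing toss sends you back either to nothing (head-tail-head) or to "a third of the way" (head-tail-tail). The sketch below, which assumes a fair coin, builds that state machine and sums the probability that the pattern is still unfinished after each toss, which gives the expected number of tosses. The names `next_state_table` and `expected_tosses` and the cutoff of 2000 steps are illustrative assumptions.

```python
def next_state_table(pattern):
    """Map (characters matched so far, next toss) to the new number of matched characters."""
    table = {}
    for matched in range(len(pattern)):
        for toss in "HT":
            seen = pattern[:matched] + toss
            # Drop leading tosses until what remains is a prefix of the pattern.
            while not pattern.startswith(seen):
                seen = seen[1:]
            table[(matched, toss)] = len(seen)
    return table


def expected_tosses(pattern, steps=2000):
    """Approximate E[tosses] as the sum over n of P(pattern not yet seen after n tosses)."""
    n = len(pattern)
    table = next_state_table(pattern)
    dist = [1.0] + [0.0] * n  # dist[k] = probability that k characters are matched
    expected = 0.0
    for _ in range(steps):
        expected += sum(dist[:n])  # probability the pattern is still unfinished
        new_dist = [0.0] * (n + 1)
        new_dist[n] = dist[n]  # once finished, stay finished
        for matched in range(n):
            for toss in "HT":
                new_dist[table[(matched, toss)]] += 0.5 * dist[matched]
        dist = new_dist
    return expected


if __name__ == "__main__":
    # After head-tail, the "wrong" toss resets HTH to 0 but leaves HTT at 1.
    print(next_state_table("HTH")[(2, "T")], next_state_table("HTT")[(2, "H")])
    print(expected_tosses("HTH"), expected_tosses("HTT"))
```

The only difference between the two tables is where the failed third toss lands, and that alone accounts for the two-toss gap in the averages.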
07:48
Another way of thinking about it --
07:50
if we tossed a coin eight million times,
07:52
then we'd expect a million head-tail-heads
07:54
and a million head-tail-tails -- but the head-tail-heads could occur in clumps.
08:01
So if you want to put a million things down amongst eight million positions
08:03
and you can have some of them overlapping, the clumps will be further apart.
08:08
It's another way of getting the intuition.
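[Editorial sketch, not part of the talk] The clumping argument can also be explored numerically. This sketch tosses a simulated fair coin 800,000 times (a scaled-down stand-in for the eight million in the talk, chosen only to keep the run short), counts every occurrence of each pattern, counts how many occurrences overlap the next one, and measures the average gap between occurrences that do not overlap. The names `occurrence_positions` and `clump_summary` and the seed are illustrative assumptions.

```python
import random


def occurrence_positions(seq, pattern):
    """Start index of every occurrence of `pattern`, overlapping ones included."""
    positions = []
    i = seq.find(pattern)
    while i != -1:
        positions.append(i)
        i = seq.find(pattern, i + 1)
    return positions


def clump_summary(pattern, n_tosses=800_000, seed=1):
    """Return (occurrences, overlapping pairs, mean gap between non-overlapping neighbours)."""
    rng = random.Random(seed)
    seq = "".join(rng.choices("HT", k=n_tosses))
    pos = occurrence_positions(seq, pattern)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    overlapping = sum(1 for g in gaps if g < len(pattern))
    spaced = [g for g in gaps if g >= len(pattern)]
    return len(pos), overlapping, sum(spaced) / len(spaced)


if __name__ == "__main__":
    for pattern in ("HTH", "HTT"):
        print(pattern, clump_summary(pattern))
```

Both patterns should turn up about one time in eight, but only head-tail-head produces overlapping pairs, so its separate clumps have to sit further apart to keep the overall frequency the same.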
08:10
What's the point I want to make?
08:12
It's a very, very simple example, an easily stated question in probability,
08:16
which every -- you're in good company -- everybody gets wrong.
08:19
This is my little diversion into my real passion, which is genetics.
08:23
There's a connection between head-tail-heads and head-tail-tails in genetics,
08:26
and it's the following.
08:29
When you toss a coin, you get a sequence of heads and tails.
08:32
When you look at DNA, there's a sequence of not two things -- heads and tails --
08:35
but four letters -- As, Gs, Cs and Ts.
08:38
And there are little chemical scissors, called restriction enzymes
08:41
which cut DNA whenever they see particular patterns.
08:43
And they're an enormously useful tool in modern molecular biology.
08:48
And instead of asking the question, "How long until I see a head-tail-head?" --
08:51
you can ask, "How big will the chunks be when I use a restriction enzyme
08:54
which cuts whenever it sees G-A-A-G, for example?
08:58
How long will those chunks be?"
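[Editorial sketch, not part of the talk] The chunk-length question can be posed on simulated DNA in the same way as the coin question. The sketch below makes strong simplifying assumptions that are not from the talk: the four bases are equally likely and independent (real genomes are not like this), and the "enzyme" cuts at the first letter of every G-A-A-G site. The names `cut_positions` and `mean_fragment_length`, the sequence length, and the seed are illustrative.

```python
import random


def cut_positions(dna, site):
    """Index of every occurrence of the recognition site, overlapping ones included."""
    positions = []
    i = dna.find(site)
    while i != -1:
        positions.append(i)
        i = dna.find(site, i + 1)
    return positions


def mean_fragment_length(site="GAAG", n_bases=2_000_000, seed=2):
    """Average distance between successive cuts in a random DNA string."""
    rng = random.Random(seed)
    dna = "".join(rng.choices("ACGT", k=n_bases))
    cuts = cut_positions(dna, site)
    gaps = [b - a for a, b in zip(cuts, cuts[1:])]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # With four equally likely letters, a 4-letter site starts at about
    # 1 position in 4**4 = 256, so the average chunk should be near 256 bases.
    print(mean_fragment_length())
```

Under these assumptions the average chunk length depends only on how often the site occurs; as with the coin patterns, whether the site overlaps itself (G-A-A-G does, through its shared G) affects how the chunks are distributed rather than their long-run average.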
09:00
That's a rather trivial connection between probability and genetics.
09:05
There's a much deeper connection, which I don't have time to go into
09:08
and that is that modern genetics is a really exciting area of science.
09:11
And we'll hear some talks later in the conference specifically about that.
09:15
But it turns out that unlocking the secrets in the information generated by modern
09:19
experimental technologies, a key part of that has to do with fairly sophisticated --
09:24
you'll be relieved to know that I do something useful in my day job,
09:27
rather more sophisticated than the head-tail-head story --
09:29
but quite sophisticated computer modelings and mathematical modelings
09:33
and modern statistical techniques.
09:35
And I will give you two little snippets -- two examples --
09:38
of projects we're involved in in my group in Oxford,
09:41
both of which I think are rather exciting.
09:43
You know about the Human Genome Project.
09:45
That was a project which aimed to read one copy of the human genome.
09:51
The natural thing to do after you've done that --
09:53
and that's what this project, the International HapMap Project,
09:55
which is a collaboration between labs in five or six different countries.
10:00
Think of the Human Genome Project as learning what we've got in common,
10:04
and the HapMap Project is trying to understand
10:06
where there are differences between different people.
10:08
Why do we care about that?
10:10
Well, there are lots of reasons.
10:12
The most pressing one is that we want to understand how some differences
10:16
make some people susceptible to one disease -- type-2 diabetes, for example --
10:20
and other differences make people more susceptible to heart disease,
10:25
or stroke, or autism and so on.
10:27
That's one big project.
10:29
There's a second big project,
10:31
recently funded by the Wellcome Trust in this country,
10:33
involving very large studies --
10:35
thousands of individuals, with each of eight different diseases,
10:38
common diseases like type-1 and type-2 diabetes, and coronary heart disease,
10:42
bipolar disease and so on -- to try and understand the genetics.
10:46
To try and understand what it is about genetic differences that causes the diseases.
10:49
Why do we want to do that?
10:51
Because we understand very little about most human diseases.
10:54
We don't know what causes them.
10:56
And if we can get in at the bottom and understand the genetics,
10:58
we'll have a window on the way the disease works,
11:01
and a whole new way about thinking about disease therapies
11:03
and preventative treatment and so on.
11:06
So that's, as I said, the little diversion on my main love.
11:09
Back to some of the more mundane issues of thinking about uncertainty.
11:14
Here's another quiz for you --
11:16
now suppose we've got a test for a disease
11:18
which isn't infallible, but it's pretty good.
11:20
It gets it right 99 percent of the time.
11:23
And I take one of you, or I take someone off the street,
11:26
and I test them for the disease in question.
11:28
Let's suppose there's a test for HIV -- the virus that causes AIDS --
11:32
and the test says the person has the disease.
11:35
What's the chance that they do?
11:38
The test gets it right 99 percent of the time.
11:40
So a natural answer is 99 percent.
11:44
Who likes that answer?
11:46
Come on -- everyone's got to get involved.
11:47
Don't think you don't trust me anymore.
11:49
(Laughter)
11:50
Well, you're right to be a bit skeptical, because that's not the answer.
11:53
That's what you might think.
11:55
It's not the answer, and it's not because it's only part of the story.
11:58
It actually depends on how common or how rare the disease is.
12:01
So let me try and illustrate that.
12:03
Here's a little caricature of a million individuals.
12:07
So let's think about a disease that affects --
12:10
it's pretty rare, it affects one person in 10,000.
12:12
Amongst these million individuals, most of them are healthy
12:15
and some of them will have the disease.
12:17
And in fact, if this is the prevalence of the disease,
12:20
about 100 will have the disease and the rest won't.
12:23
So now suppose we test them all.
12:25
What happens?
12:27
Well, amongst the 100 who do have the disease,
12:29
the test will get it right 99 percent of the time, and 99 will test positive.
12:34
Amongst all these other people who don't have the disease,
12:36
the test will get it right 99 percent of the time.
12:39
It'll only get it wrong one percent of the time.
12:41
But there are so many of them that there'll be an enormous number of false positives.
12:45
Put that another way --
12:47
of all of them who test positive -- so here they are, the individuals involved --
12:52
less than one in 100 actually have the disease.
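[Editorial sketch, not part of the talk] The head count described above can be written out as a short calculation. The sketch uses the talk's figures (a million people, one in 10,000 affected, a test that is right 99 percent of the time) and assumes, as the talk does, that the 99 percent applies equally to people with and without the disease. The function name `screen_population` is an illustrative choice; exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction


def screen_population(population, prevalence, accuracy):
    """Count test outcomes when everyone in the population is tested.

    `accuracy` is assumed to apply equally to sick and healthy people.
    """
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * accuracy
    false_positives = healthy * (1 - accuracy)
    share_truly_sick = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, share_truly_sick


true_pos, false_pos, share = screen_population(
    population=1_000_000,
    prevalence=Fraction(1, 10_000),
    accuracy=Fraction(99, 100),
)
print(true_pos, false_pos, share)  # 99 9999 1/102
```

So 99 genuine positives are swamped by 9,999 false ones, and a person who tests positive has a 1 in 102 chance of having the disease, which is the "less than one in 100" quoted in the talk.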
12:57
So even though we think the test is accurate, the important part of the story is
13:01
there's another bit of information we need.
13:04
Here's the key intuition.
13:07
What we have to do, once we know the test is positive,
13:10
is to weigh up the plausibility, or the likelihood, of two competing explanations.
13:16
Each of those explanations has a likely bit and an unlikely bit.
13:19
One explanation is that the person doesn't have the disease --
13:22
that's overwhelmingly likely, if you pick someone at random --
13:25
but the test gets it wrong, which is unlikely.
13:29
The other explanation is that the person does have the disease -- that's unlikely --
13:32
but the test gets it right, which is likely.
13:35
And the number we end up with --
13:37
that number which is a little bit less than one in 100 --
13:40
is to do with how likely one of those explanations is relative to the other.
13:46
Each of them taken together is unlikely.
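[Editorial sketch, not part of the talk] Weighing one explanation against the other can be written as a ratio. Using the talk's figures (one person in 10,000 affected, a test that is right 99 percent of the time for both sick and healthy people), and writing $+$ for a positive result, the comparison is, in the odds form of Bayes' rule:

```latex
\[
\frac{P(\text{disease} \mid +)}{P(\text{no disease} \mid +)}
  = \underbrace{\frac{P(\text{disease})}{P(\text{no disease})}}_{\text{unlikely vs. overwhelmingly likely}}
    \times
    \underbrace{\frac{P(+ \mid \text{disease})}{P(+ \mid \text{no disease})}}_{\text{likely vs. unlikely}}
  = \frac{1}{9999} \times \frac{0.99}{0.01}
  = \frac{99}{9999}
  = \frac{1}{101}.
\]
```

Odds of 1 to 101 correspond to a probability of $1/102$, the "little bit less than one in 100" mentioned in the talk. The notation is an editorial choice; the speaker gives only the verbal argument.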
13:49
Here's這裡的 a more topical局部的 example of exactly究竟 the same相同 thing.
272
804000
3000
這裡還有一個很類似的例子,
13:52
Those of you in Britain英國 will know about what's become成為 rather a celebrated著名 case案件
273
807000
4000
各位住在英國都知道一個很著名的案例,
13:56
of a woman女人 called Sally出擊 Clark克拉克, who had two babies嬰兒 who died死亡 suddenly突然.
274
811000
5000
有個叫做莎莉.克拉克的婦人,她的二個嬰孩同時猝死,
14:01
And initially原來, it was thought that they died死亡 of what's known已知 informally非正式地 as "cot嬰兒床 death死亡,"
275
816000
4000
一開始大家都以為是猝死症,
14:05
and more formally正式地 as "Sudden突然 Infant嬰兒 Death死亡 Syndrome綜合徵."
276
820000
3000
正式名稱為嬰兒猝死症候群。
14:08
For various各個 reasons原因, she was later後來 charged帶電 with murder謀殺.
277
823000
2000
基於許多不同理由,莎莉被控謀殺,
14:10
And at the trial審訊, her trial審訊, a very distinguished傑出的 pediatrician兒科醫師 gave evidence證據
278
825000
4000
而在審判中,一位很知名的小兒科醫生做證說明,
14:14
that the chance機會 of two cot嬰兒床 deaths死亡, innocent無辜 deaths死亡, in a family家庭 like hers她的 --
279
829000
5000
在他們這種家庭裡,也就是專業人士又不抽煙的家庭,
14:19
which哪一個 was professional專業的 and non-smoking禁煙 -- was one in 73 million百萬.
280
834000
6000
二個嬰兒同時猝死的機率大約是7千3百萬分之一。
14:26
To cut a long story故事 short, she was convicted被定罪 at the time.
281
841000
3000
長話短說,她後來被定罪了。
14:29
Later後來, and fairly相當 recently最近, acquitted無罪釋放 on appeal上訴 -- in fact事實, on the second第二 appeal上訴.
282
844000
5000
但是後來,也就是最近的事,她在第二次上訴後獲判無罪。
14:34
And just to set it in context上下文, you can imagine想像 how awful可怕 it is for someone有人
283
849000
4000
請各位想想一下,如果有人失去了一個孩子,
14:38
to have lost丟失 one child兒童, and then two, if they're innocent無辜,
284
853000
3000
或甚至二個孩子,以清白之身卻被判謀殺定罪,
14:41
to be convicted被定罪 of murdering謀殺 them.
285
856000
2000
這是多麼殘忍的一件事。
14:43
To be put through通過 the stress強調 of the trial審訊, convicted被定罪 of murdering謀殺 them --
286
858000
2000
就只為了紓解法庭所承擔的壓力,
14:45
and to spend time in a women's女士的 prison監獄, where all the other prisoners囚犯
287
860000
3000
就把一個人以謀殺犯定罪,把她關進女子監獄,
14:48
think you killed殺害 your children孩子 -- is a really awful可怕 thing to happen發生 to someone有人.
288
863000
5000
那裡的犯人都認為你殺了自己的小孩,這真是一件悲慘絕倫的事。
14:53
And it happened發生 in large part部分 here because the expert專家 got the statistics統計
289
868000
5000
這個錯誤最主要是因為專家在二個不同的方面,
14:58
horribly wrong, in two different ways.
290
873000
3000
大錯特錯地引用了統計數據所造成。
15:01
So where did he get the one in 73 million number?
291
876000
4000
他怎麼得出7千3百萬分之一這個數據的?
15:05
He looked at some research, which said the chance of one cot death in a family
292
880000
3000
他看了某些研究文獻,裡頭說像莎莉這種家庭,
15:08
like Sally Clark's is about one in 8,500.
293
883000
5000
一個嬰孩猝死的機率約為8千5百分之一。
15:13
So he said, "I'll assume that if you have one cot death in a family,
294
888000
4000
他說:「先假設家裡已經有一個嬰孩猝死了,
15:17
the chance of a second child dying from cot death aren't changed."
295
892000
4000
第二個嬰孩猝死的機率與第一個相同。」
15:21
So that's what statisticians would call an assumption of independence.
296
896000
3000
這就是統計學所引用的獨立性假設,
15:24
It's like saying, "If you toss a coin and get a head the first time,
297
899000
2000
就好像是說:「若你第一次丟銅板得到一個人頭,
15:26
that won't affect the chance of getting a head the second time."
298
901000
3000
並不會影響你第二次再丟銅板,得到人頭的機率。」
15:29
So if you toss a coin twice, the chance of getting a head twice are a half --
299
904000
5000
所以,如果你丟一個銅板二次,那麼連丟二次都得到人頭的機率,
15:34
that's the chance the first time -- times a half -- the chance a second time.
300
909000
3000
就是第一次丟出銅板的機率,乘上第二次的機率(1/2*1/2)。
15:37
So he said, "Here,
301
912000
2000
所以他才會說:「讓我們假設一下,
15:39
I'll assume that these events are independent.
302
914000
4000
假設這二個事件是獨立的,
15:43
When you multiply 8,500 together twice,
303
918000
2000
將8千5百乘二次,
15:45
you get about 73 million."
304
920000
2000
就會得到7千3百萬。」
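As a rough check on the arithmetic in this passage: under an independence assumption, the two probabilities are simply multiplied. The sketch below uses only the 1-in-8,500 figure quoted in the talk; the function name `joint_if_independent` is made up for illustration.

```python
from fractions import Fraction


def joint_if_independent(p_single: Fraction, n_events: int = 2) -> Fraction:
    """Probability that n_events events, each with probability p_single,
    all occur. Valid ONLY if the events are independent."""
    return p_single ** n_events


# Coin example from the talk: two heads in two tosses.
p_two_heads = joint_if_independent(Fraction(1, 2))

# Figure quoted in the talk: about 1 in 8,500 for a single cot death.
p_two_deaths = joint_if_independent(Fraction(1, 8500))

print(p_two_heads)   # 1/4
print(p_two_deaths)  # 1/72250000
```

The exact product is 1 in 72,250,000, the figure the talk gives as "about 73 million". The multiplication itself is trivial; everything rests on whether independence holds.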
15:47
And none of this was stated to the court as an assumption
305
922000
2000
但是這個前題假設並沒有在法庭上說明,
15:49
or presented to the jury that way.
306
924000
2000
也沒有對陪審團說明。
15:52
Unfortunately here -- and, really, regrettably --
307
927000
3000
很不幸也很遺憾的是,
15:55
first of all, in a situation like this you'd have to verify it empirically.
308
930000
4000
首先,像這種情形就該憑經驗先進行驗證,
15:59
And secondly, it's palpably false.
309
934000
2000
第二,這很明顯就是錯的。
16:02
There are lots and lots of things that we don't know about sudden infant deaths.
310
937000
5000
我們對於嬰兒猝死症所知真的不多,
16:07
It might well be that there are environmental factors that we're not aware of,
311
942000
3000
有可能是因為某些我們並不瞭解的環境因素所造成,
16:10
and it's pretty likely to be the case that there are
312
945000
2000
而這個個案更有可能是因為
16:12
genetic factors we're not aware of.
313
947000
2000
我們所不知道的基因缺陷所造成,
16:14
So if a family suffers from one cot death, you'd put them in a high-risk group.
314
949000
3000
所以當某個家庭裡有一個嬰孩猝死時,他們就算是高風險的家庭,
16:17
They've probably got these environmental risk factors
315
952000
2000
有可能存在著某些環境風險因子,
16:19
and/or genetic risk factors we don't know about.
316
954000
3000
或是有我們不知道的基因缺陷,或是二者都有。
16:22
And to argue, then, that the chance of a second death is as if you didn't know
317
957000
3000
真要計較起來,若完全不考慮這些因素,
16:25
that information is really silly.
318
960000
3000
就來計算第二個嬰孩的猝死機率,是很可笑的。
16:28
It's worse than silly -- it's really bad science.
319
963000
4000
甚至比可笑還糟,簡直就是爛透了的科學證據。
16:32
Nonetheless, that's how it was presented, and at trial nobody even argued it.
320
967000
5000
但這個數據就這樣被當成呈堂證供,法庭上也沒有人懷疑,
16:37
That's the first problem.
321
972000
2000
這就是第一個問題。
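To see why the independence assumption matters so much, here is a minimal sketch using the general product rule P(A and B) = P(A) × P(B | A). The 1-in-8,500 figure comes from the talk; the tenfold risk multiplier is a purely hypothetical placeholder, since the talk gives no empirical value for how much a first death raises the risk of a second.

```python
from fractions import Fraction


def joint_probability(p_first: Fraction, p_second_given_first: Fraction) -> Fraction:
    """General product rule: P(A and B) = P(A) * P(B | A)."""
    return p_first * p_second_given_first


p_first = Fraction(1, 8500)  # figure quoted in the talk

# Independence: the first death is assumed to say nothing about the second.
independent = joint_probability(p_first, p_first)

# HYPOTHETICAL: shared environmental or genetic risk factors make a
# second death 10x more likely. Illustrative only, not an estimate.
ASSUMED_RISK_MULTIPLIER = 10
dependent = joint_probability(p_first, ASSUMED_RISK_MULTIPLIER * p_first)

print(independent)              # 1/72250000
print(dependent)                # 1/7225000
print(dependent / independent)  # 10
```

Whatever the true multiplier is, the joint probability scales with it directly, so an unverified independence assumption can understate the chance of two natural deaths by a large factor.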
16:39
The second problem is, what does the number of one in 73 million mean?
322
974000
4000
第二個問題是,7千3百萬分之一代表著什麼?
16:43
So after Sally Clark was convicted --
323
978000
2000
當莎拉.克拉克被定罪之後,
16:45
you can imagine, it made rather a splash in the press --
324
980000
4000
你可以想見又在媒體上掀起了多大的波瀾,
16:49
one of the journalists from one of Britain's more reputable newspapers wrote that
325
984000
7000
英國某家聲譽卓著的報社記者
16:56
what the expert had said was,
326
991000
2000
就引用專家的話說:
16:58
"The chance that she was innocent was one in 73 million."
327
993000
5000
「莎拉清白的機率是7千3百萬之一」
17:03
Now, that's a logical error.
328
998000
2000
這犯了邏輯上的錯誤,
17:05
It's exactly the same logical error as the logical error of thinking that
329
1000000
3000
這個錯誤就和我們剛才所談到的疾病測試一樣,
17:08
after the disease test, which is 99 percent accurate,
330
1003000
2000
同樣具有邏輯上的錯誤,有人會以為試劑有99%的準確度,
17:10
the chance of having the disease is 99 percent.
331
1005000
4000
得到這種疾病的機率就是99%。
17:14
In the disease example, we had to bear in mind two things,
332
1009000
4000
在疾病試劑的例子裡,我們得記住二件事,
17:18
one of which was the possibility that the test got it right or not.
333
1013000
4000
其中之一是試劑的準確度,
17:22
And the other one was the chance, a priori, that the person had the disease or not.
334
1017000
4000
另一個則是人們染病的先驗機率。
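The two ingredients named here, test accuracy and the a priori chance of disease, combine through Bayes' rule. The sketch below is illustrative only: it assumes "99 percent accurate" means both sensitivity and specificity are 0.99, and it assumes a prevalence of 1 in 10,000, a placeholder value that this passage does not state.

```python
from fractions import Fraction


def posterior_given_positive(prior: Fraction,
                             sensitivity: Fraction,
                             specificity: Fraction) -> Fraction:
    """Bayes' rule: P(disease | positive test)."""
    true_positive = sensitivity * prior
    false_positive = (1 - specificity) * (1 - prior)
    return true_positive / (true_positive + false_positive)


ACCURACY = Fraction(99, 100)             # "99 percent accurate" from the talk
ASSUMED_PREVALENCE = Fraction(1, 10_000)  # ASSUMPTION: placeholder prior

posterior = posterior_given_positive(ASSUMED_PREVALENCE, ACCURACY, ACCURACY)

print(posterior)                   # 1/102
print(round(float(posterior), 4))  # 0.0098
```

Under these assumptions a positive result means a chance of disease just under 1 percent, not 99 percent, because false positives among the many healthy people swamp the true positives among the few who are ill.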
17:26
It's exactly the same in this context.
335
1021000
3000
這和這個案子是一樣的情形,
17:29
There are two things involved -- two parts to the explanation.
336
1024000
4000
這個案子也有二種解釋的方向,
17:33
We want to know how likely, or relatively how likely, two different explanations are.
337
1028000
4000
我們得釐清這二種解釋發生的機率。
17:37
One of them is that Sally Clark was innocent --
338
1032000
3000
第一種解釋是莎拉是清白的,
17:40
which is, a priori, overwhelmingly likely --
339
1035000
2000
這在先驗機率上是很有可能的,
17:42
most mothers don't kill their children.
340
1037000
3000
大部分的母親都不會殺害自己的小孩。
17:45
And the second part of the explanation
341
1040000
2000
這種解釋的第二個部分是,
17:47
is that she suffered an incredibly unlikely event.
342
1042000
3000
莎拉的遭遇真的是令人難以置信,
17:50
Not as unlikely as one in 73 million, but nonetheless, rather unlikely.
343
1045000
4000
雖然機率不像7千3百萬分之一那麼小,但確實是不太可能。
17:54
The other explanation is that she was guilty.
344
1049000
2000
第二種解釋是莎拉確實是有罪的,
17:56
Now, we probably think a priori that's unlikely.
345
1051000
2000
就先驗機率來說,這不太可能,
17:58
And we certainly should think in the context of a criminal trial
346
1053000
3000
而且我們當然認為在這起犯罪的審判中,
18:01
that that's unlikely, because of the presumption of innocence.
347
1056000
3000
一開始就要假設被告是無罪的,所以說莎拉有罪並不太可能。
18:04
And then if she were trying to kill the children, she succeeded.
348
1059000
4000
但若她真的想要殺害小孩,她也成功了,
18:08
So the chance that she's innocent isn't one in 73 million.
349
1063000
4000
所以她是清白的機率就不是7千3百萬分之一,
18:12
We don't know what it is.
350
1067000
2000
沒人知道是多少,
18:14
It has to do with weighing up the strength of the other evidence against her
351
1069000
4000
這個機率反而是和其他對她不利的證據和統計數據有關,
18:18
and the statistical evidence.
352
1073000
2000
得視證據強度而定。
18:20
We know the children died.
353
1075000
2000
我們只知道嬰孩死了,
18:22
What matters is how likely or unlikely, relative to each other,
354
1077000
4000
重要的是要找出這二種解釋
18:26
the two explanations are.
355
1081000
2000
之間的關聯性。
18:28
And they're both implausible.
356
1083000
2000
這二種解釋都無法使人信服,
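The comparison described here, two implausible explanations weighed against each other, can be sketched as a ratio. Both probabilities below are invented placeholders chosen only to show the mechanics; the talk states plainly that the real values are not known, and the function name `probability_of_first` is hypothetical.

```python
from fractions import Fraction


def probability_of_first(p_first: Fraction, p_second: Fraction) -> Fraction:
    """Given that one of two competing explanations must be true,
    return the probability of the first, weighting each explanation
    by how likely it was before the deaths were observed."""
    return p_first / (p_first + p_second)


# PLACEHOLDERS, NOT REAL ESTIMATES: both explanations are rare.
ASSUMED_P_TWO_NATURAL_DEATHS = Fraction(1, 100_000)
ASSUMED_P_DOUBLE_MURDER = Fraction(1, 1_000_000)

p_innocent = probability_of_first(ASSUMED_P_TWO_NATURAL_DEATHS,
                                  ASSUMED_P_DOUBLE_MURDER)

print(p_innocent)  # 10/11
```

With these made-up inputs the rarer explanation loses, even though the other is itself a 1-in-100,000 event. The rarity of one explanation on its own says nothing until it is set against the rarity of the alternative.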
18:31
There's a situation where errors in statistics had really profound
357
1086000
4000
有時統計上的錯誤所造成的影響,
18:35
and really unfortunate consequences.
358
1090000
3000
是很深遠且會造成不幸的。
18:38
In fact, there are two other women who were convicted on the basis of the
359
1093000
2000
事實上,還有有二位婦女因為這位小兒科醫生的證詞,
18:40
evidence of this pediatrician, who have subsequently been released on appeal.
360
1095000
4000
而被判有罪,但在後來的上訴後又被無罪釋放。
18:44
Many cases were reviewed.
361
1099000
2000
以往許多案子又被大家拿出來討論,
18:46
And it's particularly topical because he's currently facing a disrepute charge
362
1101000
4000
因此又掀起一波話題,因為這個醫生正被英國醫藥委員會
18:50
at Britain's General Medical Council.
363
1105000
3000
控以不名譽的罪名。
18:53
So just to conclude -- what are the take-home messages from this?
364
1108000
4000
結論是,這個故事帶給我們什麼樣的啟示?
18:57
Well, we know that randomness and uncertainty and chance
365
1112000
4000
我們知道隨機、不確定性及機率等,
19:01
are very much a part of our everyday life.
366
1116000
3000
都是我們日常生活的一部分,
19:04
It's also true -- and, although, you, as a collective, are very special in many ways,
367
1119000
5000
而雖然我們每一個人都與眾不同,
19:09
you're completely typical in not getting the examples I gave right.
368
1124000
4000
但就我所提出的問題沒有做出正確的回答這件事,這也是常態
19:13
It's very well documented that people get things wrong.
369
1128000
3000
很多過去的記錄顯示人們確實有時會做出錯誤判斷。
19:16
They make errors of logic in reasoning with uncertainty.
370
1131000
3000
在不確定的情況下,人們會犯下合理的邏輯錯誤。
19:20
We can cope with the subtleties of language brilliantly --
371
1135000
2000
人類可以運用精巧的語言,
19:22
and there are interesting evolutionary questions about how we got here.
372
1137000
3000
也能對人類本身的進化提出有趣的問題,
19:25
We are not good at reasoning with uncertainty.
373
1140000
3000
但我們就是不擅長預測不確定性,
19:28
That's an issue in our everyday lives.
374
1143000
2000
這是我們每天都必須面對的問題。
19:30
As you've heard from many of the talks, statistics underpins an enormous amount
375
1145000
3000
如同其他講者所提到的,統計學是其他許多科學研究的基礎,
19:33
of research in science -- in social science, in medicine
376
1148000
3000
不管是社會科學還是醫學都一樣,
19:36
and indeed, quite a lot of industry.
377
1151000
2000
還包括大部分的工業,
19:38
All of quality control, which has had a major impact on industrial processing,
378
1153000
4000
那些品質控制理論,對於工業流程管制具有重大的影響,
19:42
is underpinned by statistics.
379
1157000
2000
都是靠統計學做基礎。
19:44
It's something we're bad at doing.
380
1159000
2000
但這卻是我們所不擅長的事,
19:46
At the very least, we should recognize that, and we tend not to.
381
1161000
3000
至少我們該承認這一點,但我們卻沒人願意承認。
19:49
To go back to the legal context, at the Sally Clark trial
382
1164000
4000
回到法律層面,回到莎拉的案子上,
19:53
all of the lawyers just accepted what the expert said.
383
1168000
4000
所有的律師都接受這位專家的說法,
19:57
So if a pediatrician had come out and said to a jury,
384
1172000
2000
所以如果有一位小兒科醫生站出來對陪審團說,
19:59
"I know how to build bridges. I've built one down the road.
385
1174000
3000
「我知道如何建造橋樑,我已經在這條路上蓋了一座橋,
20:02
Please drive your car home over it,"
386
1177000
2000
請把你的車開上橋回家吧!」
20:04
they would have said, "Well, pediatricians don't know how to build bridges.
387
1179000
2000
陪審團會說:「小兒科醫生不是建造橋樑的專家,
20:06
That's what engineers do."
388
1181000
2000
這是工程師該做的事。」
20:08
On the other hand, he came out and effectively said, or implied,
389
1183000
3000
而在另一方面,這位醫師卻站出來發表專業意見,甚至暗示:
20:11
"I know how to reason with uncertainty. I know how to do statistics."
390
1186000
3000
「我知道如何解釋不確定性,我瞭解統計方法。」
20:14
And everyone said, "Well, that's fine. He's an expert."
391
1189000
3000
然後大家附和:「對,他是專家。」
20:17
So we need to understand where our competence is and isn't.
392
1192000
3000
我們必須瞭解每一個人的專長為何,
20:20
Exactly the same kinds of issues arose in the early days of DNA profiling,
393
1195000
4000
就像早期我們在描繪DNA時所引發的爭議一樣,
20:24
when scientists, and lawyers and in some cases judges,
394
1199000
4000
有些科學家、律師,或甚至法官,
20:28
routinely misrepresented evidence.
395
1203000
3000
都曾不斷地錯誤解讀他們所看到的證據。
20:32
Usually -- one hopes -- innocently, but misrepresented evidence.
396
1207000
3000
他們通常不是故意的,我們也衷心希望不是,但卻還是扭曲了證據的本質。
20:35
Forensic scientists said, "The chance that this guy's innocent is one in three million."
397
1210000
5000
鑑識專家說:「這傢伙清白的機率是三百萬分之一。」
20:40
Even if you believe the number, just like the 73 million to one,
398
1215000
2000
即使各位相信這個數據,就像先前提到的7千3百萬分之一那樣,
20:42
that's not what it meant.
399
1217000
2000
但這數據的意義並非如此,
20:44
And there have been celebrated appeal cases
400
1219000
2000
在英國和其他地方,
20:46
in Britain and elsewhere because of that.
401
1221000
2000
都有因為誤解數據而誤判的有名案例。
20:48
And just to finish in the context of the legal system.
402
1223000
3000
再讓我們回過頭來看看我們的法庭,
20:51
It's all very well to say, "Let's do our best to present the evidence."
403
1226000
4000
你大可以說:「我們得盡力將證據的原貌呈現出來。」
20:55
But more and more, in cases of DNA profiling -- this is another one --
404
1230000
3000
但是在DNA描繪的案例裡,一次又一次我們看到,這是另一個案例,
20:58
we expect juries, who are ordinary people --
405
1233000
3000
我們期望陪審團這些一般大眾,
21:01
and it's documented they're very bad at this --
406
1236000
2000
這些本來就對統計不甚在行在大眾,
21:03
we expect期望 juries陪審團 to be able能夠 to cope應付 with the sorts排序 of reasoning推理 that goes on.
407
1238000
4000
我們竟然期望他們能解讀這些統計數據。
21:07
In other spheres of life, if people argued -- well, except possibly for politics --
408
1242000
5000
但在現實生活裡,如果有人爭論...嗯,除了政治話題之外,
21:12
but in other spheres of life, if people argued illogically,
409
1247000
2000
在現實生活裡,如果有人不合邏輯地爭論,
21:14
we'd say that's not a good thing.
410
1249000
2000
我們會說這樣做不好,
21:16
We sort of expect it of politicians and don't hope for much more.
411
1251000
4000
我們會認為這是政客做的事,因為我們對政客沒什麽太大的期望。
21:20
In the case of uncertainty, we get it wrong all the time --
412
1255000
3000
在面對不確定的事情時,我們總是犯錯,
21:23
and at the very least, we should be aware of that,
413
1258000
2000
但是至少我們應該知道我們會犯錯。
21:25
and ideally, we might try and do something about it.
414
1260000
2000
並希望我們能嘗試去減少錯誤的發生。
21:27
Thanks very much.
415
1262000
1000
謝謝各位!
Translated by Marie Wu
Reviewed by Wang-Ju Tsai

