ABOUT THE SPEAKER
Sebastian Wernicke - Data scientist
After making a splash in the field of bioinformatics, Sebastian Wernicke moved on to the corporate sphere, where he motivates and manages multidimensional projects.

Why you should listen

Dr. Sebastian Wernicke is the Chief Data Scientist of ONE LOGIC, a data science boutique that supports organizations across industries to make sense of their vast data collections to improve operations and gain strategic advantages. Wernicke originally studied bioinformatics and previously led the strategy and growth of Seven Bridges Genomics, a Cambridge-based startup that builds platforms for genetic analysis.

Before his career in statistics began, Wernicke worked stints as both a paramedic and a successful maker of short animated films. He is also the author of the TEDPad app, an irreverent tool for creating an infinite number of "amazing and really bad" and mostly completely meaningless talks, and of the statistically authoritative yet completely ridiculous "How to Give the Perfect TEDTalk."

More profile about the speaker
Sebastian Wernicke | Speaker | TED.com
TEDxCambridge

Sebastian Wernicke: How to use data to make a hit TV show


1,628,704 views

Does collecting more data lead to better decision-making? Competitive, data-savvy companies like Amazon, Google and Netflix have learned that data analysis alone doesn't always produce optimum results. In this talk, data scientist Sebastian Wernicke breaks down what goes wrong when we make decisions based purely on data -- and suggests a smarter way to use it.


00:12
Roy Price is a man that most of you have probably never heard about,

00:17
even though he may have been responsible

00:19
for 22 somewhat mediocre minutes of your life on April 19, 2013.

00:26
He may have also been responsible for 22 very entertaining minutes,

00:29
but not very many of you.

00:32
And all of that goes back to a decision

00:33
that Roy had to make about three years ago.

00:35
So you see, Roy Price is a senior executive with Amazon Studios.

00:40
That's the TV production company of Amazon.

00:43
He's 47 years old, slim, spiky hair,

00:47
describes himself on Twitter as "movies, TV, technology, tacos."

00:52
And Roy Price has a very responsible job, because it's his responsibility

00:57
to pick the shows, the original content that Amazon is going to make.

01:01
And of course that's a highly competitive space.

01:03
I mean, there are so many TV shows already out there,

01:06
that Roy can't just choose any show.

01:08
He has to find shows that are really, really great.

01:12
So in other words, he has to find shows

01:15
that are on the very right end of this curve here.

01:17
So this curve here is the rating distribution

01:20
of about 2,500 TV shows on the website IMDB,

01:25
and the rating goes from one to 10,

01:27
and the height here shows you how many shows get that rating.

01:30
So if your show gets a rating of nine points or higher, that's a winner.

01:35
Then you have a top two percent show.

01:37
That's shows like "Breaking Bad," "Game of Thrones," "The Wire,"

01:41
so all of these shows that are addictive,

01:43
whereafter you've watched a season, your brain is basically like,

01:46
"Where can I get more of these episodes?"

01:49
That kind of show.

01:50
On the left side, just for clarity, here on that end,

01:53
you have a show called "Toddlers and Tiaras" --

01:56
(Laughter)

01:59
-- which should tell you enough

02:00
about what's going on on that end of the curve.

02:03
Now, Roy Price is not worried about getting on the left end of the curve,

02:07
because I think you would have to have some serious brainpower

02:10
to undercut "Toddlers and Tiaras."

02:11
So what he's worried about is this middle bulge here,

02:15
the bulge of average TV,

02:17
you know, those shows that aren't really good or really bad,

02:20
they don't really get you excited.

02:22
So he needs to make sure that he's really on the right end of this.

02:27
So the pressure is on,

02:28
and of course it's also the first time

02:31
that Amazon is even doing something like this,

02:33
so Roy Price does not want to take any chances.

02:36
He wants to engineer success.

02:39
He needs a guaranteed success,

02:40
and so what he does is, he holds a competition.

02:43
So he takes a bunch of ideas for TV shows,

02:46
and from those ideas, through an evaluation,

02:48
they select eight candidates for TV shows,

02:53
and then he just makes the first episode of each one of these shows

02:56
and puts them online for free for everyone to watch.

02:59
And so when Amazon is giving out free stuff,

03:01
you're going to take it, right?

03:03
So millions of viewers are watching those episodes.

03:08
What they don't realize is that, while they're watching their shows,

03:11
actually, they are being watched.

03:14
They are being watched by Roy Price and his team,

03:16
who record everything.

03:17
They record when somebody presses play, when somebody presses pause,

03:21
what parts they skip, what parts they watch again.

03:23
So they collect millions of data points,

03:26
because they want to have those data points

03:28
to then decide which show they should make.

03:30
And sure enough, so they collect all the data,

03:33
they do all the data crunching, and an answer emerges,

03:35
and the answer is,

03:36
"Amazon should do a sitcom about four Republican US Senators."

03:42
They did that show.

03:43
So does anyone know the name of the show?

03:46
(Audience: "Alpha House.")

03:48
Yes, "Alpha House,"

03:49
but it seems like not too many of you here remember that show, actually,

03:53
because it didn't turn out that great.

03:55
It's actually just an average show,

03:57
actually -- literally, in fact, because the average of this curve here is at 7.4,

04:02
and "Alpha House" lands at 7.5,

04:04
so a slightly above average show,

04:06
but certainly not what Roy Price and his team were aiming for.

04:10
Meanwhile, however, at about the same time,

04:13
at another company,

04:14
another executive did manage to land a top show using data analysis,

04:19
and his name is Ted,

04:20
Ted Sarandos, who is the Chief Content Officer of Netflix,

04:24
and just like Roy, he's on a constant mission

04:26
to find that great TV show,

04:27
and he uses data as well to do that,

04:29
except he does it a little bit differently.

04:31
So instead of holding a competition, what he did -- and his team of course --

04:35
was they looked at all the data they already had about Netflix viewers,

04:39
you know, the ratings they give their shows,

04:41
the viewing histories, what shows people like, and so on.

04:44
And then they use that data to discover

04:45
all of these little bits and pieces about the audience:

04:48
what kinds of shows they like,

04:50
what kind of producers, what kind of actors.

04:52
And once they had all of these pieces together,

04:54
they took a leap of faith,

04:56
and they decided to license

04:58
not a sitcom about four Senators

05:01
but a drama series about a single Senator.

05:04
You guys know the show?

05:06
(Laughter)

05:07
Yes, "House of Cards," and Netflix of course, nailed it with that show,

05:11
at least for the first two seasons.

05:13
(Laughter) (Applause)

05:17
"House of Cards" gets a 9.1 rating on this curve,

05:20
so it's exactly where they wanted it to be.

05:24
Now, the question of course is, what happened here?

05:26
So you have two very competitive, data-savvy companies.

05:29
They connect all of these millions of data points,

05:32
and then it works beautifully for one of them,

05:34
and it doesn't work for the other one.

05:36
So why?

05:37
Because logic kind of tells you that this should be working all the time.

05:41
I mean, if you're collecting millions of data points

05:43
on a decision you're going to make,

05:45
then you should be able to make a pretty good decision.

05:47
You have 200 years of statistics to rely on.

05:50
You're amplifying it with very powerful computers.

05:53
The least you could expect is good TV, right?

05:57
And if data analysis does not work that way,

06:01
then it actually gets a little scary,

06:03
because we live in a time where we're turning to data more and more

06:07
to make very serious decisions that go far beyond TV.

06:12
Does anyone here know the company Multi-Health Systems?

06:17
No one. OK, that's good actually.

06:18
OK, so Multi-Health Systems is a software company,

06:22
and I hope that nobody here in this room

06:24
ever comes into contact with that software,

06:28
because if you do, it means you're in prison.

06:30
(Laughter)

06:31
If someone here in the US is in prison, and they apply for parole,

06:34
then it's very likely that data analysis software from that company

06:39
will be used in determining whether to grant that parole.

06:42
So it's the same principle as Amazon and Netflix,

06:45
but now instead of deciding whether a TV show is going to be good or bad,

06:50
you're deciding whether a person is going to be good or bad.

06:53
And mediocre TV, 22 minutes, that can be pretty bad,

06:58
but more years in prison, I guess, even worse.

07:02
And unfortunately, there is actually some evidence that this data analysis,

07:06
despite having lots of data, does not always produce optimum results.

07:10
And that's not because a company like Multi-Health Systems

07:13
doesn't know what to do with data.

07:15
Even the most data-savvy companies get it wrong.

07:17
Yes, even Google gets it wrong sometimes.

07:20
In 2009, Google announced that they were able, with data analysis,

07:25
to predict outbreaks of influenza, the nasty kind of flu,

07:29
by doing data analysis on their Google searches.

07:33
And it worked beautifully, and it made a big splash in the news,

07:37
including the pinnacle of scientific success:

07:39
a publication in the journal "Nature."

07:41
It worked beautifully for year after year after year,

07:45
until one year it failed.

07:47
And nobody could even tell exactly why.

07:49
It just didn't work that year,

07:51
and of course that again made big news,

07:52
including now a retraction

07:54
of a publication from the journal "Nature."

07:58
So even the most data-savvy companies, Amazon and Google,

08:01
they sometimes get it wrong.

08:04
And despite all those failures,

08:06
data is moving rapidly into real-life decision-making --

08:10
into the workplace,

08:12
law enforcement,

08:14
medicine.

08:16
So we should better make sure that data is helping.

08:19
Now, personally I've seen a lot of this struggle with data myself,

08:22
because I work in computational genetics,

08:24
which is also a field where lots of very smart people

08:27
are using unimaginable amounts of data to make pretty serious decisions

08:31
like deciding on a cancer therapy or developing a drug.

08:35
And over the years, I've noticed a sort of pattern

08:37
or kind of rule, if you will, about the difference

08:40
between successful decision-making with data

08:43
and unsuccessful decision-making,

08:44
and I find this a pattern worth sharing, and it goes something like this.

08:50
So whenever you're solving a complex problem,

08:52
you're doing essentially two things.

08:54
The first one is, you take that problem apart into its bits and pieces

08:57
so that you can deeply analyze those bits and pieces,

09:00
and then of course you do the second part.

09:02
You put all of these bits and pieces back together again

09:05
to come to your conclusion.

09:06
And sometimes you have to do it over again,

09:08
but it's always those two things:

09:10
taking apart and putting back together again.

09:14
And now the crucial thing is

09:15
that data and data analysis

09:18
is only good for the first part.

09:21
Data and data analysis, no matter how powerful,

09:23
can only help you taking a problem apart and understanding its pieces.

09:28
It's not suited to put those pieces back together again

09:31
and then to come to a conclusion.

09:33
There's another tool that can do that, and we all have it,

09:36
and that tool is the brain.

09:37
If there's one thing a brain is good at,

09:39
it's taking bits and pieces back together again,

09:41
even when you have incomplete information,

09:43
and coming to a good conclusion,

09:45
especially if it's the brain of an expert.

09:48
And that's why I believe that Netflix was so successful,

09:51
because they used data and brains where they belong in the process.

09:54
They use data to first understand lots of pieces about their audience

09:58
that they otherwise wouldn't have been able to understand at that depth,

10:01
but then the decision to take all these bits and pieces

10:04
and put them back together again and make a show like "House of Cards,"

10:07
that was nowhere in the data.

10:09
Ted Sarandos and his team made that decision to license that show,

10:13
which also meant, by the way, that they were taking

10:15
a pretty big personal risk with that decision.

10:18
And Amazon, on the other hand, they did it the wrong way around.

10:21
They used data all the way to drive their decision-making,

10:24
first when they held their competition of TV ideas,

10:26
then when they selected "Alpha House" to make as a show.

10:30
Which of course was a very safe decision for them,

10:32
because they could always point at the data, saying,

10:35
"This is what the data tells us."

10:37
But it didn't lead to the exceptional results that they were hoping for.

10:42
So data is of course a massively useful tool to make better decisions,

10:47
but I believe that things go wrong

10:49
when data is starting to drive those decisions.

10:52
No matter how powerful, data is just a tool,

10:55
and to keep that in mind, I find this device here quite useful.

10:59
Many of you will ...

11:00
(Laughter)

11:01
Before there was data,

11:03
this was the decision-making device to use.

11:05
(Laughter)

11:07
Many of you will know this.

11:08
This toy here is called the Magic 8 Ball,

11:10
and it's really amazing,

11:11
because if you have a decision to make, a yes or no question,

11:14
all you have to do is you shake the ball, and then you get an answer --

11:18
"Most Likely" -- right here in this window in real time.

11:21
I'll have it out later for tech demos.

11:23
(Laughter)

11:24
Now, the thing is, of course -- so I've made some decisions in my life

11:28
where, in hindsight, I should have just listened to the ball.

11:31
But, you know, of course, if you have the data available,

11:34
you want to replace this with something much more sophisticated,

11:37
like data analysis to come to a better decision.

11:41
But that does not change the basic setup.

11:43
So the ball may get smarter and smarter and smarter,

11:47
but I believe it's still on us to make the decisions

11:49
if we want to achieve something extraordinary,

11:52
on the right end of the curve.

11:54
And I find that a very encouraging message, in fact,

11:59
that even in the face of huge amounts of data,

12:03
it still pays off to make decisions,

12:07
to be an expert in what you're doing

12:10
and take risks.

12:12
Because in the end, it's not data,

12:15
it's risks that will land you on the right end of the curve.

12:19
Thank you.

12:21
(Applause)
Translated by Chia Shimin
