TEDSalon Berlin 2014
Kenneth Cukier: Big data is better data
1,663,038 views
Self-driving cars are just the start. What kind of future will big data-driven technology and design bring us? In this exciting science talk, Kenneth Cukier explores machine learning and the future of human knowledge.
Kenneth Cukier - Data Editor of The Economist
Kenneth Cukier is the Data Editor of The Economist. From 2007 to 2012 he was the Tokyo correspondent, and before that, the paper’s technology correspondent in London, where his work focused on innovation, intellectual property and Internet governance. Kenneth is also the co-author of Big Data: A Revolution That Will Transform How We Live, Work, and Think with Viktor Mayer-Schönberger in 2013, which was a New York Times Bestseller and translated into 16 languages.
00:12
America's favorite pie is?
00:16
Audience: Apple.
Kenneth Cukier: Apple. Of course it is.
00:20
How do we know it?
00:21
Because of data.
00:24
You look at supermarket sales.
00:26
You look at supermarket sales of 30-centimeter pies
00:29
that are frozen, and apple wins, no contest.
00:33
The majority of the sales are apple.
00:38
But then supermarkets started selling
00:41
smaller, 11-centimeter pies,
00:43
and suddenly, apple fell to fourth or fifth place.
00:48
Why? What happened?
00:50
Okay, think about it.
00:53
When you buy a 30-centimeter pie,
00:57
the whole family has to agree,
00:59
and apple is everyone's second favorite.
01:03
(Laughter)
01:05
But when you buy an individual 11-centimeter pie,
01:09
you can buy the one that you want.
01:12
You can get your first choice.
01:16
You have more data.
01:18
You can see something
01:20
that you couldn't see
01:21
when you only had smaller amounts of it.
01:25
Now, the point here is that more data
01:27
doesn't just let us see more,
01:29
more of the same thing we were looking at.
01:31
More data allows us to see new.
01:35
It allows us to see better.
01:38
It allows us to see different.
01:42
In this case, it allows us to see
01:45
what America's favorite pie is:
01:48
not apple.
01:50
Now, you probably all have heard the term big data.
01:54
In fact, you're probably sick of hearing the term
01:56
big data.
01:58
It is true that there is a lot of hype around the term,
02:01
and that is very unfortunate,
02:03
because big data is an extremely important tool
02:06
by which society is going to advance.
02:10
In the past, we used to look at small data
02:14
and think about what it would mean
02:15
to try to understand the world,
02:17
and now we have a lot more of it,
02:19
more than we ever could before.
02:22
What we find is that when we have
02:23
a large body of data, we can fundamentally do things
02:26
that we couldn't do when we only had smaller amounts.
02:29
Big data is important, and big data is new,
02:32
and when you think about it,
02:34
the only way this planet is going to deal
02:36
with its global challenges —
02:38
to feed people, supply them with medical care,
02:41
supply them with energy, electricity,
02:44
and to make sure they're not burnt to a crisp
02:46
because of global warming —
02:47
is because of the effective use of data.
02:51
So what is new about big data? What is the big deal?
02:55
Well, to answer that question, let's think about
02:58
what information looked like,
03:00
physically looked like in the past.
03:03
In 1908, on the island of Crete,
03:06
archaeologists discovered a clay disc.
03:11
They dated it from 2000 B.C., so it's 4,000 years old.
03:15
Now, there's inscriptions on this disc,
03:17
but we actually don't know what it means.
03:18
It's a complete mystery, but the point is that
03:21
this is what information used to look like
03:22
4,000 years ago.
03:25
This is how society stored
03:27
and transmitted information.
03:31
Now, society hasn't advanced all that much.
03:35
We still store information on discs,
03:38
but now we can store a lot more information,
03:41
more than ever before.
03:43
Searching it is easier. Copying it is easier.
03:46
Sharing it is easier. Processing it is easier.
03:49
And what we can do is we can reuse this information
03:52
for uses that we never even imagined
03:54
when we first collected the data.
03:57
In this respect, the data has gone
03:59
from a stock to a flow,
04:03
from something that is stationary and static
04:07
to something that is fluid and dynamic.
04:10
There is, if you will, a liquidity to information.
04:14
The disc that was discovered off of Crete
04:18
that's 4,000 years old, is heavy,
04:22
it doesn't store a lot of information,
04:24
and that information is unchangeable.
04:27
By contrast, all of the files
04:31
that Edward Snowden took
04:33
from the National Security Agency in the United States
04:35
fits on a memory stick
04:38
the size of a fingernail,
04:41
and it can be shared at the speed of light.
04:45
More data. More.
04:51
Now, one reason why we have so much data in the world today
04:53
is we are collecting things
04:54
that we've always collected information on,
04:57
but another reason why is we're taking things
05:00
that have always been informational
05:03
but have never been rendered into a data format
05:05
and we are putting it into data.
05:08
Think, for example, the question of location.
05:11
Take, for example, Martin Luther.
05:13
If we wanted to know in the 1500s
05:15
where Martin Luther was,
05:18
we would have to follow him at all times,
05:20
maybe with a feathery quill and an inkwell,
05:22
and record it,
05:23
but now think about what it looks like today.
05:26
You know that somewhere,
05:28
probably in a telecommunications carrier's database,
05:30
there is a spreadsheet or at least a database entry
05:33
that records your information
05:35
of where you've been at all times.
05:37
If you have a cell phone,
05:39
and that cell phone has GPS, but even if it doesn't have GPS,
05:42
it can record your information.
05:44
In this respect, location has been datafied.
05:48
Now think, for example, of the issue of posture,
05:53
the way that you are all sitting right now,
05:54
the way that you sit,
05:56
the way that you sit, the way that you sit.
05:59
It's all different, and it's a function of your leg length
06:01
and your back and the contours of your back,
06:03
and if I were to put sensors, maybe 100 sensors
06:05
into all of your chairs right now,
06:07
I could create an index that's fairly unique to you,
06:11
sort of like a fingerprint, but it's not your finger.
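A minimal sketch of that index idea in Python (the sensor values, names, and threshold are invented for illustration, not taken from the research Cukier mentions): treat each sitter's seat-pressure readings as a numeric signature, and match a new reading to the closest enrolled one.

```python
import numpy as np

# Hypothetical enrolled sitters: each signature is a vector of readings
# from 100 seat pressure sensors (random stand-ins for real calibration data).
rng = np.random.default_rng(0)
signatures = {name: rng.random(100) for name in ["driver_a", "driver_b"]}

def identify(reading, signatures, threshold=0.9):
    """Return the enrolled sitter whose signature is most similar to
    `reading` by cosine similarity, or None if nobody is close enough."""
    best_name, best_score = None, -1.0
    for name, sig in signatures.items():
        score = reading @ sig / (np.linalg.norm(reading) * np.linalg.norm(sig))
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

# A noisy re-measurement of driver_a should still match driver_a.
noisy = signatures["driver_a"] + rng.normal(0, 0.05, 100)
print(identify(noisy, signatures))  # -> driver_a
```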
06:15
So what could we do with this?
06:18
Researchers in Tokyo are using it
06:21
as a potential anti-theft device in cars.
06:25
The idea is that the carjacker sits behind the wheel,
06:28
tries to drive off, but the car recognizes
06:30
that a non-approved driver is behind the wheel,
06:32
and maybe the engine just stops, unless you
06:35
type in a password into the dashboard
06:38
to say, "Hey, I have authorization to drive." Great.
06:42
What if every single car in Europe
06:45
had this technology in it?
06:46
What could we do then?
06:50
Maybe, if we aggregated the data,
06:52
maybe we could identify telltale signs
06:56
that best predict that a car accident
06:58
is going to take place in the next five seconds.
07:04
And then what we will have datafied
07:07
is driver fatigue,
07:09
and the service would be when the car senses
07:11
that the person slumps into that position,
07:14
automatically knows, hey, set an internal alarm
07:18
that would vibrate the steering wheel, honk inside
07:20
to say, "Hey, wake up,
07:22
pay more attention to the road."
07:24
These are the sorts of things we can do
07:26
when we datafy more aspects of our lives.
07:29
So what is the value of big data?
07:32
Well, think about it.
07:35
You have more information.
07:37
You can do things that you couldn't do before.
07:40
One of the most impressive areas
07:42
where this concept is taking place
07:44
is in the area of machine learning.
07:47
Machine learning is a branch of artificial intelligence,
07:50
which itself is a branch of computer science.
07:53
The general idea is that instead of
07:55
instructing a computer what to do,
07:57
we are going to simply throw data at the problem
08:00
and tell the computer to figure it out for itself.
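That shift is easy to see in code. A hedged sketch (toy fruit data invented for illustration; scikit-learn assumed as the library): rather than hand-writing if/else rules, we fit a model on labeled examples and let it infer the boundaries itself.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy labeled examples: [diameter_cm, weight_g] -> fruit.
# No rules are written down; the tree infers them from the data.
X = [[7, 150], [8, 170], [3, 40], [4, 50]]
y = ["apple", "apple", "plum", "plum"]

model = DecisionTreeClassifier().fit(X, y)
print(model.predict([[7.5, 160]]))  # -> ['apple']
```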
08:03
And it will help you understand it
08:05
by seeing its origins.
08:08
In the 1950s, a computer scientist
08:11
at IBM named Arthur Samuel liked to play checkers,
08:14
so he wrote a computer program
08:16
so he could play against the computer.
08:18
He played. He won.
08:21
He played. He won.
08:23
He played. He won,
08:26
because the computer only knew
08:28
what a legal move was.
08:30
Arthur Samuel knew something else.
08:32
Arthur Samuel knew strategy.
08:37
So he wrote a small sub-program alongside it
08:39
operating in the background, and all it did
08:41
was score the probability
08:43
that a given board configuration would likely lead
08:46
to a winning board versus a losing board
08:49
after every move.
08:51
He plays the computer. He wins.
08:54
He plays the computer. He wins.
08:57
He plays the computer. He wins.
09:01
And then Arthur Samuel leaves the computer
09:03
to play itself.
09:05
It plays itself. It collects more data.
09:09
It collects more data. It increases the accuracy of its prediction.
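A toy sketch of that self-play loop, using the much simpler game of Nim instead of checkers (everything here is invented for illustration): random self-play generates the data, and the "sub-program" is just a table estimating how often the player to move goes on to win from each position. The more games it plays, the sharper the estimates become.

```python
import random
from collections import defaultdict

# Nim: players alternately take 1 or 2 stones; taking the last stone wins.
wins = defaultdict(int)    # stones remaining -> wins for the player to move
visits = defaultdict(int)  # stones remaining -> times that position occurred

def self_play(stones=10):
    """Play one game with random moves, then credit every position
    according to whether the player to move there eventually won."""
    history, player = [], 0
    while stones > 0:
        history.append((stones, player))
        stones -= random.choice([1, 2][:stones])  # can't take 2 from a pile of 1
        player = 1 - player
    winner = 1 - player  # whoever just took the last stone
    for s, p in history:
        visits[s] += 1
        wins[s] += int(p == winner)

for _ in range(20000):  # more games -> more data -> better estimates
    self_play()

for s in range(1, 11):
    print(s, round(wins[s] / visits[s], 2))  # multiples of 3 should score worst
```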
09:13
And then Arthur Samuel goes back to the computer
195
541389
2104
然後亞瑟.山姆爾再回來跟電腦對打。
09:15
and he plays it, and he loses,
196
543493
2318
他開始下棋,結果他輸了。
09:17
and he plays it, and he loses,
197
545811
2069
他又下,又輸了。
09:19
and he plays it, and he loses,
198
547880
2047
再下,還是輸。
09:21
and Arthur Samuel has created a machine
199
549927
2599
亞瑟.山姆爾創造了一台機器,
09:24
that surpasses his ability in a task that he taught it.
200
552526
6288
它的能力青出於藍,更甚於藍。
09:30
And this idea of machine learning
201
558814
2498
而這種機器學習的概念,
09:33
is going everywhere.
202
561312
3927
現在到處可見。
09:37
How do you think we have self-driving cars?
203
565239
3149
你想我們怎麼會有自動駕駛汽車?
09:40
Are we any better off as a society
204
568388
2137
把全部交通規則都輸入到軟體,
可以改善社會嗎?
可以改善社會嗎?
09:42
enshrining all the rules of the road into software?
205
570525
3285
09:45
No. Memory is cheaper. No.
206
573810
2598
不是。
因為記憶體更便宜嗎?不是。
09:48
Algorithms are faster. No. Processors are better. No.
207
576408
3994
演算法變快了?不。
有更好的處理器?不。
09:52
All of those things matter, but that's not why.
208
580402
2772
這些都很重要,但不是真正的原因。
09:55
It's because we changed the nature of the problem.
209
583174
3141
真正的原因是
我們改變了問題的本質。
我們改變了問題的本質。
09:58
We changed the nature of the problem from one
09:59
in which we tried to overtly and explicitly
10:02
explain to the computer how to drive
10:04
to one in which we say,
10:05
"Here's a lot of data around the vehicle.
10:07
You figure it out.
10:09
You figure it out that that is a traffic light,
10:11
that that traffic light is red and not green,
10:13
that that means that you need to stop
10:15
and not go forward."
10:18
Machine learning is at the basis
10:19
of many of the things that we do online:
10:21
search engines,
10:23
Amazon's personalization algorithm,
10:27
computer translation,
10:29
voice recognition systems.
10:34
Researchers recently have looked at
10:36
the question of biopsies,
10:40
cancerous biopsies,
10:42
and they've asked the computer to identify
10:45
by looking at the data and survival rates
10:47
to determine whether cells are actually
10:52
cancerous or not,
10:54
and sure enough, when you throw the data at it,
10:56
through a machine-learning algorithm,
10:58
the machine was able to identify
11:00
the 12 telltale signs that best predict
11:02
that this biopsy of the breast cancer cells
11:06
are indeed cancerous.
11:09
The problem: The medical literature
11:11
only knew nine of them.
11:14
Three of the traits were ones
11:16
that people didn't need to look for,
11:19
but that the machine spotted.
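This isn't the study Cukier describes, but the pattern is easy to reproduce: scikit-learn ships a public breast-cancer biopsy dataset, and a fitted model can be asked which measurements it found most predictive. A hedged sketch:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Fit a forest on biopsy measurements (malignant vs. benign), then rank
# the features by how heavily the model relied on them.
data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(data.data, data.target)

ranked = sorted(zip(model.feature_importances_, data.feature_names), reverse=True)
for importance, name in ranked[:12]:  # the model's own "telltale signs"
    print(f"{name}: {importance:.3f}")
```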
11:24
Now, there are dark sides to big data as well.
11:30
It will improve our lives, but there are problems
11:32
that we need to be conscious of,
11:35
and the first one is the idea
11:38
that we may be punished for predictions,
11:40
that the police may use big data for their purposes,
11:44
a little bit like "Minority Report."
11:47
Now, it's a term called predictive policing,
11:49
or algorithmic criminology,
11:51
and the idea is that if we take a lot of data,
11:53
for example where past crimes have been,
11:56
we know where to send the patrols.
11:58
That makes sense, but the problem, of course,
12:00
is that it's not simply going to stop on location data,
12:05
it's going to go down to the level of the individual.
12:08
Why don't we use data about the person's
12:10
high school transcript?
12:12
Maybe we should use the fact that
12:14
they're unemployed or not, their credit score,
12:16
their web-surfing behavior,
12:17
whether they're up late at night.
12:19
Their Fitbit, when it's able to identify biochemistries,
12:22
will show that they have aggressive thoughts.
12:27
We may have algorithms that are likely to predict
12:29
what we are about to do,
12:31
and we may be held accountable
12:32
before we've actually acted.
12:34
Privacy was the central challenge
12:36
in a small data era.
12:39
In the big data age,
12:41
the challenge will be safeguarding free will,
12:46
moral choice, human volition,
12:49
human agency.
12:54
There is another problem:
12:56
Big data is going to steal our jobs.
13:00
Big data and algorithms are going to challenge
13:03
white collar, professional knowledge work
13:06
in the 21st century
13:08
in the same way that factory automation
13:10
and the assembly line
13:13
challenged blue collar labor in the 20th century.
13:16
Think about a lab technician
13:18
who is looking through a microscope
13:19
at a cancer biopsy
13:21
and determining whether it's cancerous or not.
13:23
The person went to university.
13:25
The person buys property.
13:27
He or she votes.
13:29
He or she is a stakeholder in society.
13:32
And that person's job,
13:34
as well as an entire fleet
13:35
of professionals like that person,
13:37
is going to find that their jobs are radically changed
13:40
or actually completely eliminated.
13:43
Now, we like to think
13:44
that technology creates jobs over a period of time
13:47
after a short, temporary period of dislocation,
13:51
and that is true for the frame of reference
13:53
with which we all live, the Industrial Revolution,
13:55
because that's precisely what happened.
13:57
But we forget something in that analysis:
13:59
There are some categories of jobs
14:01
that simply get eliminated and never come back.
14:05
The Industrial Revolution wasn't very good
14:07
if you were a horse.
14:11
So we're going to need to be careful
14:13
and take big data and adjust it for our needs,
14:16
our very human needs.
14:19
We have to be the master of this technology,
14:21
not its servant.
14:23
We are just at the outset of the big data era,
14:26
and honestly, we are not very good
14:29
at handling all the data that we can now collect.
14:33
It's not just a problem for the National Security Agency.
14:37
Businesses collect lots of data, and they misuse it too,
14:40
and we need to get better at this, and this will take time.
14:43
It's a little bit like the challenge that was faced
14:45
by primitive man and fire.
14:48
This is a tool, but this is a tool that,
14:50
unless we're careful, will burn us.
14:56
Big data is going to transform how we live,
14:59
how we work and how we think.
15:01
It is going to help us manage our careers
15:03
and lead lives of satisfaction and hope
15:07
and happiness and health,
15:10
but in the past, we've often looked at information technology
15:13
and our eyes have only seen the T,
15:15
the technology, the hardware,
15:17
because that's what was physical.
15:19
We now need to recast our gaze at the I,
15:22
the information,
15:24
which is less apparent,
15:25
but in some ways a lot more important.
15:29
Humanity can finally learn from the information
15:33
that it can collect,
15:35
as part of our timeless quest
15:37
to understand the world and our place in it,
15:40
and that's why big data is a big deal.
15:46
(Applause)
ABOUT THE SPEAKER
Kenneth Cukier - Data Editor of The Economist
Why you should listen
As Data Editor of The Economist and co-author of Big Data: A Revolution That Will Transform How We Live, Work, and Think, Kenneth Cukier has spent years immersed in big data, machine learning -- and the impact of both. What's the future of big data-driven technology and design? To find out, watch this talk.
Kenneth Cukier | Speaker | TED.com