15:56
TEDSalon Berlin 2014

Kenneth Cukier: Big data is better data

Self-driving cars were just the start. What's the future of big data-driven technology and design? In a thrilling science talk, Kenneth Cukier looks at what's next for machine learning -- and human knowledge.

- Data Editor of The Economist
Kenneth Cukier is the Data Editor of The Economist. From 2007 to 2012 he was the Tokyo correspondent, and before that, the paper’s technology correspondent in London, where his work focused on innovation, intellectual property and Internet governance. Kenneth is also the co-author, with Viktor Mayer-Schönberger, of Big Data: A Revolution That Will Transform How We Live, Work, and Think (2013), which was a New York Times Bestseller and was translated into 16 languages.

America's favorite pie is?
00:12
Audience: Apple.
Kenneth Cukier: Apple. Of course it is.
00:16
How do we know it?
00:20
Because of data.
00:21
You look at supermarket sales.
00:24
You look at supermarket sales of 30-centimeter pies
00:26
that are frozen, and apple wins, no contest.
00:29
The majority of the sales are apple.
00:33
But then supermarkets started selling
00:38
smaller, 11-centimeter pies,
00:41
and suddenly, apple fell to fourth or fifth place.
00:43
Why? What happened?
00:48
Okay, think about it.
00:50
When you buy a 30-centimeter pie,
00:53
the whole family has to agree,
00:57
and apple is everyone's second favorite.
00:59
(Laughter)
01:03
But when you buy an individual 11-centimeter pie,
01:05
you can buy the one that you want.
01:09
You can get your first choice.
01:12
You have more data.
01:16
You can see something
01:18
that you couldn't see
01:20
when you only had smaller amounts of it.
01:21
Now, the point here is that more data
01:25
doesn't just let us see more,
01:27
more of the same thing we were looking at.
01:29
More data allows us to see new.
01:31
It allows us to see better.
01:35
It allows us to see different.
01:38
In this case, it allows us to see
01:42
what America's favorite pie is:
01:45
not apple.
01:48
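(Illustrative sketch, not part of the talk. It shows how the same households can make apple the top seller among shared pies while apple is nobody's first choice. The rankings are invented for illustration, and the rule that a household buys the flavor with the lowest total rank is an assumption, not something the speaker states.)

```python
from collections import Counter

# Invented rankings (best first) for three hypothetical households.
HOUSEHOLDS = [
    [["cherry", "apple", "pecan", "pumpkin"],
     ["pecan", "apple", "pumpkin", "cherry"],
     ["pumpkin", "apple", "cherry", "pecan"]],
    [["cherry", "apple", "pumpkin", "pecan"],
     ["pecan", "apple", "pumpkin", "cherry"]],
    [["cherry", "apple", "pecan", "pumpkin"],
     ["cherry", "pecan", "apple", "pumpkin"]],
]


def shared_pie(household):
    """Pick the flavor with the lowest summed rank (assumed compromise rule)."""
    flavors = household[0]
    return min(flavors, key=lambda f: sum(r.index(f) for r in household))


# One large pie per household: the family has to agree.
large_pie_sales = Counter(shared_pie(h) for h in HOUSEHOLDS)

# One small pie per person: everyone buys their first choice.
small_pie_sales = Counter(r[0] for h in HOUSEHOLDS for r in h)
```

With these made-up rankings, apple leads the large-pie tally because it is the compromise in two of three households, yet it does not appear at all in the small-pie tally.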
Now, you probably all have heard the term big data.
01:50
In fact, you're probably sick of hearing the term
01:54
big data.
01:56
It is true that there is a lot of hype around the term,
01:58
and that is very unfortunate,
02:01
because big data is an extremely important tool
02:03
by which society is going to advance.
02:06
In the past, we used to look at small data
02:10
and think about what it would mean
02:14
to try to understand the world,
02:15
and now we have a lot more of it,
02:17
more than we ever could before.
02:19
What we find is that when we have
02:22
a large body of data, we can fundamentally do things
02:23
that we couldn't do when we only had smaller amounts.
02:26
Big data is important, and big data is new,
02:29
and when you think about it,
02:32
the only way this planet is going to deal
02:34
with its global challenges —
02:36
to feed people, supply them with medical care,
02:38
supply them with energy, electricity,
02:41
and to make sure they're not burnt to a crisp
02:44
because of global warming —
02:46
is because of the effective use of data.
02:47
So what is new about big data? What is the big deal?
02:51
Well, to answer that question, let's think about
02:55
what information looked like,
02:58
physically looked like in the past.
03:00
In 1908, on the island of Crete,
03:03
archaeologists discovered a clay disc.
03:06
They dated it from 2000 B.C., so it's 4,000 years old.
03:11
Now, there's inscriptions on this disc,
03:15
but we actually don't know what it means.
03:17
It's a complete mystery, but the point is that
03:18
this is what information used to look like
03:21
4,000 years ago.
03:22
This is how society stored
03:25
and transmitted information.
03:27
Now, society hasn't advanced all that much.
03:31
We still store information on discs,
03:35
but now we can store a lot more information,
03:38
more than ever before.
03:41
Searching it is easier. Copying it is easier.
03:43
Sharing it is easier. Processing it is easier.
03:46
And what we can do is we can reuse this information
03:49
for uses that we never even imagined
03:52
when we first collected the data.
03:54
In this respect, the data has gone
03:57
from a stock to a flow,
03:59
from something that is stationary and static
04:03
to something that is fluid and dynamic.
04:07
There is, if you will, a liquidity to information.
04:10
The disc that was discovered off of Crete
04:14
that's 4,000 years old, is heavy,
04:18
it doesn't store a lot of information,
04:22
and that information is unchangeable.
04:24
By contrast, all of the files
04:27
that Edward Snowden took
04:31
from the National Security Agency in the United States
04:33
fits on a memory stick
04:35
the size of a fingernail,
04:38
and it can be shared at the speed of light.
04:41
More data. More.
04:45
Now, one reason why we have so much data in the world today
04:51
is we are collecting things
04:53
that we've always collected information on,
04:54
but another reason why is we're taking things
04:57
that have always been informational
05:00
but have never been rendered into a data format
05:03
and we are putting it into data.
05:05
Think, for example, of the question of location.
05:08
Take, for example, Martin Luther.
05:11
If we wanted to know in the 1500s
05:13
where Martin Luther was,
05:15
we would have to follow him at all times,
05:18
maybe with a feathery quill and an inkwell,
05:20
and record it,
05:22
but now think about what it looks like today.
05:23
You know that somewhere,
05:26
probably in a telecommunications carrier's database,
05:28
there is a spreadsheet or at least a database entry
05:30
that records your information
05:33
of where you've been at all times.
05:35
If you have a cell phone,
05:37
and that cell phone has GPS, but even if it doesn't have GPS,
05:39
it can record your information.
05:42
In this respect, location has been datafied.
05:44
Now think, for example, of the issue of posture,
05:48
the way that you are all sitting right now,
05:53
the way that you sit,
05:54
the way that you sit, the way that you sit.
05:56
It's all different, and it's a function of your leg length
05:59
and your back and the contours of your back,
06:01
and if I were to put sensors, maybe 100 sensors
06:03
into all of your chairs right now,
06:05
I could create an index that's fairly unique to you,
06:07
sort of like a fingerprint, but it's not your finger.
06:11
So what could we do with this?
06:15
Researchers in Tokyo are using it
06:18
as a potential anti-theft device in cars.
06:21
The idea is that the carjacker sits behind the wheel,
06:25
tries to stream off, but the car recognizes
06:28
that a non-approved driver is behind the wheel,
06:30
and maybe the engine just stops, unless you
06:32
type in a password into the dashboard
06:35
to say, "Hey, I have authorization to drive." Great.
06:38
What if every single car in Europe
06:42
had this technology in it?
06:45
What could we do then?
06:46
Maybe, if we aggregated the data,
06:50
maybe we could identify telltale signs
06:52
that best predict that a car accident
06:56
is going to take place in the next five seconds.
06:58
And then what we will have datafied
07:04
is driver fatigue,
07:07
and the service would be when the car senses
07:09
that the person slumps into that position,
07:11
automatically knows, hey, set an internal alarm
07:14
that would vibrate the steering wheel, honk inside
07:18
to say, "Hey, wake up,
07:20
pay more attention to the road."
07:22
These are the sorts of things we can do
07:24
when we datafy more aspects of our lives.
07:26
So what is the value of big data?
07:29
Well, think about it.
07:32
You have more information.
07:35
You can do things that you couldn't do before.
07:37
One of the most impressive areas
07:40
where this concept is taking place
07:42
is in the area of machine learning.
07:44
Machine learning is a branch of artificial intelligence,
07:47
which itself is a branch of computer science.
07:50
The general idea is that instead of
07:53
instructing a computer what to do,
07:55
we are going to simply throw data at the problem
07:57
and tell the computer to figure it out for itself.
08:00
And it will help you understand it
08:03
by seeing its origins.
08:05
In the 1950s, a computer scientist
08:08
at IBM named Arthur Samuel liked to play checkers,
08:11
so he wrote a computer program
08:14
so he could play against the computer.
08:16
He played. He won.
08:18
He played. He won.
08:21
He played. He won,
08:23
because the computer only knew
08:26
what a legal move was.
08:28
Arthur Samuel knew something else.
08:30
Arthur Samuel knew strategy.
08:32
So he wrote a small sub-program alongside it
08:37
operating in the background, and all it did
08:39
was score the probability
08:41
that a given board configuration would likely lead
08:43
to a winning board versus a losing board
08:46
after every move.
08:49
He plays the computer. He wins.
08:51
He plays the computer. He wins.
08:54
He plays the computer. He wins.
08:57
And then Arthur Samuel leaves the computer
09:01
to play itself.
09:03
It plays itself. It collects more data.
09:05
It collects more data. It increases the accuracy of its prediction.
09:09
And then Arthur Samuel goes back to the computer
09:13
and he plays it, and he loses,
09:15
and he plays it, and he loses,
09:17
and he plays it, and he loses,
09:19
and Arthur Samuel has created a machine
09:21
that surpasses his ability in a task that he taught it.
09:24
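(Illustrative sketch, not part of the talk and not Samuel's actual program. It applies the idea described above to a much simpler game than checkers: players alternately take 1 to 3 stones from a pile, and whoever takes the last stone wins. The names `WinTable`, `best_move`, `self_play` and `train` are hypothetical, and the scoring rule, a plain win frequency per position, is an assumed stand-in for the probability score the speaker describes.)

```python
import random

MAX_TAKE = 3  # a player removes 1-3 stones; taking the last stone wins


class WinTable:
    """Tracks how often leaving a given pile size led to a win."""

    def __init__(self):
        self.wins = {}
        self.visits = {}

    def estimate(self, left):
        """Estimated win probability for the player who leaves `left` stones."""
        visits = self.visits.get(left, 0)
        if visits == 0:
            return 0.5  # no data yet: treat as a coin flip
        return self.wins.get(left, 0) / visits

    def record(self, left, won):
        self.visits[left] = self.visits.get(left, 0) + 1
        if won:
            self.wins[left] = self.wins.get(left, 0) + 1


def legal_moves(stones):
    return range(1, min(MAX_TAKE, stones) + 1)


def best_move(stones, table):
    """Choose the move whose resulting position has the best estimate."""
    return max(legal_moves(stones), key=lambda take: table.estimate(stones - take))


def self_play(table, start, rng, explore=0.2):
    """Play one game against itself, then update the table with the outcome."""
    stones, player = start, 0
    left_by = {0: [], 1: []}  # positions each player left behind
    while stones > 0:
        if rng.random() < explore:
            take = rng.choice(list(legal_moves(stones)))  # occasional random try
        else:
            take = best_move(stones, table)
        stones -= take
        left_by[player].append(stones)
        player = 1 - player
    winner = 1 - player  # the player who made the last move
    for who, positions in left_by.items():
        for left in positions:
            table.record(left, won=(who == winner))


def train(games=500, start=10, seed=0):
    rng = random.Random(seed)
    table = WinTable()
    for _ in range(games):
        self_play(table, start, rng)
    return table
```

Calling `train()` and then `best_move(stones, table)` gives the move the program currently prefers. The more games it plays against itself, the more data stands behind each estimate, which is the loop the speaker describes.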
And this idea of machine learning
09:30
is going everywhere.
09:33
How do you think we have self-driving cars?
09:37
Are we any better off as a society
09:40
enshrining all the rules of the road into software?
09:42
No. Memory is cheaper. No.
09:45
Algorithms are faster. No. Processors are better. No.
09:48
All of those things matter, but that's not why.
09:52
It's because we changed the nature of the problem.
09:55
We changed the nature of the problem from one
09:58
in which we tried to overtly and explicitly
09:59
explain to the computer how to drive
10:02
to one in which we say,
10:04
"Here's a lot of data around the vehicle.
10:05
You figure it out.
10:07
You figure it out that that is a traffic light,
10:09
that that traffic light is red and not green,
10:11
that that means that you need to stop
10:13
and not go forward."
10:15
Machine learning is at the basis
10:18
of many of the things that we do online:
10:19
search engines,
10:21
Amazon's personalization algorithm,
10:23
computer translation,
10:27
voice recognition systems.
10:29
Researchers recently have looked at
10:34
the question of biopsies,
10:36
cancerous biopsies,
10:40
and they've asked the computer to identify
10:42
by looking at the data and survival rates
10:45
to determine whether cells are actually
10:47
cancerous or not,
10:52
and sure enough, when you throw the data at it,
10:54
through a machine-learning algorithm,
10:56
the machine was able to identify
10:58
the 12 telltale signs that best predict
11:00
that this biopsy of the breast cancer cells
11:02
are indeed cancerous.
11:06
The problem: The medical literature
11:09
only knew nine of them.
11:11
Three of the traits were ones
11:14
that people didn't need to look for,
11:16
but that the machine spotted.
11:19
Now, there are dark sides to big data as well.
11:24
It will improve our lives, but there are problems
11:30
that we need to be conscious of,
11:32
and the first one is the idea
11:35
that we may be punished for predictions,
11:38
that the police may use big data for their purposes,
11:40
a little bit like "Minority Report."
11:44
Now, it's a term called predictive policing,
11:47
or algorithmic criminology,
11:49
and the idea is that if we take a lot of data,
11:51
for example where past crimes have been,
11:53
we know where to send the patrols.
11:56
That makes sense, but the problem, of course,
11:58
is that it's not simply going to stop on location data,
12:00
it's going to go down to the level of the individual.
12:05
Why don't we use data about the person's
12:08
high school transcript?
12:10
Maybe we should use the fact that
12:12
they're unemployed or not, their credit score,
12:14
their web-surfing behavior,
12:16
whether they're up late at night.
12:17
Their Fitbit, when it's able to identify biochemistries,
12:19
will show that they have aggressive thoughts.
12:22
We may have algorithms that are likely to predict
12:27
what we are about to do,
12:29
and we may be held accountable
12:31
before we've actually acted.
12:32
Privacy was the central challenge
12:34
in a small data era.
12:36
In the big data age,
12:39
the challenge will be safeguarding free will,
12:41
moral choice, human volition,
12:46
human agency.
12:49
There is another problem:
12:54
Big data is going to steal our jobs.
12:56
Big data and algorithms are going to challenge
13:00
white collar, professional knowledge work
13:03
in the 21st century
13:06
in the same way that factory automation
13:08
and the assembly line
13:10
challenged blue collar labor in the 20th century.
13:13
Think about a lab technician
13:16
who is looking through a microscope
13:18
at a cancer biopsy
13:19
and determining whether it's cancerous or not.
13:21
The person went to university.
13:23
The person buys property.
13:25
He or she votes.
13:27
He or she is a stakeholder in society.
13:29
And that person's job,
13:32
as well as an entire fleet
13:34
of professionals like that person,
13:35
is going to find that their jobs are radically changed
13:37
or actually completely eliminated.
13:40
Now, we like to think
13:43
that technology creates jobs over a period of time
13:44
after a short, temporary period of dislocation,
13:47
and that is true for the frame of reference
13:51
with which we all live, the Industrial Revolution,
13:53
because that's precisely what happened.
13:55
But we forget something in that analysis:
13:57
There are some categories of jobs
13:59
that simply get eliminated and never come back.
14:01
The Industrial Revolution wasn't very good
14:05
if you were a horse.
14:07
So we're going to need to be careful
14:11
and take big data and adjust it for our needs,
14:13
our very human needs.
14:16
We have to be the master of this technology,
14:19
not its servant.
14:21
We are just at the outset of the big data era,
14:23
and honestly, we are not very good
14:26
at handling all the data that we can now collect.
14:29
It's not just a problem for the National Security Agency.
14:33
Businesses collect lots of data, and they misuse it too,
14:37
and we need to get better at this, and this will take time.
14:40
It's a little bit like the challenge that was faced
14:43
by primitive man and fire.
14:45
This is a tool, but this is a tool that,
14:48
unless we're careful, will burn us.
14:50
Big data is going to transform how we live,
14:56
how we work and how we think.
14:59
It is going to help us manage our careers
15:01
and lead lives of satisfaction and hope
15:03
and happiness and health,
15:07
but in the past, we've often looked at information technology
15:10
and our eyes have only seen the T,
15:13
the technology, the hardware,
15:15
because that's what was physical.
15:17
We now need to recast our gaze at the I,
15:19
the information,
15:22
which is less apparent,
15:24
but in some ways a lot more important.
15:25
Humanity can finally learn from the information
15:29
that it can collect,
15:33
as part of our timeless quest
15:35
to understand the world and our place in it,
15:37
and that's why big data is a big deal.
15:40
(Applause)
15:46



Why you should listen

As Data Editor of The Economist and co-author of Big Data: A Revolution That Will Transform How We Live, Work, and Think, Kenneth Cukier has spent years immersed in big data, machine learning -- and the impact of both. What's the future of big data-driven technology and design? To find out, watch this talk.
