TEDxCERN

Sinan Aral: How we can protect truth in the age of misinformation


Fake news can sway elections, tank economies and sow discord in everyday life. Data scientist Sinan Aral demystifies how and why it spreads so quickly -- citing one of the largest studies on misinformation -- and identifies five strategies to help us unweave the tangled web between true and false.

00:13
So, on April 23 of 2013, the Associated Press put out the following tweet on Twitter. It said, "Breaking news: Two explosions at the White House and Barack Obama has been injured." This tweet was retweeted 4,000 times in less than five minutes, and it went viral thereafter.

00:40
Now, this tweet wasn't real news put out by the Associated Press. In fact, it was false news, or fake news, that was propagated by Syrian hackers who had infiltrated the Associated Press Twitter handle. Their purpose was to disrupt society, but they disrupted much more, because automated trading algorithms immediately seized on the sentiment of this tweet and began trading based on the potential that the president of the United States had been injured or killed in this explosion. And as they started tweeting, they immediately sent the stock market crashing, wiping out 140 billion dollars in equity value in a single day.

01:25
Robert Mueller, special counsel prosecutor in the United States, issued indictments against three Russian companies and 13 Russian individuals on a conspiracy to defraud the United States by meddling in the 2016 presidential election. And the story this indictment tells is the story of the Internet Research Agency, the shadowy arm of the Kremlin on social media. During the presidential election alone, the Internet Research Agency's efforts reached 126 million people on Facebook in the United States, issued three million individual tweets and 43 hours' worth of YouTube content. All of which was fake -- misinformation designed to sow discord in the US presidential election.

02:20
A recent study by Oxford University showed that in the recent Swedish elections, one third of all of the information spreading on social media about the election was fake or misinformation. In addition, these types of social-media misinformation campaigns can spread what has been called "genocidal propaganda," for instance against the Rohingya in Burma, triggering mob killings in India.

02:49
We studied fake news and began studying it before it was a popular term. And we recently published the largest-ever longitudinal study of the spread of fake news online on the cover of "Science" in March of this year.

03:06
We studied all of the verified true and false news stories that ever spread on Twitter, from its inception in 2006 to 2017. And when we studied this information, we studied news stories that had been verified by six independent fact-checking organizations, so we knew which stories were true and which stories were false. We could measure their diffusion, the speed of their diffusion, the depth and breadth of their diffusion, how many people became entangled in this information cascade and so on.

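A minimal sketch, not the study's actual code, of how the diffusion measures just described -- the size, depth and breadth of a single retweet cascade -- could be computed. It assumes the cascade is given as (parent, child) retweet pairs rooted at the original poster, and uses networkx for the graph bookkeeping.

```python
# Sketch: diffusion metrics for one retweet cascade.
import networkx as nx
from collections import Counter

def cascade_metrics(root, edges):
    g = nx.DiGraph(edges)
    g.add_node(root)  # handles the case of a tweet with no retweets
    # distance of every reached user from the original tweet
    depths = nx.single_source_shortest_path_length(g, root)
    size = len(depths)                                 # users entangled in the cascade
    depth = max(depths.values())                       # longest unbroken retweet chain
    breadth = max(Counter(depths.values()).values())   # widest single level of the tree
    return {"size": size, "depth": depth, "breadth": breadth}

# Example: A is retweeted by B and C; D retweets B.
print(cascade_metrics("A", [("A", "B"), ("A", "C"), ("B", "D")]))
# -> {'size': 4, 'depth': 2, 'breadth': 2}
```
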
03:40
And what we did in this paper was we compared the spread of true news to the spread of false news. And here's what we found. We found that false news diffused further, faster, deeper and more broadly than the truth in every category of information that we studied, sometimes by an order of magnitude. And in fact, false political news was the most viral. It diffused further, faster, deeper and more broadly than any other type of false news.

04:09
When we saw this, we were at once worried but also curious. Why? Why does false news travel so much further, faster, deeper and more broadly than the truth?

04:20
The first hypothesis that we came up with was, "Well, maybe people who spread false news have more followers or follow more people, or tweet more often, or maybe they're more often 'verified' users of Twitter, with more credibility, or maybe they've been on Twitter longer." So we checked each one of these in turn. And what we found was exactly the opposite. False-news spreaders had fewer followers, followed fewer people, were less active, less often "verified" and had been on Twitter for a shorter period of time. And yet, false news was 70 percent more likely to be retweeted than the truth, controlling for all of these and many other factors.

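A hedged sketch of the kind of model behind a claim like "more likely to be retweeted, controlling for all of these and many other factors": a logistic regression of whether a story is retweeted on its falsehood plus the spreader covariates just listed. The file name and column names are illustrative assumptions, not the paper's.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical table: one row per story, with the covariates named in the talk.
df = pd.read_csv("cascades.csv")

model = smf.logit(
    "retweeted ~ is_false + followers + followees + tweets_per_day"
    " + verified + account_age_days",
    data=df,
).fit()

# exp(coefficient) on is_false is the odds ratio for falsehood;
# a value near 1.7 would correspond to "70 percent more likely."
print(np.exp(model.params["is_false"]))
```
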
05:00
So we had to come up with other explanations. And we devised what we called a "novelty hypothesis." So if you read the literature, it is well known that human attention is drawn to novelty, things that are new in the environment. And if you read the sociology literature, you know that we like to share novel information. It makes us seem like we have access to inside information, and we gain in status by spreading this kind of information.

05:29
So what we did was we measured the novelty of an incoming true or false tweet, compared to the corpus of what that individual had seen in the 60 days prior on Twitter. But that wasn't enough, because we thought to ourselves, "Well, maybe false news is more novel in an information-theoretic sense, but maybe people don't perceive it as more novel."

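One way to make "novel in an information-theoretic sense" concrete, offered as a minimal sketch rather than the study's actual method: score an incoming tweet by the KL divergence between its word distribution and the distribution of everything the user saw in the prior 60 days.

```python
# Sketch: unigram KL-divergence novelty score with additive smoothing.
from collections import Counter
import math

def kl_novelty(tweet, prior_corpus, alpha=1e-6):
    tweet_counts = Counter(tweet.lower().split())
    prior_counts = Counter(" ".join(prior_corpus).lower().split())
    vocab = set(tweet_counts) | set(prior_counts)
    t_total = sum(tweet_counts.values()) + alpha * len(vocab)
    p_total = sum(prior_counts.values()) + alpha * len(vocab)
    kl = 0.0
    for w in vocab:
        t = (tweet_counts[w] + alpha) / t_total
        p = (prior_counts[w] + alpha) / p_total
        kl += t * math.log(t / p)
    return kl  # higher = more novel relative to what the user has already seen

print(kl_novelty("explosions at the white house",
                 ["the market rallied today", "earnings beat estimates"]))
```
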
05:53
So to understand people's perceptions of false news, we looked at the information and the sentiment contained in the replies to true and false tweets. And what we found was that across a bunch of different measures of sentiment -- surprise, disgust, fear, sadness, anticipation, joy and trust -- false news exhibited significantly more surprise and disgust in the replies to false tweets. And true news exhibited significantly more anticipation, joy and trust in reply to true tweets. The surprise corroborates our novelty hypothesis. This is new and surprising, and so we're more likely to share it.

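A minimal, lexicon-based sketch of how the emotions named above could be scored in reply text. `EMOTION_LEXICON` stands in for a word-to-emotion mapping such as the NRC emotion lexicon; the three entries shown are illustrative only.

```python
from collections import Counter

EMOTION_LEXICON = {
    "explosion": {"fear", "surprise"},
    "injured": {"fear", "sadness"},
    "hoax": {"disgust", "surprise"},
}

def emotion_profile(replies):
    """Return the share of each emotion among lexicon hits in the replies."""
    counts = Counter()
    for reply in replies:
        for word in reply.lower().split():
            for emotion in EMOTION_LEXICON.get(word, ()):
                counts[emotion] += 1
    total = sum(counts.values()) or 1
    return {e: c / total for e, c in counts.items()}

print(emotion_profile(["What an explosion, is Obama injured?",
                       "This is a hoax."]))
```
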
06:43
At the same time, there was congressional testimony in front of both houses of Congress in the United States, looking at the role of bots in the spread of misinformation. So we looked at this too -- we used multiple sophisticated bot-detection algorithms to find the bots in our data and to pull them out. So we pulled them out, we put them back in and we compared what happens to our measurement. And what we found was that, yes indeed, bots were accelerating the spread of false news online, but they were accelerating the spread of true news at approximately the same rate. Which means bots are not responsible for the differential diffusion of truth and falsity online. We can't abdicate that responsibility, because we, humans, are responsible for that spread.

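A sketch of the robustness check described here: recompute the true-versus-false comparison with suspected bots removed, then with them included, and see whether the gap changes. The input files, column names and the 0.5 threshold are assumptions, and the bot detector itself is treated as a black box.

```python
import pandas as pd

cascades = pd.read_csv("cascades.csv")       # hypothetical: one row per cascade
bot_scores = pd.read_csv("bot_scores.csv")   # hypothetical: user_id, bot_probability

bots = set(bot_scores.loc[bot_scores.bot_probability > 0.5, "user_id"])

def mean_size_by_veracity(df):
    """Average cascade size, split by whether the story was false."""
    return df.groupby("is_false")["cascade_size"].mean()

with_bots = mean_size_by_veracity(cascades)
without_bots = mean_size_by_veracity(cascades[~cascades.origin_user.isin(bots)])

# If the true-vs-false gap is similar in both tables, bots are not
# driving the differential diffusion.
print(with_bots, without_bots, sep="\n")
```
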
07:34
Now, everything that I have told you so far, unfortunately for all of us, is the good news. The reason is that it's about to get a whole lot worse. And two specific technologies are going to make it worse.

07:52
We are going to see the rise of a tremendous wave of synthetic media: fake video, fake audio that is very convincing to the human eye. And this will be powered by two technologies.

08:06
The first of these is known as "generative adversarial networks." This is a machine-learning model with two networks: a discriminator, whose job it is to determine whether something is true or false, and a generator, whose job it is to generate synthetic media. So the generator generates synthetic video or audio, and the discriminator tries to tell, "Is this real or is this fake?" And in fact, it is the job of the generator to maximize the likelihood that it will fool the discriminator into thinking the synthetic video and audio that it is creating is actually true. Imagine a machine in a hyperloop, trying to get better and better at fooling us.

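A minimal GAN training loop, mirroring the two-network description above: a generator tries to produce samples the discriminator mistakes for real, while the discriminator learns to tell real from generated. PyTorch is an assumed choice, and the dimensions and data are toy placeholders, nothing that would produce convincing video or audio.

```python
import torch
import torch.nn as nn

noise_dim, data_dim = 16, 64
G = nn.Sequential(nn.Linear(noise_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(32, data_dim)          # stand-in for features of real media
    fake = G(torch.randn(32, noise_dim))      # generator's synthetic samples

    # Discriminator step: label real as 1, generated as 0.
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: maximize the chance the discriminator calls its output real.
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```
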
08:51
This, combined with the second technology -- essentially the democratization of artificial intelligence, the ability for anyone without any background in artificial intelligence or machine learning to deploy these kinds of algorithms to generate synthetic media -- makes it ultimately so much easier to create videos.

09:14
The White House issued a false, doctored video of a journalist interacting with an intern who was trying to take his microphone. They removed frames from this video in order to make his actions seem more punchy. And when videographers and stunt men and women were interviewed about this type of technique, they said, "Yes, we use this in the movies all the time to make our punches and kicks look more choppy and more aggressive." They then put out this video and partly used it as justification to revoke the press pass of Jim Acosta, the reporter, from the White House. And CNN had to sue to have that press pass reinstated.

10:00
There are about five different paths that I can think of that we can follow to try and address some of these very difficult problems today. Each one of them has promise, but each one of them has its own challenges.

10:15
The first one is labeling. Think about it this way: when you go to the grocery store to buy food to consume, it's extensively labeled. You know how many calories it has, how much fat it contains -- and yet when we consume information, we have no labels whatsoever. What is contained in this information? Is the source credible? Where is this information gathered from? We have none of that information when we are consuming information. That is a potential avenue, but it comes with its challenges. For instance, who gets to decide, in society, what's true and what's false? Is it the governments? Is it Facebook? Is it an independent consortium of fact-checkers? And who's checking the fact-checkers?

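Purely as an illustration of what a "nutrition label" for information might record, here is a hypothetical data structure answering the questions above (what is in it, is the source credible, where was it gathered from). None of these fields come from an existing standard.

```python
from dataclasses import dataclass, field

@dataclass
class ContentLabel:
    source: str                                           # who published it
    source_credibility: float                             # e.g., a 0-to-1 rating
    provenance: list[str] = field(default_factory=list)   # where it was gathered from
    fact_check_verdict: str = "unreviewed"                 # "true", "false", "mixed", ...
    fact_checkers: list[str] = field(default_factory=list) # who checked it

label = ContentLabel(source="@AP", source_credibility=0.97,
                     provenance=["wire report"],
                     fact_check_verdict="true",
                     fact_checkers=["snopes.com", "politifact.com"])
print(label)
```
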
11:02
Another potential avenue is incentives. We know that during the US presidential election there was a wave of misinformation that came from Macedonia that didn't have any political motive but instead had an economic motive. And this economic motive existed because false news travels so much farther, faster and more deeply than the truth, and you can earn advertising dollars as you garner eyeballs and attention with this type of information. But if we can depress the spread of this information, perhaps it would reduce the economic incentive to produce it at all in the first place.

11:40
Third, we can think about regulation, and certainly, we should think about this option. In the United States, currently, we are exploring what might happen if Facebook and others are regulated. While we should consider things like regulating political speech, labeling the fact that it's political speech, making sure foreign actors can't fund political speech, it also has its own dangers. For instance, Malaysia just instituted a six-year prison sentence for anyone found spreading misinformation. And in authoritarian regimes, these kinds of policies can be used to suppress minority opinions and to continue to extend repression.

12:24
The fourth possible option is transparency. We want to know how Facebook's algorithms work. How does the data combine with the algorithms to produce the outcomes that we see? We want them to open the kimono and show us exactly the inner workings of how Facebook is working. And if we want to know social media's effect on society, we need scientists, researchers and others to have access to this kind of information. But at the same time, we are asking Facebook to lock everything down, to keep all of the data secure. So, Facebook and the other social media platforms are facing what I call a transparency paradox. We are asking them, at the same time, to be open and transparent and, simultaneously, secure. This is a very difficult needle to thread, but they will need to thread this needle if we are to achieve the promise of social technologies while avoiding their peril.

13:24
The final thing that we could think about is algorithms and machine learning: technology devised to root out and understand fake news, how it spreads, and to try and dampen its flow.

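A minimal sketch of this last path -- a machine-learning classifier trained on stories already labeled true or false -- with the caveat the talk goes on to make: those labels ultimately come from humans in the loop. The two training examples and their labels are toy placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["Two explosions at the White House and Barack Obama is injured",
         "The Associated Press confirms its Twitter account was hacked"]
labels = [1, 0]  # 1 = false story, 0 = true story (toy labels from human fact-checkers)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

# Probability that a new headline looks like the "false" class.
print(clf.predict_proba(["Breaking: explosions reported near the White House"]))
```
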
13:37
Humans have to be in the loop of this technology, because we can never escape that underlying any technological solution or approach is a fundamental ethical and philosophical question about how we define truth and falsity, to whom we give the power to define truth and falsity, which opinions are legitimate, which type of speech should be allowed and so on. Technology is not a solution for that. Ethics and philosophy are a solution for that.

14:10
Nearly every theory of human decision making, human cooperation and human coordination has some sense of the truth at its core. But with the rise of fake news, the rise of fake video, the rise of fake audio, we are teetering on the brink of the end of reality, where we cannot tell what is real from what is fake. And that's potentially incredibly dangerous.

14:38
We have to be vigilant in defending the truth against misinformation -- with our technologies, with our policies and, perhaps most importantly, with our own individual responsibilities, decisions, behaviors and actions.

14:57
Thank you very much.

14:59
(Applause)

Translated by Ivana Korom
Reviewed by Krystian Aparta
