ABOUT THE SPEAKER
Jennifer Golbeck - Computer scientist
As the director of the Human-Computer Interaction Lab at the University of Maryland, Jennifer Golbeck studies how people use social media -- and thinks about ways to improve their interactions.

Why you should listen

Jennifer Golbeck is an associate professor in the College of Information Studies at the University of Maryland, where she also moonlights in the department of computer science. Her work invariably focuses on how to enhance and improve the way that people interact with their own information online. "I approach this from a computer science perspective and my general research hits social networks, trust, web science, artificial intelligence, and human-computer interaction," she writes.

Author of the 2013 book, Analyzing the Social Web, Golbeck likes nothing more than to immerse herself in the inner workings of the Internet tools so many millions of people use daily, to understand the implications of our choices and actions. Recently, she has also been working to bring human-computer interaction ideas to the world of security and privacy systems.

TEDxMidAtlantic 2013

Jennifer Golbeck: Your social media "likes" expose more than you think

2,366,837 views

Do you like curly fries? Have you Liked them on Facebook? Watch this talk to find out the surprising things Facebook (and others) can guess about you from your random Likes and Shares. Computer scientist Jennifer Golbeck explains how this came about, how some applications of the technology are not so cute -- and why she thinks we should return the control of information to its rightful owners.


00:12
If you remember that first decade of the web, it was really a static place. You could go online, you could look at pages, and they were put up either by organizations who had teams to do it or by individuals who were really tech-savvy for the time. And with the rise of social media and social networks in the early 2000s, the web was completely changed to a place where now the vast majority of content we interact with is put up by average users, either in YouTube videos or blog posts or product reviews or social media postings.

00:46
And it's also become a much more interactive place, where people are interacting with others, they're commenting, they're sharing, they're not just reading. So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month. So half the Earth's Internet population is using Facebook. They are a site, along with others, that has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online. So the result is that we have behavioral, preference, demographic data for hundreds of millions of people, which is unprecedented in history.

01:26
And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you that you don't even know you're sharing information about. As scientists, we use that to help the way people interact online, but there's less altruistic applications, and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it. So what I want to talk to you about today is some of these things that we're able to do, and then give us some ideas of how we might go forward to move some control back into the hands of users.

02:02
So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote that was printed in Forbes magazine where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?" It turns out that they have the purchase history for hundreds of thousands of customers and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes, but things like, she bought more vitamins than she normally had, or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights.

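[Editor's note: a toy sketch of the kind of scoring described here. It combines individually weak purchase signals into one probability-style score; the feature names, weights, and logistic form are invented for illustration and are not Target's actual model.]

import math

# Toy illustration only: feature names and weights are invented,
# not Target's actual model.
WEIGHTS = {
    "prenatal_vitamins": 0.9,     # more vitamins than usual
    "unscented_lotion": 1.2,
    "oversized_handbag": 0.6,     # big enough to hold diapers
    "crib_or_baby_clothes": 2.5,  # the obvious signals still count
}
BIAS = -3.0

def pregnancy_score(purchases):
    """Combine weak purchase signals into one probability-style score."""
    z = BIAS + sum(WEIGHTS.get(item, 0.0) for item in purchases)
    return 1.0 / (1.0 + math.exp(-z))  # logistic squash into 0..1

print(round(pregnancy_score(["oversized_handbag"]), 2))                  # weak on its own
print(round(pregnancy_score(["prenatal_vitamins", "unscented_lotion",
                             "oversized_handbag"]), 2))                  # the pattern adds up
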
03:06
So that's the kind of thing that we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, lets us find out all kinds of things. So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.

03:44
So my favorite example is from this study that was published this year in the Proceedings of the National Academies. If you Google this, you'll find it. It's four pages, easy to read. And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones. And in their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted?

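[Editor's note: a minimal sketch of the likes-to-traits pipeline on synthetic data, using scikit-learn. The page indices and the planted "trait" are made up; the published study fit comparable models to real volunteers' Facebook likes.]

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_users, n_pages = 2000, 300

# Each row is one user; each column is 1 if they liked that page.
likes = (rng.random((n_users, n_pages)) < 0.05).astype(int)

# Invent a hidden trait weakly tied to a handful of pages, standing in
# for "high intelligence" in the curly-fries example.
signal_pages = [3, 57, 120, 200, 250]
trait = (likes[:, signal_pages].sum(axis=1) + rng.random(n_users) > 1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(likes, trait, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))

# The most predictive pages fall out of the learned weights, which is how
# a single page can surface as an indicator of a trait it says nothing about.
print("most indicative pages:", np.argsort(model.coef_[0])[-5:])
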
04:28
And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this is well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out things like viral videos or Facebook likes or other information spreads in exactly the same way that diseases spread through social networks. So this is something we've studied for a long time. We have good models of it. And so you can put those things together and start seeing why things like this happen.

05:09
So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test. And they liked it, and their friends saw it, and by homophily, we know that he probably had smart friends, and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them, and so it propagated through the network to a host of smart people, so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the actual action of liking reflects back the common attributes of other people who have done it.

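[Editor's note: this hypothesis is easy to check in simulation. The sketch below builds a small homophilous network with an invented numeric trait, seeds a "like" with one above-average person, and lets it spread along friendships. Every number is made up for illustration.]

import random

random.seed(1)

# A toy homophilous network: an invented numeric trait, and friendships
# that mostly form between people with similar trait values.
people = [random.gauss(100, 15) for _ in range(3000)]
friends = {i: [] for i in range(len(people))}
for i in range(len(people)):
    for _ in range(10):
        j = random.randrange(len(people))
        if i != j and abs(people[i] - people[j]) < 10:  # homophily
            friends[i].append(j)

# Seed the page with one above-average person, then let the like spread:
# each friend who sees it likes it with 50% probability.
seed = min(range(len(people)), key=lambda i: abs(people[i] - 115))
liked, frontier = {seed}, [seed]
for _ in range(6):  # a few hops of sharing
    new = []
    for i in frontier:
        for j in friends[i]:
            if j not in liked and random.random() < 0.5:
                liked.add(j)
                new.append(j)
    frontier = new

print(f"population average: {sum(people) / len(people):.1f}")
print(f"likers' average:    {sum(people[i] for i in liked) / len(liked):.1f}")
# The likers score above the population average even though the page
# content says nothing about the trait.
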
05:48
So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked? There's a lot of power that users don't have to control how this data is used. And I see that as a real problem going forward.

06:13
So I think there's a couple paths that we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that, if I ever get bored being a professor, I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and if you're a drug user, if you're an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We totally can do that now. I could start that business tomorrow, and you would have absolutely no control over me using your data like that. That seems to me to be a problem.

06:50
So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S. so users control their data. We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.

07:45
So I think the other path that we can go down that's going to be more effective is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether.

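[Editor's note: one way such a warning could work, sketched with an invented model: score the sensitive trait before and after the new like and show the user the difference. The page names, weights, and form of the model are all hypothetical.]

import math

# Hypothetical, invented model weights for some sensitive trait.
WEIGHTS = {"late_night_memes": 0.4, "energy_drinks": 0.7, "curly_fries": 0.2}
BIAS = -2.0

def confidence(likes):
    """The model's confidence in the trait, given a set of liked pages."""
    z = BIAS + sum(WEIGHTS.get(page, 0.0) for page in likes)
    return 1.0 / (1.0 + math.exp(-z))

def risk_of_like(current_likes, new_like):
    """What a consent tool could show before the user clicks 'Like'."""
    return confidence(current_likes), confidence(current_likes | {new_like})

before, after = risk_of_like({"late_night_memes"}, "energy_drinks")
print(f"Liking this page raises the model's confidence from "
      f"{before:.0%} to {after:.0%}.")
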
08:24
We can also look at things like allowing people to encrypt data that they upload, so it's kind of invisible and worthless to sites like Facebook or third party services that access it, but that select users who the person who posted it want to see it have access to see it. This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it. So that gives us an advantage over the law side.

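[Editor's note: a minimal sketch of that idea using the Python cryptography package's Fernet recipe. The post is encrypted with a one-off content key, and that key is wrapped separately for each chosen reader, so the hosting site only ever stores ciphertext. A real system would wrap the content key with each friend's public key; the shared symmetric friend keys here are a simplification.]

from cryptography.fernet import Fernet

post = b"Personal update I only want two friends to read."

content_key = Fernet.generate_key()              # one-off key for this post
ciphertext = Fernet(content_key).encrypt(post)   # what the site actually stores

# Assume each chosen friend already holds their own symmetric key.
friend_keys = {"alice": Fernet.generate_key(), "bob": Fernet.generate_key()}
wrapped = {
    name: Fernet(key).encrypt(content_key)       # content key wrapped per friend
    for name, key in friend_keys.items()
}

# A selected reader unwraps the content key and decrypts the post; the site
# and any third-party apps only ever see `ciphertext` and `wrapped`.
alice_content_key = Fernet(friend_keys["alice"]).decrypt(wrapped["alice"])
print(Fernet(alice_content_key).decrypt(ciphertext).decode())
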
08:49
One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop. And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies means that going forward, as these tools evolve and advance, we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.

09:45
Thank you.

09:47
(Applause)
