TEDGlobal 2012

John Wilbanks: Let's pool our medical data

When you're getting medical treatment, or taking part in medical testing, privacy is important; strict laws limit what researchers can see and know about you. But what if your medical data could be used -- anonymously -- by anyone seeking to test a hypothesis? John Wilbanks wonders if the desire to protect our privacy is slowing research, and if opening up medical data could lead to a wave of health care innovation.


So I have bad news, I have good news,
00:15
and I have a task.
00:19
So the bad news is that we all get sick.
00:20
I get sick. You get sick.
00:23
And every one of us gets sick, and the question really is,
00:26
how sick do we get? Is it something that kills us?
00:28
Is it something that we survive?
00:31
Is it something that we can treat?
00:32
And we've gotten sick as long as we've been people.
00:34
And so we've always looked for reasons to explain why we get sick.
00:38
And for a long time, it was the gods, right?
00:41
The gods are angry with me, or the gods are testing me,
00:43
right? Or God, singular, more recently,
00:46
is punishing me or judging me.
00:49
And as long as we've looked for explanations,
00:51
we've wound up with something that gets closer and closer to science,
00:54
which is hypotheses as to why we get sick,
00:58
and as long as we've had hypotheses about why we get sick, we've tried to treat it as well.
01:00
So this is Avicenna. He wrote a book over a thousand years ago called "The Canon of Medicine,"
01:05
and the rules he laid out for testing medicines
01:09
are actually really similar to the rules we have today,
01:11
that the disease and the medicine must be the same strength,
01:13
the medicine needs to be pure, and in the end we need
01:16
to test it in people. And so if you put together these themes
01:18
of a narrative or a hypothesis in human testing,
01:22
right, you get some beautiful results,
01:26
even when we didn't have very good technologies.
01:29
This is a guy named Carlos Finlay. He had a hypothesis
01:30
that was way outside the box for his time, in the late 1800s.
01:33
He thought yellow fever was not transmitted by dirty clothing.
01:36
He thought it was transmitted by mosquitoes.
01:39
And they laughed at him. For 20 years, they called this guy
01:41
"the mosquito man." But he ran an experiment in people,
01:44
right? He had this hypothesis, and he tested it in people.
01:47
So he got volunteers to go move to Cuba and live in tents
01:50
and be voluntarily infected with yellow fever.
01:55
So some of the people in some of the tents had dirty clothes
01:58
and some of the people were in tents that were full
02:01
of mosquitoes that had been exposed to yellow fever.
02:02
And it definitively proved that it wasn't this magic dust
02:05
called fomites in your clothes that caused yellow fever.
02:08
But it wasn't until we tested it in people that we actually knew.
02:11
And this is what those people signed up for.
02:15
This is what it looked like to have yellow fever in Cuba
02:17
at that time. You suffered in a tent, in the heat, alone,
02:20
and you probably died.
02:24
But people volunteered for this.
02:27
And it's not just a cool example of a scientific design
02:30
of experiment in theory. They also did this beautiful thing.
02:34
They signed this document, and it's called an informed consent document.
02:36
And informed consent is an idea that we should be
02:40
very proud of as a society, right? It's something that
02:43
separates us from the Nazis tried at Nuremberg for
00:45
enforced medical experimentation. It's the idea
02:48
that agreement to join a study without understanding isn't agreement.
02:51
It's something that protects us from harm, from hucksters,
02:55
from people that would try to hoodwink us into a clinical
02:59
study that we don't understand, or that we don't agree to.
03:01
And so you put together the thread of narrative hypothesis,
03:05
experimentation in humans, and informed consent,
03:10
and you get what we call clinical study, and it's how we do
03:12
the vast majority of medical work. It doesn't really matter
03:15
if you're in the north, the south, the east, the west.
03:18
Clinical studies form the basis of how we investigate,
03:20
so if we're going to look at a new drug, right,
03:24
we test it in people, we draw blood, we do experiments,
03:26
and we gain consent for that study, to make sure
03:29
that we're not screwing people over as part of it.
03:31
But the world is changing around the clinical study,
03:34
which has been fairly well established for tens of years
03:38
if not 50 to 100 years.
03:41
So now we're able to gather data about our genomes,
03:43
but, as we saw earlier, our genomes aren't dispositive.
03:46
We're able to gather information about our environment.
03:49
And more importantly, we're able to gather information
03:52
about our choices, because it turns out that what we think of
03:54
as our health is more like the interaction of our bodies,
03:57
our genomes, our choices and our environment.
03:59
And the clinical methods that we've got aren't very good
04:03
at studying that because they are based on the idea
04:06
of person-to-person interaction. You interact
04:08
with your doctor and you get enrolled in the study.
04:10
So this is my grandfather. I actually never met him,
04:12
but he's holding my mom, and his genes are in me, right?
04:15
His choices ran through to me. He was a smoker,
04:19
like most people were. This is my son.
04:22
So my grandfather's genes go all the way through to him,
04:24
and my choices are going to affect his health.
04:28
The technology between these two pictures
04:30
cannot be more different, but the methodology
04:33
for clinical studies has not radically changed over that time period.
04:37
We just have better statistics.
04:41
The way we gain informed consent was formed in large part
04:43
after World War II, around the time that picture was taken.
04:47
That was 70 years ago, and the way we gain informed consent,
04:49
this tool that was created to protect us from harm,
04:53
now creates silos. So the data that we collect
04:56
for prostate cancer or for Alzheimer's trials
05:00
goes into silos where it can only be used
05:03
for prostate cancer or for Alzheimer's research.
05:05
Right? It can't be networked. It can't be integrated.
05:08
It cannot be used by people who aren't credentialed.
05:11
So a physicist can't get access to it without filing paperwork.
05:15
A computer scientist can't get access to it without filing paperwork.
05:18
Computer scientists aren't patient. They don't file paperwork.
05:21
And this is an accident. These are tools that we created
05:25
to protect us from harm, but what they're doing
05:29
is protecting us from innovation now.
05:32
And that wasn't the goal. It wasn't the point. Right?
05:35
It's a side effect, if you will, of a power we created
05:38
to take us for good.
05:41
And so if you think about it, the depressing thing is that
05:43
Facebook would never make a change to something
05:46
as important as an advertising algorithm
05:48
with a sample size as small as a Phase III clinical trial.
05:51
We cannot take the information from past trials
05:55
and put them together to form statistically significant samples.
05:59
And that sucks, right? So 45 percent of men develop
06:03
cancer. Thirty-eight percent of women develop cancer.
06:07
One in four men dies of cancer.
06:10
One in five women dies of cancer, at least in the United States.
06:12
And three out of the four drugs we give you
06:16
if you get cancer fail. And this is personal to me.
06:18
My sister is a cancer survivor.
06:21
My mother-in-law is a cancer survivor. Cancer sucks.
06:23
And when you have it, you don't have a lot of privacy
06:27
in the hospital. You're naked the vast majority of the time.
06:29
People you don't know come in and look at you and poke you and prod you,
06:33
and when I tell cancer survivors that this tool we created
06:36
to protect them is actually preventing their data from being used,
06:40
especially when only three to four percent of people
06:43
who have cancer ever even sign up for a clinical study,
06:45
their reaction is not, "Thank you, God, for protecting my privacy."
06:48
It's outrage
06:51
that we have this information and we can't use it.
06:54
And it's an accident.
06:56
So the cost in blood and treasure of this is enormous.
06:59
Two hundred and twenty-six billion a year is spent on cancer in the United States.
07:02
Fifteen hundred people a day die in the United States.
07:05
And it's getting worse.
07:08
So the good news is that some things have changed,
07:11
and the most important thing that's changed
07:14
is that we can now measure ourselves in ways
07:16
that used to be the dominion of the health system.
07:18
So a lot of people talk about it as digital exhaust.
07:21
I like to think of it as the dust that runs along behind my kid.
07:23
We can reach back and grab that dust,
07:26
and we can learn a lot about health from it, so if our choices
07:29
are part of our health, what we eat is a really important
07:31
aspect of our health. So you can do something very simple
07:34
and basic and take a picture of your food,
07:36
and if enough people do that, we can learn a lot about
07:38
how our food affects our health.
07:41
One interesting thing that came out of this — this is an app for iPhones called The Eatery —
07:43
is that we think our pizza is significantly healthier
07:47
than other people's pizza is. Okay? (Laughter)
07:50
And it seems like a trivial result, but this is the sort of research
07:53
that used to take the health system years
07:57
and hundreds of thousands of dollars to accomplish.
07:59
It was done in five months by a startup company of a couple of people.
08:01
I don't have any financial interest in it.
08:05
But more nontrivially, we can get our genotypes done,
08:08
and although our genotypes aren't dispositive, they give us clues.
08:10
So I could show you mine. It's just A's, T's, C's and G's.
08:13
This is the interpretation of it. As you can see,
08:16
I carry a 32 percent risk of prostate cancer,
08:18
22 percent risk of psoriasis and a 14 percent risk of Alzheimer's disease.
08:21
So that means, if you're a geneticist, you're freaking out,
08:25
going, "Oh my God, you told everyone you carry the ApoE E4 allele. What's wrong with you?"
08:28
Right? When I got these results, I started talking to doctors,
08:32
and they told me not to tell anyone, and my reaction is,
08:35
"Is that going to help anyone cure me when I get the disease?"
08:38
And no one could tell me yes.
08:41
And I live in a web world where, when you share things,
08:44
beautiful stuff happens, not bad stuff.
08:47
So I started putting this in my slide decks,
08:50
and I got even more obnoxious, and I went to my doctor,
08:51
and I said, "I'd like to actually get my bloodwork.
08:54
Please give me back my data." So this is my most recent bloodwork.
08:56
As you can see, I have high cholesterol.
08:59
I have particularly high bad cholesterol, and I have some
09:01
bad liver numbers, but those are because we had a dinner party with a lot of good wine
09:04
the night before we ran the test. (Laughter)
09:07
Right. But look at how non-computable this information is.
09:10
This is like the photograph of my granddad holding my mom
09:14
from a data perspective, and I had to go into the system
09:17
and get it out.
09:21
So the thing that I'm proposing we do here
09:23
is that we reach behind us and we grab the dust,
09:26
that we reach into our bodies and we grab the genotype,
09:28
and we reach into the medical system and we grab our records,
09:31
and we use it to build something together, which is a commons.
09:34
And there's been a lot of talk about commonses, right,
09:38
here, there, everywhere, right. A commons is nothing more
09:41
than a public good that we build out of private goods.
09:44
We do it voluntarily, and we do it through standardized
09:47
legal tools. We do it through standardized technologies.
09:49
Right. That's all a commons is. It's something that we build
09:52
together because we think it's important.
09:55
And a commons of data is something that's really unique,
09:58
because we make it from our own data. And although
10:01
a lot of people like privacy as their methodology of control
10:03
around data, and obsess around privacy, at least
10:06
some of us really like to share as a form of control,
10:08
and what's remarkable about digital commonses
10:11
is you don't need a big percentage if your sample size is big enough
10:13
to generate something massive and beautiful.
10:17
So not that many programmers write free software,
10:19
but we have the Apache web server.
10:22
Not that many people who read Wikipedia edit,
10:24
but it works. So as long as some people like to share
10:27
as their form of control, we can build a commons, as long as we can get the information out.
10:31
And in biology, the numbers are even better.
10:35
So Vanderbilt ran a study asking people, we'd like to take
10:37
your biosamples, your blood, and share them in a biobank,
10:40
and only five percent of the people opted out.
10:43
I'm from Tennessee. It's not the most science-positive state
10:45
in the United States of America. (Laughter)
10:48
But only five percent of the people wanted out.
10:51
So people like to share, if you give them the opportunity and the choice.
10:54
And the reason that I got obsessed with this, besides the obvious family aspects,
10:58
is that I spend a lot of time around mathematicians,
11:02
and mathematicians are drawn to places where there's a lot of data
11:06
because they can use it to tease signals out of noise.
11:09
And those correlations that they can tease out, they're not
11:11
necessarily causal agents, but math, in this day and age,
11:14
is like a giant set of power tools
11:18
that we're leaving on the floor, not plugged in in health,
11:21
while we use hand saws.
11:25
If we have a lot of shared genotypes, and a lot of shared
11:27
outcomes, and a lot of shared lifestyle choices,
11:31
and a lot of shared environmental information, we can start
11:34
to tease out the correlations between subtle variations
11:37
in people, the choices they make and the health that they create as a result of those choices,
11:40
and there's open-source infrastructure to do all of this.
11:45
Sage Bionetworks is a nonprofit that's built a giant math system
11:48
that's waiting for data, but there isn't any.
11:51
So that's what I do. I've actually started what we think is
11:55
the world's first fully digital, fully self-contributed,
11:59
unlimited in scope, global in participation, ethically approved
12:03
clinical research study where you contribute the data.
12:08
So if you reach behind yourself and you grab the dust,
12:12
if you reach into your body and grab your genome,
12:14
if you reach into the medical system and somehow extract your medical record,
12:17
you can actually go through an online informed consent process --
12:20
because the donation to the commons must be voluntary
12:23
and it must be informed -- and you can actually upload
12:26
your information and have it syndicated to the
12:28
mathematicians who will do this sort of big data research,
12:31
and the goal is to get 100,000 in the first year
12:34
and a million in the first five years so that we have
12:37
a statistically significant cohort that you can use to take
12:39
smaller sample sizes from traditional research
12:43
and map it against,
12:46
so that you can use it to tease out those subtle correlations
12:47
between the variations that make us unique
12:50
and the kinds of health that we need to move forward as a society.
12:53
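The arithmetic behind those enrollment targets can be sketched with a standard power calculation. This is an illustrative assumption, not a figure from the talk: it uses Fisher's z-transform to estimate how many participants are needed to detect a weak correlation between, say, a lifestyle variable and a health outcome, at the usual 5% significance and 80% power.

```python
import math

# Critical z-values for a conventional study design
# (these thresholds, and the effect sizes tried below,
# are illustrative assumptions, not numbers from the talk).
Z_ALPHA = 1.959964  # two-sided 5% significance
Z_BETA = 0.841621   # 80% power

def required_n(r: float) -> int:
    """Approximate participants needed to detect a correlation r,
    via Fisher's z-transform of the correlation coefficient."""
    c = 0.5 * math.log((1 + r) / (1 - r))  # Fisher z-transform
    return math.ceil(((Z_ALPHA + Z_BETA) / c) ** 2 + 3)

# A "subtle" correlation of r = 0.05 already needs ~3,000 people...
print(required_n(0.05))

# ...and r = 0.01 needs roughly 78,000 -- which is why a cohort of
# 100,000 self-contributed records can ask questions that a
# few-hundred-person traditional study cannot.
print(required_n(0.01))
```

The point of the sketch: effect sizes shrink quadratically into sample-size requirements, so the subtle variations the talk describes are only visible at commons scale.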
And I've spent a lot of time around other commons.
12:57
I've been around the early web. I've been around
13:00
the early creative commons world, and there's four things
13:02
that all of these share, which is, they're all really simple.
13:05
And so if you were to go to the website and enroll in this study,
13:08
you're not going to see something complicated.
13:11
But it's not simplistic. These things are weak intentionally,
13:13
right, because you can always add power and control to a system,
13:18
but it's very difficult to remove those things if you put them in at the beginning,
13:21
and so being simple doesn't mean being simplistic,
13:25
and being weak doesn't mean weakness.
13:28
Those are strengths in the system.
13:30
And open doesn't mean that there's no money.
13:32
Closed systems, corporations, make a lot of money
13:35
on the open web, and one of the reasons why the open web lives
13:38
is that corporations have a vested interest in the openness
13:42
of the system.
13:44
And so all of these things are part of the clinical study that we've created,
13:47
so you can actually come in, all you have to be is 14 years old,
13:51
willing to sign a contract that says I'm not going to be a jerk,
13:54
basically, and you're in.
13:56
You can start analyzing the data.
13:59
You do have to solve a CAPTCHA as well. (Laughter)
14:00
And if you'd like to build corporate structures on top of it,
14:04
that's okay too. That's all in the consent,
14:08
so if you don't like those terms, you don't come in.
14:11
It's very much the design principles of a commons
14:14
that we're trying to bring to health data.
14:17
And the other thing about these systems is that it only takes
14:19
a small number of really unreasonable people working together
14:22
to create them. It didn't take that many people
14:26
to make Wikipedia Wikipedia, or to keep it Wikipedia.
14:29
And we're not supposed to be unreasonable in health,
14:32
and so I hate this word "patient."
14:34
I don't like being patient when systems are broken,
14:37
and health care is broken.
14:40
I'm not talking about the politics of health care, I'm talking about the way we scientifically approach health care.
14:42
So I don't want to be patient. And the task I'm giving to you
14:46
is to not be patient. So I'd like you to actually try,
14:50
when you go home, to get your data.
14:53
You'll be shocked and offended and, I would bet, outraged,
14:56
at how hard it is to get it.
14:58
But it's a challenge that I hope you'll take,
15:01
and maybe you'll share it. Maybe you won't.
15:04
If you don't have anyone in your family who's sick,
15:06
maybe you wouldn't be unreasonable. But if you do,
15:08
or if you've been sick, then maybe you would.
15:11
And we're going to be able to do an experiment in the next several months
15:13
that lets us know exactly how many unreasonable people are out there.
15:16
So this is the Athena Breast Health Network. It's a study
15:19
of 150,000 women in California, and they're going to
15:21
return all the data to the participants of the study
15:25
in a computable form, with one-clickability to load it into
15:28
the study that I've put together. So we'll know exactly
15:31
how many people are willing to be unreasonable.
15:33
So what I'd end [with] is,
15:36
the most beautiful thing I've learned since I quit my job
15:38
almost a year ago to do this, is that it really doesn't take
15:41
very many of us to achieve spectacular results.
15:45
You just have to be willing to be unreasonable,
15:49
and the risk we're running is not the risk those 14 men
15:51
who got yellow fever ran. Right?
15:54
It's to be naked, digitally, in public. So you know more
15:56
about me and my health than I know about you. It's asymmetric now.
15:58
And being naked and alone can be terrifying.
16:02
But to be naked in a group, voluntarily, can be quite beautiful.
16:06
And so it doesn't take all of us.
16:10
It just takes all of some of us. Thank you.
16:12
(Applause)
16:15
Translated by Joseph Geni
Reviewed by Morton Bast


About the Speaker:

John Wilbanks - Data Commons Advocate
Imagine the discoveries that could result from a giant pool of freely available health and genomic data. John Wilbanks is working to build it.

Why you should listen

Performing a medical or genomic experiment on a human requires informed consent and careful boundaries around privacy. But what if the data that results, once scrubbed of identifying marks, was released into the wild? At WeConsent.us, John Wilbanks thinks through the ethical and procedural steps to create an open, massive, mine-able database of data about health and genomics from many sources. One step: the Portable Legal Consent for Common Genomics Research (PLC-CGR), an experimental bioethics protocol that would allow any test subject to say, "Yes, once this experiment is over, you can use my data, anonymously, to answer any other questions you can think of." Compiling piles of test results in one place, Wilbanks suggests, would turn genetic info into big data--giving researchers the potential to spot patterns that simply aren't viewable up close. 

A campaigner for the wide adoption of data sharing in science, Wilbanks is also a Senior Fellow with the Kauffman Foundation and a Research Fellow at Lybba, and is supported by Sage Bionetworks.

In February 2013, the US government responded to a We the People petition spearheaded by Wilbanks and signed by 65,000 people, and announced a plan to open up taxpayer-funded research data and make it available for free.
