ABOUT THE SPEAKER
Riccardo Sabatini - Scientist, entrepreneur
Riccardo Sabatini applies his expertise in numerical modeling and data to projects ranging from material science to computational genomics and food market predictions.

Why you should listen

Data scientist Riccardo Sabatini harnesses numerical methods for a surprising variety of fields, from material science research to the study of food commodities (as a past director of the EU research project FoodCAST). His most recent research centers on computational genomics and how to crack the code of life.

In addition to his data research, Sabatini is deeply involved in education for entrepreneurs. He is the founder and co-director of the Quantum ESPRESSO Foundation, an advisor in several data-driven startups, and funder of The HUB Trieste, a social impact accelerator.

TED2016

Riccardo Sabatini: How to read the genome and build a human being

1,834,677 views

Secrets, disease and beauty are all written in the human genome, the complete set of genetic instructions needed to build a human being. Now, as scientist and entrepreneur Riccardo Sabatini shows us, we have the power to read this complex code, predicting things like height, eye color, age and even facial structure -- all from a vial of blood. And soon, Sabatini says, our new understanding of the genome will allow us to personalize treatments for diseases like cancer. We have the power to change life as we know it. How will we use it?


00:12
For the next 16 minutes,
I'm going to take you on a journey
00:15
that is probably
the biggest dream of humanity:
00:18
to understand the code of life.
00:21
So for me, everything started
many, many years ago
00:23
when I met the first 3D printer.
00:26
The concept was fascinating.
00:28
A 3D printer needs three elements:
00:30
a bit of information, some
raw material, some energy,
00:34
and it can produce any object
that was not there before.
00:38
I was doing physics,
I was coming back home
00:40
and I realized that I actually
always knew a 3D printer.
00:44
And everyone does.
00:45
It was my mom.
00:46
(Laughter)
00:47
My mom takes three elements:
00:50
a bit of information, which is between
my father and my mom in this case,
00:54
raw elements and energy
in the same media, that is food,
00:58
and after several months, produces me.
01:00
And I was not existent before.
01:02
So apart from the shock of my mom
discovering that she was a 3D printer,
01:06
I immediately got mesmerized
by that piece,
01:11
the first one, the information.
01:12
What amount of information does it take
01:15
to build and assemble a human?
01:17
Is it much? Is it little?
01:18
How many thumb drives can you fill?
01:21
Well, I was studying physics
at the beginning
01:23
and I took this approximation of a human
as a gigantic Lego piece.
01:29
So, imagine that the building
blocks are little atoms
01:33
and there is a hydrogen here,
a carbon here, a nitrogen here.
01:37
So in the first approximation,
01:39
if I can list the number of atoms
that compose a human being,
01:43
I can build it.
01:45
Now, you can run some numbers
01:47
and that happens to be
quite an astonishing number.
01:50
So the number of atoms,
01:53
the file that I will save in my thumb
drive to assemble a little baby,
01:58
will actually fill an entire Titanic
of thumb drives --
02:02
multiplied 2,000 times.
02:05
This is the miracle of life.
02:09
Every time you see from now on
a pregnant lady,
02:12
she's assembling the biggest
amount of information
02:14
that you will ever encounter.
02:16
Forget big data, forget
anything you heard of.
02:19
This is the biggest amount
of information that exists.
02:22
(Applause)
02:26
But nature, fortunately, is much smarter
than a young physicist,
02:30
and in four billion years, managed
to pack this information
02:34
in a small crystal we call DNA.
02:37
We met it for the first time in 1950
when Rosalind Franklin,
02:41
an amazing scientist, a woman,
02:43
took a picture of it.
02:44
But it took us more than 40 years
to finally poke inside a human cell,
02:50
take out this crystal,
02:51
unroll it, and read it for the first time.
02:55
The code comes out to be
a fairly simple alphabet,
02:58
four letters: A, T, C and G.
03:02
And to build a human,
you need three billion of them.
03:06
Three billion.
03:08
How many are three billion?
03:09
It doesn't really make
any sense as a number, right?
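
One way to make three billion letters more tangible is to convert them into storage. The sketch below is a back-of-the-envelope estimate, not something computed in the talk: it assumes each of the four letters is stored in the minimum number of bits a four-symbol alphabet needs, and it ignores compression and file-format overhead. The names `GENOME_LETTERS` and `ALPHABET` are illustrative.

```python
# Rough storage estimate for the sequence described in the talk.
# Assumption (not from the talk): each letter is stored in log2(4) = 2 bits.
import math

GENOME_LETTERS = 3_000_000_000  # "three billion" letters, as stated in the talk
ALPHABET = "ATCG"               # the four letters named in the talk

bits_per_letter = math.log2(len(ALPHABET))          # bits needed per symbol
total_bytes = GENOME_LETTERS * bits_per_letter / 8  # 8 bits per byte
total_megabytes = total_bytes / 1_000_000           # decimal megabytes

print(f"{bits_per_letter:.0f} bits per letter")
print(f"{total_megabytes:.0f} MB for the whole sequence")
```

Under these assumptions the whole sequence comes to about 750 MB, small enough for a single thumb drive.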
03:12
So I was thinking how
I could explain myself better
03:16
about how big and enormous this code is.
03:19
But there is -- I mean,
I'm going to have some help,
03:22
and the best person to help me
introduce the code
03:26
is actually the first man
to sequence it, Dr. Craig Venter.
03:29
So welcome onstage, Dr. Craig Venter.
03:32
(Applause)
03:39
Not the man in the flesh,
03:43
but for the first time in history,
03:45
this is the genome of a specific human,
03:49
printed page-by-page, letter-by-letter:
03:53
262,000 pages of information,
03:57
450 kilograms, shipped
from the United States to Canada
04:01
thanks to Bruno Bowden,
Lulu.com, a start-up, did everything.
04:06
It was an amazing feat.
04:07
But this is the visual perception
of what is the code of life.
04:12
And now, for the first time,
I can do something fun.
04:14
I can actually poke inside it and read.
04:17
So let me take an interesting
book ... like this one.
04:25
I have an annotation;
it's a fairly big book.
04:27
So just to let you see
what is the code of life.
04:32
Thousands and thousands and thousands
04:35
and millions of letters.
04:38
And they apparently make sense.
04:41
Let's get to a specific part.
04:43
Let me read it to you:
04:44
(Laughter)
04:46
"AAG, AAT, ATA."
04:50
To you it sounds like mute letters,
04:53
but this sequence gives
the color of the eyes to Craig.
04:57
I'll show you another part of the book.
04:59
This is actually a little
more complicated.
05:02
Chromosome 14, book 132:
05:05
(Laughter)
05:07
As you might expect.
05:09
(Laughter)
05:14
"ATT, CTT, GATT."
05:20
This human is lucky,
05:22
because if you miss just
two letters in this position --
05:26
two letters of our three billion --
05:28
he will be condemned
to a terrible disease:
05:30
cystic fibrosis.
05:31
We have no cure for it,
we don't know how to solve it,
05:35
and it's just two letters
of difference from what we are.
05:39
A wonderful book, a mighty book,
05:43
a mighty book that helped me understand
05:45
and show you something quite remarkable.
05:48
Every one of you -- what makes
me, me and you, you --
05:52
is just about five million of these,
05:55
half a book.
05:58
For the rest,
05:59
we are all absolutely identical.
06:03
Five hundred pages
is the miracle of life that you are.
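
The "half a book" and "five hundred pages" figures can be checked against the numbers quoted earlier in the talk. This is a rough sketch that assumes every printed page holds the same number of letters, which the talk does not state; the variable names are illustrative.

```python
# Figures quoted in the talk.
GENOME_LETTERS = 3_000_000_000  # "three billion" letters
PRINTED_PAGES = 262_000         # pages in the printed genome
DIFFERING_LETTERS = 5_000_000   # "about five million" letters differ

# Assumption: letters are spread evenly across all pages.
letters_per_page = GENOME_LETTERS / PRINTED_PAGES
pages_that_differ = DIFFERING_LETTERS / letters_per_page
fraction_that_differs = DIFFERING_LETTERS / GENOME_LETTERS

print(f"{letters_per_page:,.0f} letters per page")
print(f"{pages_that_differ:,.0f} pages differ")
print(f"{fraction_that_differs:.2%} of the genome")
```

Under that assumption the differing letters fill roughly 437 pages, the same order of magnitude as the rounded "five hundred pages", and amount to under a fifth of one percent of the whole sequence.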
06:07
The rest, we all share it.
06:09
So think about that again
when we think that we are different.
06:12
This is the amount that we share.
06:15
So now that I have your attention,
06:18
the next question is:
06:20
How do I read it?
06:21
How do I make sense out of it?
06:23
Well, for however good you can be
at assembling Swedish furniture,
06:27
this instruction manual
is nothing you can crack in your life.
06:31
(Laughter)
06:32
And so, in 2014, two famous TEDsters,
06:36
Peter Diamandis and Craig Venter himself,
06:38
decided to assemble a new company.
06:40
Human Longevity was born,
06:41
with one mission:
06:43
trying everything we can try
06:45
and learning everything
we can learn from these books,
06:48
with one target --
06:50
making real the dream
of personalized medicine,
06:53
understanding what things
should be done to have better health
06:57
and what are the secrets in these books.
07:00
An amazing team, 40 data scientists
and many, many more people,
07:04
a pleasure to work with.
07:05
The concept is actually very simple.
07:08
We're going to use a technology
called machine learning.
07:11
On one side, we have genomes --
thousands of them.
07:15
On the other side, we collected
the biggest database of human beings:
07:20
phenotypes, 3D scan, NMR --
everything you can think of.
07:24
Inside there, on these two opposite sides,
07:27
there is the secret of translation.
07:29
And in the middle, we build a machine.
07:32
We build a machine
and we train a machine --
07:35
well, not exactly one machine,
many, many machines --
07:38
to try to understand and translate
the genome in a phenotype.
07:43
What are those letters,
and what do they do?
07:46
It's an approach that can
be used for everything,
07:49
but using it in genomics
is particularly complicated.
07:52
Little by little we grew and we wanted
to build different challenges.
07:55
We started from the beginning,
from common traits.
07:58
Common traits are comfortable
because they are common,
08:01
everyone has them.
08:02
So we started to ask our questions:
08:04
Can we predict height?
08:06
Can we read the books
and predict your height?
08:09
Well, we actually can,
08:10
with five centimeters of precision.
08:12
BMI is fairly connected to your lifestyle,
08:15
but we still can, we get in the ballpark,
eight kilograms of precision.
08:19
Can we predict eye color?
08:20
Yeah, we can.
08:21
Eighty percent accuracy.
08:23
Can we predict skin color?
08:25
Yeah we can, 80 percent accuracy.
08:27
Can we predict age?
08:30
We can, because apparently,
the code changes during your life.
08:33
It gets shorter, you lose pieces,
it gets insertions.
08:37
We read the signals, and we make a model.
08:40
Now, an interesting challenge:
08:41
Can we predict a human face?
08:45
It's a little complicated,
08:46
because a human face is scattered
among millions of these letters.
08:49
And a human face is not
a very well-defined object.
08:52
So, we had to build an entire tier of it
08:54
to learn and teach
a machine what a face is,
08:56
and embed and compress it.
08:59
And if you're comfortable
with machine learning,
09:01
you understand what the challenge is here.
09:04
Now, after 15 years -- 15 years after
we read the first sequence --
09:10
this October, we started
to see some signals.
09:13
And it was a very emotional moment.
09:15
What you see here is a subject
coming in our lab.
09:19
This is a face for us.
09:21
So we take the real face of a subject,
we reduce the complexity,
09:25
because not everything is in your face --
09:27
lots of features and defects
and asymmetries come from your life.
09:31
We symmetrize the face,
and we run our algorithm.
09:35
The results that I show you right now,
09:37
this is the prediction we have
from the blood.
09:41
(Applause)
09:43
Wait a second.
09:44
In these seconds, your eyes are watching,
left and right, left and right,
09:49
and your brain wants
those pictures to be identical.
09:53
So I ask you to do
another exercise, to be honest.
09:55
Please search for the differences,
09:58
which are many.
09:59
The biggest amount of signal
comes from gender,
10:02
then there is age, BMI,
the ethnicity component of a human.
10:07
And scaling up over that signal
is much more complicated.
10:11
But what you see here,
even in the differences,
10:14
lets you understand
that we are in the right ballpark,
10:17
that we are getting closer.
10:19
And it's already giving you some emotions.
10:21
This is another subject
that comes in place,
10:24
and this is a prediction.
10:25
A little smaller face, we didn't get
the complete cranial structure,
10:30
but still, it's in the ballpark.
10:33
This is a subject that comes in our lab,
10:35
and this is the prediction.
10:38
So these people have never been seen
in the training of the machine.
10:42
These are the so-called "held-out" set.
10:45
But these are people that you will
probably never believe.
10:49
We're publishing everything
in a scientific publication,
10:52
you can read it.
10:53
But since we are onstage,
Chris challenged me.
10:55
I probably exposed myself
and tried to predict
10:59
someone that you might recognize.
11:02
So, in this vial of blood --
and believe me, you have no idea
11:06
what we had to do to have
this blood now, here --
11:09
in this vial of blood is the amount
of biological information
11:13
that we need to do a full genome sequence.
11:16
We just need this amount.
11:18
We ran this sequence,
and I'm going to do it with you.
11:21
And we start to layer up
all the understanding we have.
11:25
In the vial of blood,
we predicted he's a male.
11:29
And the subject is a male.
11:30
We predict that he's a meter and 76 cm.
11:33
The subject is a meter and 77 cm.
11:35
So, we predicted that he's 76;
the subject is 82.
11:40
We predict his age, 38.
11:43
The subject is 35.
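
The numbers in this reveal can be compared with the precision figures quoted earlier in the talk (five centimeters, eight kilograms). The sketch below is only a tally of the quoted values. It assumes the unlabeled "76" and "82" are kilograms, which the transcript does not say outright, and it leaves age without a tolerance because the talk gives none.

```python
# (trait, predicted, actual, stated tolerance or None if the talk gives none)
reveal = [
    ("height_cm", 176, 177, 5),   # "five centimeters of precision"
    ("weight_kg", 76, 82, 8),     # ASSUMPTION: 76 and 82 are kilograms
    ("age_years", 38, 35, None),  # no tolerance is quoted for age
]

# Absolute gap between each prediction and the measured value.
errors = {trait: abs(predicted - actual) for trait, predicted, actual, _ in reveal}

for trait, _, _, tolerance in reveal:
    if tolerance is None:
        verdict = "no stated tolerance"
    else:
        verdict = "within" if errors[trait] <= tolerance else "outside"
    print(f"{trait}: off by {errors[trait]} ({verdict})")
```

On these figures, height is off by 1 cm and the assumed weight by 6 kg, both inside the quoted precision, while age is off by 3 years.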
11:45
We predict his eye color.
11:48
Too dark.
11:50
We predict his skin color.
11:52
We are almost there.
11:53
That's his face.
11:57
Now, the reveal moment:
12:00
the subject is this person.
12:02
(Laughter)
12:04
And I did it intentionally.
12:06
I am a very particular
and peculiar ethnicity.
12:10
Southern European, Italians --
they never fit in models.
12:12
And it's particular -- that ethnicity
is a complex corner case for our model.
12:18
But there is another point.
12:19
So, one of the things that we use
a lot to recognize people
12:23
will never be written in the genome.
12:24
It's our free will, it's how I look.
12:27
Not my haircut in this case,
but my beard cut.
12:30
So I'm going to show you, I'm going to,
in this case, transfer it --
12:34
and this is nothing more
than Photoshop, no modeling --
12:36
the beard on the subject.
12:38
And immediately, we get
much, much better in the feeling.
12:42
So, why do we do this?
12:47
We certainly don't do it
for predicting height
12:53
or taking a beautiful picture
out of your blood.
12:56
We do it because the same technology
and the same approach,
13:00
the machine learning of this code,
13:02
is helping us to understand how we work,
13:06
how your body works,
13:07
how your body ages,
13:09
how disease generates in your body,
13:12
how your cancer grows and develops,
13:15
how drugs work
13:16
and if they work on your body.
13:19
This is a huge challenge.
13:21
This is a challenge that we share
13:23
with thousands of other
researchers around the world.
13:26
It's called personalized medicine.
13:29
It's the ability to move
from a statistical approach
13:32
where you're a dot in the ocean,
13:34
to a personalized approach,
13:36
where we read all these books
13:38
and we get an understanding
of exactly how you are.
13:42
But it is a particularly
complicated challenge,
13:45
because of all these books, as of today,
13:49
we just know probably two percent:
13:53
four books of more than 175.
13:58
And this is not the topic of my talk,
14:02
because we will learn more.
14:05
There are the best minds
in the world on this topic.
14:09
The prediction will get better,
14:10
the model will get more precise.
14:13
And the more we learn,
14:15
the more we will
be confronted with decisions
14:19
that we never had to face before
14:22
about life,
14:24
about death,
14:26
about parenting.
14:32
So, we are touching the very
inner detail on how life works.
14:38
And it's a revolution
that cannot be confined
14:41
in the domain of science or technology.
14:44
This must be a global conversation.
14:47
We must start to think of the future
we're building as a humanity.
14:53
We need to interact with creatives,
with artists, with philosophers,
14:57
with politicians.
14:58
Everyone is involved,
14:59
because it's the future of our species.
15:03
Without fear, but with the understanding
300
891273
3968
15:07
that the decisions
that we make in the next year
301
895265
3871
15:11
will change the course of history forever.
302
899160
3789
15:15
Thank you.
303
903732
1160
15:16
(Applause)
