ABOUT THE SPEAKER

Joseph Redmon - Computer scientist
Joseph Redmon works on the YOLO algorithm, which combines the simple face detection of your phone camera with a cloud-based AI -- in real time.

Why you should listen

Computer scientist Joseph Redmon is working on the YOLO (You Only Look Once) algorithm, which has a simple goal: to deliver image recognition and object detection at a speed that would have seemed like science fiction only a few years ago. The algorithm looks like the simple face detection of a camera app but has the level of complexity of systems like Google's Deep Mind Cloud Vision, using deep convolutional neural networks to crunch object detection in real time. It's the kind of technology that will be embedded on all smartphones in the next few years.

Redmon is also internet-famous for his resume.

More profile about the speaker
Joseph Redmon | Speaker | TED.com

TED2017

Joseph Redmon: How computers learn to recognize objects instantly

Filmed: 2017-04-24

Readability: 4.5

2,471,805 views

Ten years ago, researchers thought that getting a computer to tell the difference between a cat and a dog would be almost impossible. Today, computer vision systems do it with greater than 99 percent accuracy. How? Joseph Redmon works on the YOLO (You Only Look Once) system, an open-source object detection program that can identify objects in images and video, from zebras to stop signs, with lightning-quick speed. In a remarkable live demo, Redmon shows off this important step forward for applications like self-driving cars, robotics and even cancer detection.

00:12

Ten years ago,

00:14

computer vision researchers thought that getting a computer

00:16

to tell the difference between a cat and a dog

00:19

would be almost impossible,

00:21

even with the significant advance in the state of artificial intelligence.

00:25

Now we can do it at a level greater than 99 percent accuracy.

00:29

This is called image classification --

00:31

give it an image, put a label to that image --

00:34

and computers know thousands of other categories as well.

00:38

I'm a graduate student at the University of Washington,

00:41

and I work on a project called Darknet,

00:43

which is a neural network framework

00:45

for training and testing computer vision models.

00:48

So let's just see what Darknet thinks

00:51

of this image that we have.

00:54

When we run our classifier

00:56

on this image,

00:58

we see we don't just get a prediction of dog or cat,

01:00

we actually get specific breed predictions.

01:02

That's the level of granularity we have now.

01:05

And it's correct.

01:06

My dog is in fact a malamute.

01:09

So we've made amazing strides in image classification,

01:13

but what happens when we run our classifier

01:15

on an image that looks like this?

01:19

Well ...

01:24

We see that the classifier comes back with a pretty similar prediction.

01:28

And it's correct, there is a malamute in the image,

01:31

but just given this label, we don't actually know that much

01:35

about what's going on in the image.

01:37

We need something more powerful.

01:39

I work on a problem called object detection,

01:41

where we look at an image and try to find all of the objects,

01:44

put bounding boxes around them

01:46

and say what those objects are.

01:48

So here's what happens when we run a detector on this image.

01:53

Now, with this kind of result,

01:55

we can do a lot more with our computer vision algorithms.

01:58

We see that it knows that there's a cat and a dog.

02:01

It knows their relative locations,

02:03

their size.

02:04

It may even know some extra information.

02:06

There's a book sitting in the background.
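The difference between a classifier's single label and a detector's output can be sketched in a few lines. This is only an illustration: the `Detection` class, its field names and all the numbers are assumptions made up for this example, not Darknet's actual data structures or the values shown in the demo.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object: a label, a confidence score and a bounding box."""
    label: str
    score: float   # confidence between 0 and 1
    x: float       # left edge of the box, in pixels
    y: float       # top edge of the box, in pixels
    width: float
    height: float

    def area(self) -> float:
        return self.width * self.height

    def center(self) -> tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


def left_to_right(detections: list[Detection]) -> list[str]:
    """Order labels by the horizontal position of each box center."""
    return [d.label for d in sorted(detections, key=lambda d: d.center()[0])]


# Hypothetical detector output for a cat-and-dog photo (values are made up).
detections = [
    Detection("dog", 0.92, x=300, y=80, width=260, height=340),
    Detection("cat", 0.88, x=60, y=200, width=180, height=200),
    Detection("book", 0.55, x=500, y=20, width=90, height=60),
]

print(left_to_right(detections))                  # relative locations
print(max(detections, key=Detection.area).label)  # largest object
```

A classifier would return only one label for the whole image; with boxes, relative position and size fall out of simple arithmetic on the coordinates.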

02:09

And if you want to build a system on top of computer vision,

02:12

say a self-driving vehicle or a robotic system,

02:16

this is the kind of information that you want.

02:18

You want something so that you can interact with the physical world.

02:22

Now, when I started working on object detection,

02:25

it took 20 seconds to process a single image.

02:28

And to get a feel for why speed is so important in this domain,

02:33

here's an example of an object detector

02:35

that takes two seconds to process an image.

02:38

So this is 10 times faster

02:40

than the 20-seconds-per-image detector,

02:44

and you can see that by the time it makes predictions,

02:47

the entire state of the world has changed,

02:49

and this wouldn't be very useful

02:52

for an application.

02:53

If we speed this up by another factor of 10,

02:56

this is a detector running at five frames per second.

02:59

This is a lot better,

03:00

but for example,

03:02

if there's any significant movement,

03:05

I wouldn't want a system like this driving my car.
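The arithmetic behind these three detectors is simple enough to work through. The sketch below converts each per-image processing time from the talk (20 s, 2 s and 0.2 s) into frames per second and into the distance a car would cover between two predictions. The car speed of 15 m/s (54 km/h) is an assumption chosen for illustration; the talk gives no such figure.

```python
def frames_per_second(seconds_per_image: float) -> float:
    """Throughput of a detector that needs this long for one image."""
    return 1.0 / seconds_per_image


def meters_between_predictions(seconds_per_image: float, speed_m_per_s: float) -> float:
    """Distance travelled while the detector is busy with a single image."""
    return seconds_per_image * speed_m_per_s


CAR_SPEED_M_PER_S = 15.0  # assumed speed (54 km/h), not taken from the talk

for latency in (20.0, 2.0, 0.2):
    fps = frames_per_second(latency)
    gap = meters_between_predictions(latency, CAR_SPEED_M_PER_S)
    print(f"{latency:5.1f} s/image -> {fps:.2f} frames/s, "
          f"car moves {gap:.0f} m between predictions")
```

Under that assumed speed, even the five-frames-per-second detector leaves the car travelling about 3 m between consecutive predictions.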

03:09

This is our detection system running in real time on my laptop.

03:13

So it smoothly tracks me as I move around the frame,

03:16

and it's robust to a wide variety of changes in size,

03:21

pose,

03:23

forward, backward.

03:25

This is great.

03:26

This is what we really need

03:28

if we're going to build systems on top of computer vision.

03:31

(Applause)

03:36

So in just a few years,

03:38

we've gone from 20 seconds per image

03:41

to 20 milliseconds per image, a thousand times faster.

03:44

How did we get there?

03:46

Well, in the past, object detection systems

03:49

would take an image like this

03:51

and split it into a bunch of regions

03:53

and then run a classifier on each of these regions,

03:56

and high scores for that classifier

03:59

would be considered detections in the image.

04:02

But this involved running a classifier thousands of times over an image,

04:06

thousands of neural network evaluations to produce detection.

04:11

Instead, we trained a single network to do all of detection for us.

04:15

It produces all of the bounding boxes and class probabilities simultaneously.

04:20

With our system, instead of looking at an image thousands of times

04:24

to produce detection,

04:25

you only look once,

04:26

and that's why we call it the YOLO method of object detection.
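A toy count makes the contrast concrete. The sketch below does not implement YOLO or any real detector: `CountingNetwork` is a hypothetical stand-in that only counts how often it is evaluated, and the sliding-window scheme, image size, window sizes and stride are assumptions chosen to illustrate "splitting an image into a bunch of regions".

```python
def sliding_windows(image_w, image_h, window, stride):
    """Yield (x, y, w, h) for every square window that fits inside the image."""
    for y in range(0, image_h - window + 1, stride):
        for x in range(0, image_w - window + 1, stride):
            yield (x, y, window, window)


class CountingNetwork:
    """Stand-in for a neural network; it only counts its evaluations."""

    def __init__(self):
        self.evaluations = 0

    def evaluate(self, region):
        self.evaluations += 1
        return 0.0  # a real network would return scores here


def region_based_detect(net, image_w, image_h, window_sizes, stride):
    """Older style: run the classifier once per region."""
    for window in window_sizes:
        for region in sliding_windows(image_w, image_h, window, stride):
            net.evaluate(region)
    return net.evaluations


def single_pass_detect(net, image_w, image_h):
    """Single-network style: one evaluation over the whole image."""
    net.evaluate((0, 0, image_w, image_h))
    return net.evaluations


region_count = region_based_detect(
    CountingNetwork(), 640, 480, window_sizes=(64, 128, 256), stride=16
)
single_count = single_pass_detect(CountingNetwork(), 640, 480)
print(region_count, single_count)
```

With these assumed settings the region-based loop evaluates the network a couple of thousand times for one 640x480 image, while the single-pass version evaluates it once.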

04:31

So with this speed, we're not just limited to images;

04:35

we can process video in real time.

04:37

And now, instead of just seeing that cat and dog,

04:40

we can see them move around and interact with each other.

04:46

This is a detector that we trained

04:48

on 80 different classes

04:53

in Microsoft's COCO dataset.

04:56

It has all sorts of things like spoon and fork, bowl,

04:59

common objects like that.

05:02

It has a variety of more exotic things:

05:05

animals, cars, zebras, giraffes.

05:08

And now we're going to do something fun.

05:10

We're just going to go out into the audience

05:12

and see what kind of things we can detect.

05:14

Does anyone want a stuffed animal?

05:18

There are some teddy bears out there.

05:22

And we can turn down our threshold for detection a little bit,

05:26

so we can find more of you guys out in the audience.
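Turning down the detection threshold can be pictured as a filter over scored candidates. In this minimal sketch the labels, scores and threshold values are invented for illustration, and `filter_detections` is a hypothetical helper, not a Darknet function.

```python
def filter_detections(detections, threshold):
    """Keep only detections whose confidence is at least the threshold."""
    return [(label, score) for label, score in detections if score >= threshold]


# Made-up candidate detections with confidence scores.
raw = [
    ("person", 0.91),
    ("person", 0.47),
    ("teddy bear", 0.62),
    ("backpack", 0.33),
    ("person", 0.28),
]

strict = filter_detections(raw, threshold=0.5)
relaxed = filter_detections(raw, threshold=0.25)
print(len(strict), len(relaxed))
```

A lower threshold lets through more candidates, which finds more people in the audience at the cost of admitting less certain detections.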

05:31

Let's see if we can get these stop signs.

05:33

We find some backpacks.

05:37

Let's just zoom in a little bit.

05:42

And this is great.

05:43

And all of the processing is happening in real time

05:46

on the laptop.

05:49

And it's important to remember

05:50

that this is a general purpose object detection system,

05:53

so we can train this for any image domain.

06:00

The same code that we use

06:02

to find stop signs or pedestrians,

06:05

bicycles in a self-driving vehicle,

06:07

can be used to find cancer cells

06:10

in a tissue biopsy.

06:13

And there are researchers around the globe already using this technology

06:18

for advances in things like medicine, robotics.

06:21

This morning, I read a paper

06:23

where they were taking a census of animals in Nairobi National Park

06:27

with YOLO as part of this detection system.

06:30

And that's because Darknet is open source

06:33

and in the public domain, free for anyone to use.

06:37

(Applause)

06:43

But we wanted to make detection even more accessible and usable,

06:48

so through a combination of model optimization,

06:52

network binarization and approximation,

06:54

we actually have object detection running on a phone.

07:04

(Applause)
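The talk does not say which binarization scheme was used, so the following is only a sketch of one common idea: replace each real-valued weight by its sign and keep a single shared scale factor. The function names, weights and inputs are all assumptions made for illustration.

```python
def binarize(weights):
    """Approximate weights as scale * sign(weight).

    The scale is the mean absolute value of the original weights.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return scale, signs


def dot(a, b):
    """Ordinary dot product with real-valued weights."""
    return sum(x * y for x, y in zip(a, b))


def binary_dot(scale, signs, inputs):
    """Dot product with +1/-1 weights: only additions and subtractions,
    followed by one multiplication by the shared scale."""
    total = sum(x if s > 0 else -x for s, x in zip(signs, inputs))
    return scale * total


weights = [0.5, -1.5, 1.0, -1.0]  # made-up example weights
inputs = [2.0, 1.0, 3.0, 1.0]     # made-up example inputs

scale, signs = binarize(weights)
exact = dot(weights, inputs)
approx = binary_dot(scale, signs, inputs)
print(scale, signs, exact, approx)
```

The binary version stores one bit per weight plus a scale and avoids per-weight multiplications, which is the kind of saving that matters on a phone; the price is that `approx` only roughly tracks `exact`.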

07:10

And I'm really excited because now we have a pretty powerful solution

07:16

to this low-level computer vision problem,

07:18

and anyone can take it and build something with it.

07:22

So now the rest is up to all of you

07:25

and people around the world with access to this software,

07:28

and I can't wait to see what people will build with this technology.

07:32

Thank you.

07:33

(Applause)

Translated by Anne van Gulick
Reviewed by Rik Delaet

THE ORIGINAL VIDEO ON TED.COM

Joseph Redmon: How computers learn to recognize objects instantly | TED Talk | TED.com