ABOUT THE SPEAKER

Joseph Redmon - Computer scientist
Joseph Redmon works on the YOLO algorithm, which combines the simple face detection of your phone camera with a cloud-based AI -- in real time.

Why you should listen

Computer scientist Joseph Redmon is working on the YOLO (You Only Look Once) algorithm, which has a simple goal: to deliver image recognition and object detection at a speed that would have seemed science-fictional only a few years ago. The algorithm looks like the simple face detection of a camera app but with the level of complexity of systems like Google's Deep Mind Cloud Vision, using Convolutional Deep Neural Networks to crunch object detection in real time. It's the kind of technology that will be embedded in all smartphones in the next few years.

Redmon is also internet-famous for his resume.


TED2017

Joseph Redmon: How computers learn to recognize objects instantly

Filmed: 2017-04-24

Readability: 4.5

2,471,805 views

Ten years ago, computer vision researchers thought that getting a computer to tell a cat from a dog would be almost impossible, even with the significant advances in artificial intelligence. Now we can do it with greater than 99 percent accuracy. How? Joseph Redmon works on the YOLO (You Only Look Once) system, an open-source object detection method that can identify objects in images and video, from a stop sign to a zebra, at lightning speed. In this remarkable live demo, Redmon shows off this important advance for applications such as self-driving cars, robotics and even cancer detection.


00:12

Ten years ago,

00:14

computer vision researchers thought that getting a computer

00:16

to tell the difference between a cat and a dog

00:19

would be almost impossible,

00:21

even with the significant advance in the state of artificial intelligence.

00:25

Now we can do it at a level greater than 99 percent accuracy.

00:29

This is called image classification --

00:31

give it an image, put a label to that image --

00:34

and computers know thousands of other categories as well.

00:38

I'm a graduate student at the University of Washington,

00:41

and I work on a project called Darknet,

00:43

which is a neural network framework

00:45

for training and testing computer vision models.

00:48

So let's just see what Darknet thinks

00:51

of this image that we have.

00:54

When we run our classifier

00:56

on this image,

00:58

we see we don't just get a prediction of dog or cat,

01:00

we actually get specific breed predictions.

01:02

That's the level of granularity we have now.

01:05

And it's correct.

01:06

My dog is in fact a malamute.

01:09

So we've made amazing strides in image classification,

01:13

but what happens when we run our classifier

01:15

on an image that looks like this?

01:19

Well ...

01:24

We see that the classifier comes back with a pretty similar prediction.

01:28

And it's correct, there is a malamute in the image,

01:31

but just given this label, we don't actually know that much

01:35

about what's going on in the image.

01:37

We need something more powerful.

01:39

I work on a problem called object detection,

01:41

where we look at an image and try to find all of the objects,

01:44

put bounding boxes around them

01:46

and say what those objects are.

01:48

So here's what happens when we run a detector on this image.

01:53

Now, with this kind of result,

01:55

we can do a lot more with our computer vision algorithms.

01:58

We see that it knows that there's a cat and a dog.

02:01

It knows their relative locations,

02:03

their size.

02:04

It may even know some extra information.

02:06

There's a book sitting in the background.
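
To make the difference between a single label and a detection result concrete, here is a minimal Python sketch of what a detector's output might look like. The `Detection` class, its field names and every number in it are hypothetical illustrations, not Darknet's actual data structures or real output from the talk.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object: a label, a confidence and a bounding box in pixels."""
    label: str
    confidence: float
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self) -> tuple[float, float]:
        # Midpoint of the bounding box.
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)

    @property
    def area(self) -> float:
        # Box width times box height, a rough proxy for object size.
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


def left_of(a: Detection, b: Detection) -> bool:
    """True when a's box center lies to the left of b's box center."""
    return a.center[0] < b.center[0]


# Hypothetical output for a cat-and-dog image (all values invented).
detections = [
    Detection("dog", 0.92, 40, 60, 300, 400),
    Detection("cat", 0.88, 320, 180, 520, 400),
    Detection("book", 0.41, 430, 20, 500, 70),
]

for d in detections:
    print(f"{d.label}: center={d.center}, area={d.area:.0f} px^2")
print("dog left of cat:", left_of(detections[0], detections[1]))
```

With boxes rather than one label per image, questions about relative location and size reduce to simple arithmetic on box coordinates.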

02:09

And if you want to build a system on top of computer vision,

02:12

say a self-driving vehicle or a robotic system,

02:16

this is the kind of information that you want.

02:18

You want something so that you can interact with the physical world.

02:22

Now, when I started working on object detection,

02:25

it took 20 seconds to process a single image.

02:28

And to get a feel for why speed is so important in this domain,

02:33

here's an example of an object detector

02:35

that takes two seconds to process an image.

02:38

So this is 10 times faster

02:40

than the 20-seconds-per-image detector,

02:44

and you can see that by the time it makes predictions,

02:47

the entire state of the world has changed,

02:49

and this wouldn't be very useful

02:52

for an application.

02:53

If we speed this up by another factor of 10,

02:56

this is a detector running at five frames per second.

02:59

This is a lot better,

03:00

but for example,

03:02

if there's any significant movement,

03:05

I wouldn't want a system like this driving my car.

03:09

This is our detection system running in real time on my laptop.

03:13

So it smoothly tracks me as I move around the frame,

03:16

and it's robust to a wide variety of changes in size,

03:21

pose,

03:23

forward, backward.

03:25

This is great.

03:26

This is what we really need

03:28

if we're going to build systems on top of computer vision.

03:31

(Applause)

03:36

So in just a few years,

03:38

we've gone from 20 seconds per image

03:41

to 20 milliseconds per image, a thousand times faster.
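
The speed figures quoted in the talk can be checked with a little arithmetic. The sketch below converts each per-image processing time into frames per second and tests it against a video frame budget. The 30 frames-per-second video rate is an assumption chosen for illustration; the talk does not state the camera's frame rate.

```python
def frames_per_second(seconds_per_image: float) -> float:
    """Throughput of a detector that needs `seconds_per_image` per frame."""
    return 1.0 / seconds_per_image


def keeps_up(seconds_per_image: float, video_fps: float = 30.0) -> bool:
    """True if each frame is processed before the next one arrives.

    video_fps=30 is an assumed, typical video rate, not a figure from the talk.
    """
    return seconds_per_image <= 1.0 / video_fps


# Per-image processing times mentioned in the talk, in seconds.
latencies = {
    "original detector": 20.0,
    "10x faster": 2.0,
    "another 10x faster": 0.2,
    "real-time detector": 0.020,
}

for name, seconds in latencies.items():
    fps = frames_per_second(seconds)
    print(f"{name}: {seconds} s/image = {fps:g} frames/s, "
          f"keeps up with 30 fps video: {keeps_up(seconds)}")

speedup = latencies["original detector"] / latencies["real-time detector"]
print(f"overall speedup: about {speedup:.0f}x")
```

Under that assumption, only the 20-millisecond detector finishes a frame before the next one arrives, which is what "real time" means in practice here.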

03:44

How did we get there?

03:46

Well, in the past, object detection systems

03:49

would take an image like this

03:51

and split it into a bunch of regions

03:53

and then run a classifier on each of these regions,

03:56

and high scores for that classifier

03:59

would be considered detections in the image.

04:02

But this involved running a classifier thousands of times over an image,

04:06

thousands of neural network evaluations to produce detection.
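
To get a feel for why region-by-region classification is expensive, the following Python sketch counts how many classifier runs a simple sliding-window scheme would need on one image. The talk only says the image is split into "a bunch of regions"; the sliding-window layout, the 640x480 image size, the window sizes and the stride rule are all assumptions made for illustration.

```python
def count_windows(image_w: int, image_h: int, window: int, stride: int) -> int:
    """Number of square windows of side `window` that fit when stepping by `stride`."""
    if window > image_w or window > image_h:
        return 0
    cols = (image_w - window) // stride + 1
    rows = (image_h - window) // stride + 1
    return cols * rows


def classifier_evaluations(image_w: int, image_h: int, window_sizes: list[int]) -> int:
    """Total classifier runs over all window sizes, with stride = window // 4 (assumed)."""
    return sum(
        count_windows(image_w, image_h, w, max(1, w // 4))
        for w in window_sizes
    )


# Hypothetical setup: a 640x480 image scanned at three window sizes.
total = classifier_evaluations(640, 480, [64, 128, 256])
print(f"region-based approach: {total} classifier evaluations per image")
print("single-network approach: 1 network evaluation per image")
```

Even this coarse grid needs over a thousand classifier runs per image, and finer strides or more window shapes push the count higher.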

04:11

Instead, we trained a single network to do all of detection for us.

04:15

It produces all of the bounding boxes and class probabilities simultaneously.

04:20

With our system, instead of looking at an image thousands of times

04:24

to produce detection,

04:25

you only look once,

04:26

and that's why we call it the YOLO method of object detection.

04:31

So with this speed, we're not just limited to images;

04:35

we can process video in real time.

04:37

And now, instead of just seeing that cat and dog,

04:40

we can see them move around and interact with each other.

04:46

This is a detector that we trained

04:48

on 80 different classes

04:53

in Microsoft's COCO dataset.

04:56

It has all sorts of things like spoon and fork, bowl,

04:59

common objects like that.

05:02

It has a variety of more exotic things:

05:05

animals, cars, zebras, giraffes.

05:08

And now we're going to do something fun.

05:10

We're just going to go out into the audience

05:12

and see what kind of things we can detect.

05:14

Does anyone want a stuffed animal?

05:18

There are some teddy bears out there.

05:22

And we can turn down our threshold for detection a little bit,

05:26

so we can find more of you guys out in the audience.
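
Turning down the detection threshold can be pictured as a simple filter over scored predictions. The Python sketch below is an illustration only: the `keep_detections` helper, the labels and the scores are invented, and it does not reproduce how Darknet applies its threshold internally.

```python
def keep_detections(scored: list[tuple[str, float]], threshold: float) -> list[tuple[str, float]]:
    """Keep only the predictions whose confidence is at least `threshold`."""
    return [(label, score) for label, score in scored if score >= threshold]


# Hypothetical (label, confidence) pairs for one frame of the audience.
scored = [
    ("teddy bear", 0.81),
    ("person", 0.62),
    ("person", 0.34),
    ("backpack", 0.29),
    ("stop sign", 0.12),
]

strict = keep_detections(scored, threshold=0.5)
loose = keep_detections(scored, threshold=0.25)

print(f"threshold 0.50 keeps {len(strict)} detections: {strict}")
print(f"threshold 0.25 keeps {len(loose)} detections: {loose}")
```

A lower threshold surfaces more objects, at the cost of admitting more low-confidence predictions that may be wrong.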

05:31

Let's see if we can get these stop signs.

05:33

We find some backpacks.

05:37

Let's just zoom in a little bit.

05:42

And this is great.

05:43

And all of the processing is happening in real time

05:46

on the laptop.

05:49

And it's important to remember

05:50

that this is a general-purpose object detection system,

05:53

so we can train this for any image domain.

06:00

The same code that we use

06:02

to find stop signs or pedestrians,

06:05

bicycles in a self-driving vehicle,

06:07

can be used to find cancer cells

06:10

in a tissue biopsy.

06:13

And there are researchers around the globe already using this technology

06:18

for advances in things like medicine, robotics.

06:21

This morning, I read a paper

06:23

where they were taking a census of animals in Nairobi National Park

06:27

with YOLO as part of this detection system.

06:30

And that's because Darknet is open source

06:33

and in the public domain, free for anyone to use.

06:37

(Applause)

06:43

But we wanted to make detection even more accessible and usable,

06:48

so through a combination of model optimization,

06:52

network binarization and approximation,

06:54

we actually have object detection running on a phone.

07:04

(Applause)

07:10

And I'm really excited because now we have a pretty powerful solution

07:16

to this low-level computer vision problem,

07:18

and anyone can take it and build something with it.

07:22

So now the rest is up to all of you

07:25

and people around the world with access to this software,

07:28

and I can't wait to see what people will build with this technology.

07:32

Thank you.

07:33

(Applause)

Translated by Marie-Caroline Braud
Reviewed by Shadia Ramsahye


THE ORIGINAL VIDEO ON TED.COM

Joseph Redmon: How computers learn to recognize objects instantly | TED Talk | TED.com