ABOUT THE SPEAKER

Joseph Redmon - Computer scientist
Joseph Redmon works on the YOLO algorithm, which combines the simple face detection of your phone camera with a cloud-based AI -- in real time.

Why you should listen

Computer scientist Joseph Redmon is working on the YOLO (You Only Look Once) algorithm, which has a simple goal: to deliver image recognition and object detection at a speed that would have seemed science-fictional only a few years ago. The algorithm looks like the simple face detection of a camera app but with the level of complexity of systems like Google's Deep Mind Cloud Vision, using Convolutional Deep Neural Networks to crunch object detection in real time. It's the kind of technology that will be embedded in all smartphones in the next few years.

Redmon is also internet-famous for his resume.


TED2017

Joseph Redmon: How computers learn to recognize objects instantly

Filmed: 2017-04-24

Readability: 4.5

2,471,805 views

Ten years ago, researchers thought that getting a computer to tell the difference between a cat and a dog would be almost impossible. Today, computer vision systems do it with greater than 99 percent accuracy. How? Joseph Redmon works on the YOLO (You Only Look Once) system, an open-source method of object detection that can identify objects in images and video -- from zebras to stop signs -- in the blink of an eye. In a remarkable live demo, Redmon shows off the important advances made in areas such as self-driving cars, robotics and cancer diagnosis.

00:12

Ten years ago,

00:14

computer vision researchers thought that getting a computer

00:16

to tell the difference between a cat and a dog

00:19

would be almost impossible,

00:21

even with the significant advance in the state of artificial intelligence.

00:25

Now we can do it at a level greater than 99 percent accuracy.

00:29

This is called image classification --

00:31

give it an image, put a label to that image --

00:34

and computers know thousands of other categories as well.

00:38

I'm a graduate student at the University of Washington,

00:41

and I work on a project called Darknet,

00:43

which is a neural network framework

00:45

for training and testing computer vision models.

00:48

So let's just see what Darknet thinks

00:51

of this image that we have.

00:54

When we run our classifier

00:56

on this image,

00:58

we see we don't just get a prediction of dog or cat,

01:00

we actually get specific breed predictions.

01:02

That's the level of granularity we have now.

01:05

And it's correct.

01:06

My dog is in fact a malamute.

01:09

So we've made amazing strides in image classification,

01:13

but what happens when we run our classifier

01:15

on an image that looks like this?

01:19

Well ...

01:24

We see that the classifier comes back with a pretty similar prediction.

01:28

And it's correct, there is a malamute in the image,

01:31

but just given this label, we don't actually know that much

01:35

about what's going on in the image.

01:37

We need something more powerful.

01:39

I work on a problem called object detection,

01:41

where we look at an image and try to find all of the objects,

01:44

put bounding boxes around them

01:46

and say what those objects are.

01:48

So here's what happens when we run a detector on this image.

01:53

Now, with this kind of result,

01:55

we can do a lot more with our computer vision algorithms.

01:58

We see that it knows that there's a cat and a dog.

02:01

It knows their relative locations,

02:03

their size.

02:04

It may even know some extra information.

02:06

There's a book sitting in the background.
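
As an illustration that is not part of the talk, the kind of result described here can be pictured as a list of labeled boxes. The `Detection` class, its field names, and the pixel coordinates below are hypothetical and chosen only for this sketch; they are not Darknet's actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object: a label plus an axis-aligned bounding box in pixels."""
    label: str
    x: float       # left edge of the box
    y: float       # top edge of the box
    width: float
    height: float

    @property
    def area(self) -> float:
        # Box size, a rough proxy for how large the object appears.
        return self.width * self.height

    @property
    def center(self) -> tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


def is_left_of(a: Detection, b: Detection) -> bool:
    """True when the center of a's box lies to the left of the center of b's box."""
    return a.center[0] < b.center[0]


# Made-up detections for an image with a dog, a cat, and a book (assumed values).
detections = [
    Detection("dog", x=40, y=120, width=300, height=260),
    Detection("cat", x=380, y=200, width=180, height=150),
    Detection("book", x=500, y=30, width=80, height=50),
]

for d in detections:
    print(f"{d.label}: center={d.center}, area={d.area}")
print("dog is left of cat:", is_left_of(detections[0], detections[1]))
```

With boxes rather than a single label, relative location and size fall out of simple arithmetic on the coordinates, which is what a system built on top of the detector would work with.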

02:09

And if you want to build a system on top of computer vision,

02:12

say a self-driving vehicle or a robotic system,

02:16

this is the kind of information that you want.

02:18

You want something so that you can interact with the physical world.

02:22

Now, when I started working on object detection,

02:25

it took 20 seconds to process a single image.

02:28

And to get a feel for why speed is so important in this domain,

02:33

here's an example of an object detector

02:35

that takes two seconds to process an image.

02:38

So this is 10 times faster

02:40

than the 20-seconds-per-image detector,

02:44

and you can see that by the time it makes predictions,

02:47

the entire state of the world has changed,

02:49

and this wouldn't be very useful

02:52

for an application.

02:53

If we speed this up by another factor of 10,

02:56

this is a detector running at five frames per second.

02:59

This is a lot better,

03:00

but for example,

03:02

if there's any significant movement,

03:05

I wouldn't want a system like this driving my car.

03:09

This is our detection system running in real time on my laptop.

03:13

So it smoothly tracks me as I move around the frame,

03:16

and it's robust to a wide variety of changes in size,

03:21

pose,

03:23

forward, backward.

03:25

This is great.

03:26

This is what we really need

03:28

if we're going to build systems on top of computer vision.

03:31

(Applause)

03:36

So in just a few years,

03:38

we've gone from 20 seconds per image

03:41

to 20 milliseconds per image, a thousand times faster.
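
The figures quoted in the talk (20 seconds, 2 seconds, five frames per second, 20 milliseconds) can be checked with a few lines of arithmetic. This is only a worked example; the function name and the dictionary labels are made up for illustration.

```python
def frames_per_second(ms_per_image: int) -> float:
    """Convert a per-image processing time in milliseconds to frames per second."""
    return 1000 / ms_per_image


# Per-image processing times mentioned in the talk, in milliseconds.
timings_ms = {
    "early detector": 20_000,   # 20 seconds per image
    "10x faster": 2_000,        # 2 seconds per image
    "100x faster": 200,         # five frames per second
    "real time": 20,            # 20 milliseconds per image
}

for name, ms in timings_ms.items():
    print(f"{name}: {ms} ms per image = {frames_per_second(ms)} frames per second")

# Overall speedup from the first detector to the last one.
speedup = timings_ms["early detector"] / timings_ms["real time"]
print("overall speedup:", speedup)
```

At 20 milliseconds per image the detector keeps up with 50 frames per second, which is why it can follow live video rather than lagging seconds behind the scene.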

03:44

How did we get there?

03:46

Well, in the past, object detection systems

03:49

would take an image like this

03:51

and split it into a bunch of regions

03:53

and then run a classifier on each of these regions,

03:56

and high scores for that classifier

03:59

would be considered detections in the image.

04:02

But this involved running a classifier thousands of times over an image,

04:06

thousands of neural network evaluations to produce detection.

04:11

Instead, we trained a single network to do all of detection for us.

04:15

It produces all of the bounding boxes and class probabilities simultaneously.

04:20

With our system, instead of looking at an image thousands of times

04:24

to produce detection,

04:25

you only look once,

04:26

and that's why we call it the YOLO method of object detection.
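
The contrast between the two approaches can be sketched by counting network evaluations. This is a toy illustration, not the YOLO implementation: `CountingNetwork` is a hypothetical stand-in that does no real inference, and the sliding-window split, the 640x480 image size, the window size and the stride are all assumptions (the talk only says the image was split into "a bunch of regions").

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def sliding_regions(img_w: int, img_h: int, size: int, stride: int) -> List[Box]:
    """Enumerate square regions covering the image (one assumed way to split it)."""
    return [
        (x, y, size, size)
        for y in range(0, img_h - size + 1, stride)
        for x in range(0, img_w - size + 1, stride)
    ]


class CountingNetwork:
    """Stand-in for a neural network: it only counts its own evaluations."""

    def __init__(self) -> None:
        self.evaluations = 0

    def score_region(self, region: Box) -> float:
        self.evaluations += 1
        return 0.0  # placeholder score; a real classifier would inspect pixels

    def detect_whole_image(self, img_w: int, img_h: int) -> List[Box]:
        self.evaluations += 1
        return []  # placeholder; a real detector would return boxes and classes


# Older style: run the classifier separately on every region.
per_region = CountingNetwork()
regions = sliding_regions(640, 480, size=64, stride=8)
scores = [per_region.score_region(r) for r in regions]

# Single-pass style: one evaluation covers the whole image.
single_pass = CountingNetwork()
boxes = single_pass.detect_whole_image(640, 480)

print("per-region evaluations:", per_region.evaluations)
print("single-pass evaluations:", single_pass.evaluations)
```

Under these assumed settings the per-region loop makes several thousand calls while the single-pass detector makes one, which is the difference in work the talk describes.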

04:31

So with this speed, we're not just limited to images;

04:35

we can process video in real time.

04:37

And now, instead of just seeing that cat and dog,

04:40

we can see them move around and interact with each other.

04:46

This is a detector that we trained

04:48

on 80 different classes

04:53

in Microsoft's COCO dataset.

04:56

It has all sorts of things like spoon and fork, bowl,

04:59

common objects like that.

05:02

It has a variety of more exotic things:

05:05

animals, cars, zebras, giraffes.

05:08

And now we're going to do something fun.

05:10

We're just going to go out into the audience

05:12

and see what kind of things we can detect.

05:14

Does anyone want a stuffed animal?

05:18

There are some teddy bears out there.

05:22

And we can turn down our threshold for detection a little bit,

05:26

so we can find more of you guys out in the audience.
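
Turning down the detection threshold can be sketched as a simple filter over scored candidates. The labels, the confidence scores and the function name below are invented for illustration and do not come from the demo.

```python
from typing import List, Tuple

ScoredLabel = Tuple[str, float]  # (class label, confidence score between 0 and 1)


def filter_by_threshold(candidates: List[ScoredLabel], threshold: float) -> List[ScoredLabel]:
    """Keep only the candidates whose confidence is at or above the threshold."""
    return [(label, score) for label, score in candidates if score >= threshold]


# Hypothetical raw candidates from one video frame of an audience.
candidates = [
    ("person", 0.92),
    ("teddy bear", 0.71),
    ("person", 0.48),
    ("backpack", 0.33),
    ("person", 0.27),
]

strict = filter_by_threshold(candidates, threshold=0.5)
relaxed = filter_by_threshold(candidates, threshold=0.25)

print("threshold 0.50 keeps", len(strict), "detections")
print("threshold 0.25 keeps", len(relaxed), "detections")
```

A lower threshold reports more of the people in the room, at the cost of also admitting less certain guesses that may be wrong.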

05:31

Let's see if we can get these stop signs.

05:33

We find some backpacks.

05:37

Let's just zoom in a little bit.

05:42

And this is great.

05:43

And all of the processing is happening in real time

05:46

on the laptop.

05:49

And it's important to remember

05:50

that this is a general purpose object detection system,

05:53

so we can train this for any image domain.

06:00

The same code that we use

06:02

to find stop signs or pedestrians,

06:05

bicycles in a self-driving vehicle,

06:07

can be used to find cancer cells

06:10

in a tissue biopsy.

06:13

And there are researchers around the globe already using this technology

06:18

for advances in things like medicine, robotics.

06:21

This morning, I read a paper

06:23

where they were taking a census of animals in Nairobi National Park

06:27

with YOLO as part of this detection system.

06:30

And that's because Darknet is open source

06:33

and in the public domain, free for anyone to use.

06:37

(Applause)

06:43

But we wanted to make detection even more accessible and usable,

06:48

so through a combination of model optimization,

06:52

network binarization and approximation,

06:54

we actually have object detection running on a phone.

07:04

(Applause)

07:10

And I'm really excited because now we have a pretty powerful solution

07:16

to this low-level computer vision problem,

07:18

and anyone can take it and build something with it.

07:22

So now the rest is up to all of you

07:25

and people around the world with access to this software,

07:28

and I can't wait to see what people will build with this technology.

07:32

Thank you.

07:33

(Applause)

Translated by Elisabetta Siagri
Reviewed by Maria Carmina Distratto

THE ORIGINAL VIDEO ON TED.COM

Joseph Redmon: How computers learn to recognize objects instantly | TED Talk | TED.com