ABOUT THE SPEAKER

Joseph Redmon - Computer scientist
Joseph Redmon works on the YOLO algorithm, which combines the simple face detection of your phone camera with a cloud-based AI -- in real time.

Why you should listen

Computer scientist Joseph Redmon is working on the YOLO (You Only Look Once) algorithm, which has a simple goal: to deliver image recognition and object detection at a speed that would have seemed like science fiction only a few years ago. The algorithm looks like the simple face detection of a camera app but with the level of complexity of systems like Google's DeepMind or Cloud Vision, using deep convolutional neural networks to crunch object detection in real time. It's the kind of technology that will be embedded in all smartphones in the next few years.

Redmon is also internet-famous for his resume.


TED2017

Joseph Redmon: How computers learn to recognize objects instantly


Filmed: 2017-04-24

Readability: 4.5

2,471,805 views

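Ten years ago, researchers thought that getting a computer to tell the difference between a cat and a dog would be almost impossible. Today, computer vision systems do it with greater than 99 percent accuracy. How? Joseph Redmon works on YOLO (You Only Look Once), an open-source object detection system that can instantly identify objects in images and video, from zebras to stop signs. In this remarkable live demo, Redmon shows off an important step forward for applications such as self-driving cars, robotics and cancer detection.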


00:12

Ten years ago,

00:14

computer vision researchers thought that getting a computer

00:16

to tell the difference between a cat and a dog

00:19

would be almost impossible,

00:21

even with the significant advance in the state of artificial intelligence.

00:25

Now we can do it at a level greater than 99 percent accuracy.

00:29

This is called image classification --

00:31

give it an image, put a label to that image --

00:34

and computers know thousands of other categories as well.

00:38

I'm a graduate student at the University of Washington,

00:41

and I work on a project called Darknet,

00:43

which is a neural network framework

00:45

for training and testing computer vision models.

00:48

So let's just see what Darknet thinks

00:51

of this image that we have.

00:54

When we run our classifier

00:56

on this image,

00:58

we see we don't just get a prediction of dog or cat,

01:00

we actually get specific breed predictions.

01:02

That's the level of granularity we have now.

01:05

And it's correct.

01:06

My dog is in fact a malamute.
[On screen: malamute 37%, husky 15%, Eskimo dog 12%]
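
The classifier output described above, a ranked list of labels with a confidence for each, can be sketched roughly as follows. This is a minimal illustration and not Darknet's code: the raw scores, the category names and the helper functions `softmax` and `top_k` are all assumptions made up for the example.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def top_k(probabilities, k=3):
    """Return the k most likely (label, probability) pairs, best first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical raw scores; a real network would emit one per known category.
scores = {"malamute": 2.0, "husky": 1.1, "Eskimo dog": 0.9, "tabby cat": -1.0}

for label, probability in top_k(softmax(scores)):
    print(f"{label}: {probability:.0%}")
```

A classifier of this kind only ever answers "what is in the image", one ranked list per image, which is the limitation the talk turns to next.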

01:09

So we've made amazing strides in image classification,

01:13

but what happens when we run our classifier

01:15

on an image that looks like this?

01:19

Well ...

01:24

We see that the classifier comes back with a pretty similar prediction.
[On screen: malamute 7%, Eskimo dog 6%, husky 6%]

01:28

And it's correct, there is a malamute in the image,

01:31

but just given this label, we don't actually know that much

01:35

about what's going on in the image.

01:37

We need something more powerful.

01:39

I work on a problem called object detection,

01:41

where we look at an image and try to find all of the objects,

01:44

put bounding boxes around them

01:46

and say what those objects are.

01:48

So here's what happens when we run a detector on this image.

01:53

Now, with this kind of result,

01:55

we can do a lot more with our computer vision algorithms.

01:58

We see that it knows that there's a cat and a dog.

02:01

It knows their relative locations,

02:03

their size.

02:04

It may even know some extra information.

02:06

There's a book sitting in the background.

02:09

And if you want to build a system on top of computer vision,

02:12

say a self-driving vehicle or a robotic system,

02:16

this is the kind of information that you want.

02:18

You want something so that you can interact with the physical world.
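
The detector output described above (a label, a location and a size for every object) could be represented along these lines. This is only a sketch: the `Detection` class, its field names and the pixel coordinates are hypothetical and are not taken from Darknet or YOLO.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object: what it is, how sure we are, and where it sits."""
    label: str
    confidence: float
    x: float       # left edge of the bounding box, in pixels
    y: float       # top edge of the bounding box, in pixels
    width: float
    height: float

    def area(self) -> float:
        return self.width * self.height

    def center(self) -> tuple:
        return (self.x + self.width / 2, self.y + self.height / 2)


def is_left_of(a: Detection, b: Detection) -> bool:
    """Relative location: compare the horizontal centers of two boxes."""
    return a.center()[0] < b.center()[0]


# Made-up detections for an image containing a dog, a cat and a book.
detections = [
    Detection("dog", 0.92, x=40, y=120, width=200, height=260),
    Detection("cat", 0.88, x=300, y=180, width=150, height=170),
    Detection("book", 0.41, x=420, y=30, width=60, height=40),
]

largest = max(detections, key=Detection.area)
print(f"largest object: {largest.label}")
print(f"dog is left of cat: {is_left_of(detections[0], detections[1])}")
```

With boxes instead of a single label, a downstream system such as a robot or a vehicle can reason about where things are, not just what is present.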

02:22

Now, when I started working on object detection,

02:25

it took 20 seconds to process a single image.

02:28

And to get a feel for why speed is so important in this domain,

02:33

here's an example of an object detector

02:35

that takes two seconds to process an image.

02:38

So this is 10 times faster

02:40

than the 20-seconds-per-image detector,

02:44

and you can see that by the time it makes predictions,

02:47

the entire state of the world has changed,

02:49

and this wouldn't be very useful

02:52

for an application.

02:53

If we speed this up by another factor of 10,

02:56

this is a detector running at five frames per second.

02:59

This is a lot better,

03:00

but for example,

03:02

if there's any significant movement,

03:05

I wouldn't want a system like this driving my car.

03:09

This is our detection system running in real time on my laptop.

03:13

So it smoothly tracks me as I move around the frame,

03:16

and it's robust to a wide variety of changes in size,

03:21

pose,

03:23

forward, backward.

03:25

This is great.

03:26

This is what we really need

03:28

if we're going to build systems on top of computer vision.

03:31

(Applause)

03:36

So in just a few years,

03:38

we've gone from 20 seconds per image

03:41

to 20 milliseconds per image, a thousand times faster.
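
The speed figures quoted in the talk can be checked with a few lines of arithmetic. The stage names below are labels invented for this sketch; the per-image times are the ones mentioned by the speaker (20 seconds, 2 seconds, five frames per second, 20 milliseconds).

```python
def frames_per_second(ms_per_image: int) -> float:
    """Convert a per-image processing time in milliseconds to frames per second."""
    return 1000 / ms_per_image


# Per-image processing time for each stage of the demo, in milliseconds.
stages_ms = {
    "early detector": 20_000,   # 20 seconds per image
    "10x faster": 2_000,        # 2 seconds per image
    "another 10x": 200,         # five frames per second
    "real time": 20,            # 20 milliseconds per image
}

for name, ms in stages_ms.items():
    print(f"{name}: {ms} ms per image = {frames_per_second(ms):g} frames per second")

speedup = stages_ms["early detector"] // stages_ms["real time"]
print(f"overall speedup: {speedup}x")
```

At 20 milliseconds per image the detector keeps up with roughly 50 frames per second, which is faster than typical video frame rates.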

03:44

How did we get there?

03:46

Well, in the past, object detection systems

03:49

would take an image like this

03:51

and split it into a bunch of regions

03:53

and then run a classifier on each of these regions,

03:56

and high scores for that classifier

03:59

would be considered detections in the image.

04:02

But this involved running a classifier thousands of times over an image,

04:06

thousands of neural network evaluations to produce detection.

04:11

Instead, we trained a single network to do all of detection for us.

04:15

It produces all of the bounding boxes and class probabilities simultaneously.

04:20

With our system, instead of looking at an image thousands of times

04:24

to produce detection,

04:25

you only look once,

04:26

and that's why we call it the YOLO method of object detection.
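
To get a feel for why classifying regions one at a time means "thousands" of evaluations, here is a rough count for a simple sliding-window scheme. This is an assumption-laden sketch: the image size, window sizes and stride are illustrative values, and real region-based detectors chose their regions in various other ways.

```python
def count_windows(image_size: int, window: int, stride: int) -> int:
    """Number of square windows that fit when sliding over a square image."""
    per_axis = (image_size - window) // stride + 1
    return per_axis * per_axis


image_size = 448  # hypothetical square input, in pixels

# Region-based approach: one classifier evaluation per window, at several scales.
region_evaluations = sum(
    count_windows(image_size, window, stride=8) for window in (64, 128, 256)
)

# Single-network approach: one evaluation produces every box and class score.
single_pass_evaluations = 1

print(f"region-based classifier runs: {region_evaluations}")
print(f"single-pass network runs: {single_pass_evaluations}")
```

Even this modest setup needs several thousand classifier runs per image, whereas the single-network approach described in the talk needs one.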

04:31

So with this speed, we're not just limited to images;

04:35

we can process video in real time.

04:37

And now, instead of just seeing that cat and dog,

04:40

we can see them move around and interact with each other.

04:46

This is a detector that we trained

04:48

on 80 different classes

04:53

in Microsoft's COCO dataset.

04:56

It has all sorts of things like spoon and fork, bowl,

04:59

common objects like that.

05:02

It has a variety of more exotic things:

05:05

animals, cars, zebras, giraffes.

05:08

And now we're going to do something fun.

05:10

We're just going to go out into the audience

05:12

and see what kind of things we can detect.

05:14

Does anyone want a stuffed animal?

05:18

There are some teddy bears out there.

05:22

And we can turn down our threshold for detection a little bit,

05:26

so we can find more of you guys out in the audience.

05:31

Let's see if we can get these stop signs.

05:33

We find some backpacks.

05:37

Let's just zoom in a little bit.

05:42

And this is great.

05:43

And all of the processing is happening in real time

05:46

on the laptop.
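
Turning down the detection threshold, as in the demo above, simply means keeping candidate detections with lower confidence. A minimal sketch, assuming each candidate is a (label, confidence) pair; the candidates, confidences and the `filter_detections` helper are invented for illustration.

```python
def filter_detections(candidates, threshold):
    """Keep only the candidates whose confidence reaches the threshold."""
    return [(label, conf) for label, conf in candidates if conf >= threshold]


# Hypothetical raw candidates from one video frame of the audience.
candidates = [
    ("person", 0.91),
    ("teddy bear", 0.62),
    ("person", 0.34),
    ("backpack", 0.27),
    ("stop sign", 0.12),
]

strict = filter_detections(candidates, threshold=0.5)
relaxed = filter_detections(candidates, threshold=0.25)

print(f"threshold 0.50 keeps {len(strict)} detections")
print(f"threshold 0.25 keeps {len(relaxed)} detections")
```

A lower threshold finds more objects but also lets through more uncertain guesses, so the right setting depends on the application.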

05:49

And it's important to remember

05:50

that this is a general purpose object detection system,

05:53

so we can train this for any image domain.

06:00

The same code that we use

06:02

to find stop signs or pedestrians,

06:05

bicycles in a self-driving vehicle,

06:07

can be used to find cancer cells

06:10

in a tissue biopsy.

06:13

And there are researchers around the globe already using this technology

06:18

for advances in things like medicine, robotics.

06:21

This morning, I read a paper

06:23

where they were taking a census of animals in Nairobi National Park

06:27

with YOLO as part of this detection system.

06:30

And that's because Darknet is open source

06:33

and in the public domain, free for anyone to use.

06:37

(Applause)

06:43

But we wanted to make detection even more accessible and usable,

06:48

so through a combination of model optimization,

06:52

network binarization and approximation,

06:54

we actually have object detection running on a phone.

07:04

(Applause)
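
The talk does not spell out how the network is binarized. One well-known scheme replaces each real-valued weight with its sign and a single shared scale (the mean absolute weight); the sketch below shows that idea on a toy dot product. It is an illustration under that assumption, not the method used for the phone demo, and the weights and inputs are made up.

```python
def binarize(weights):
    """Approximate weights as scale * sign, with scale = mean absolute value."""
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return scale, signs


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical weights of one neuron and one input vector.
weights = [0.5, -0.5, 0.75, -0.25]
inputs = [2.0, 1.0, 4.0, 1.0]

scale, signs = binarize(weights)
exact = dot(weights, inputs)           # full-precision result
approx = scale * dot(signs, inputs)    # only additions, subtractions, one multiply

print(f"exact: {exact}, binarized approximation: {approx}")
```

The binarized result is only a rough approximation of the exact one, but each weight now needs a single bit of storage and the inner loop avoids multiplications, which is the kind of trade-off that makes small devices practical.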

07:10

And I'm really excited because now we have a pretty powerful solution

07:16

to this low-level computer vision problem,

07:18

and anyone can take it and build something with it.

07:22

So now the rest is up to all of you

07:25

and people around the world with access to this software,

07:28

and I can't wait to see what people will build with this technology.

07:32

Thank you.

07:33

(Applause)

Translated by Yasushi Aoki
Reviewed by Claire Ghyselen


THE ORIGINAL VIDEO ON TED.COM

Joseph Redmon: How computers learn to recognize objects instantly | TED Talk | TED.com