Chapter 4 Flashcards

1
Q

What’s so hard about object recognition? (dogs)

A

• If the input is simply the pixels of the image, the program first has to figure out which are “dog” pixels and which are “non-dog” pixels.

• Different dogs look very different.

• They can be facing in various directions.

• The lighting can vary considerably between images.

• Parts of the dog can be blocked by other objects.

• “Dog pixels” might look a lot like “cat pixels” or the pixels of other animals.

• Under some lighting conditions, a cloud in the sky might even look very much like a dog.

2
Q

The deep-learning revolution

A

The ability of machines to recognize objects in images and videos underwent a quantum leap in the 2010s due to advances in the area called deep learning.

3
Q

deep learning

A

methods for training deep neural networks

4
Q

most successful deep networks are those …

A

whose structure mimics parts of the brain’s visual system.

5
Q

Fukushima - cognitron and neocognitron

A

inspired by David Hubel and Torsten Wiesel’s discoveries of hierarchical organization in visual systems

Fukushima reported some success training the neocognitron to recognize handwritten digits, but the specific learning methods he used did not seem to extend to more complex visual tasks.

the neocognitron was an important inspiration for later approaches to deep neural networks, including today’s most influential and widely used approach: convolutional neural networks

6
Q

object recognition in the brain

A
  1. when eyes focus on a scene ⇒ receive light of different wavelengths that has been reflected by the objects and surfaces in the scene
  2. this light activates cells in each retina ⇒ the retina, a grid of neurons in the back of the eye, communicates its activations through the optic nerves and into the brain, eventually activating neurons in the visual cortex, which resides in the back of the head
  3. visual cortex is roughly organized as a hierarchical series of layers of neurons ⇒ neurons in each layer communicate their activations to neurons in the succeeding layer
6
Q

object recognition

A

recognizing a particular group of pixels in an image as a particular object category (e.g. chair, dog, balloon etc.)

7
Q

ConvNets

A

inspired by neocognitron

the driving force behind today’s deep-learning revolution

first proposed by Yann LeCun in the 1980s

8
Q

Object Recognition in ConvNets

A

recognizing a particular group of pixels in an image as a particular object category (e.g. chair, dog, balloon etc.)

neurons in different layers of this hierarchy act as “detectors” that respond to increasingly complex features appearing in the visual scene

there is a bottom-up (or feed-forward) flow of information, representing connections from lower to higher layers

units in each layer provide input to units in the next layer

9
Q

ConvNet input

A

image ⇒ array of numbers corresponding to the brightness and color of the image’s pixels
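
As a rough illustration of what "array of numbers" means here, the sketch below turns a tiny made-up image into nested lists of color values. The 2x2 image, the function names, and the scaling to the 0 to 1 range are assumptions made for this example, not something the card specifies.

```python
# A hypothetical 2x2 color image: each pixel is (red, green, blue), 0-255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

def to_input_array(img):
    """Return rows x columns x 3 nested lists, scaled to 0..1 (an assumed convention)."""
    return [[[channel / 255.0 for channel in pixel] for pixel in row] for row in img]

def brightness(pixel):
    """Average of the three color values: one simple notion of brightness."""
    return sum(pixel) / len(pixel)

inputs = to_input_array(image)
print(inputs[0][0])               # color values of the top-left pixel
print(brightness(inputs[1][1]))   # brightness of the bottom-right pixel
```

Keeping the row and column structure (rather than flattening) matters for a ConvNet, because each unit reads from a small neighborhood of the image.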

10
Q

ConvNet output

A

network’s confidence (0 percent to 100 percent) for each category (“dog” and “cat”)
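
One common way to turn a network's raw output scores into percentages like these is the softmax function. The card does not say which method is used, so treat the sketch below as an assumption; the `confidences` name and the example scores are hypothetical.

```python
import math

def confidences(scores):
    """Convert raw scores into percentages that sum to 100 (softmax, an assumption)."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: 100.0 * e / total for label, e in exps.items()}

result = confidences({"dog": 2.0, "cat": 0.5})
print(result)  # the higher raw score receives the higher percentage
```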

11
Q

ConvNet goal

A

have the network learn to output a high confidence for the correct category and a low confidence for the other category

the network will learn what set of features of the input image are most useful for this task

12
Q

activation maps

A

in the diagram of the network, each layer is drawn as a set of three overlapping rectangles; these rectangles represent activation maps

units in a ConvNet act as detectors for important visual features

each unit looks for its designated feature in a specific part of the visual field

each layer in a ConvNet consists of several grids of these units

each grid forms an activation map for a specific visual feature

13
Q

edge detectors

A

each neuron only looks at part of the visual scene (its receptive field)

the neuron becomes active only if its receptive field contains a particular kind of edge (e.g. horizontal or vertical)

these neurons feed into higher-level processing regions, where neurons might detect certain shapes, objects, or faces

Each unit in each map calculates an activation value that measures the degree to which the region “matches” the unit’s preferred edge orientation
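
To make the idea of "matching" a preferred edge orientation concrete, here is a minimal sketch of a single unit. The 3x3 weights for a vertical edge and the two example patches are hypothetical, chosen by hand for illustration; in a trained ConvNet the weights are learned.

```python
# Hand-picked weights that respond to a dark-to-bright transition from left to right.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

def activation(patch, weights):
    """Multiply each pixel in the receptive field by its weight and sum the results."""
    return sum(p * w
               for patch_row, weight_row in zip(patch, weights)
               for p, w in zip(patch_row, weight_row))

edge_patch = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]   # dark on the left, bright on the right
flat_patch = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]   # uniform brightness, no edge

print(activation(edge_patch, VERTICAL_EDGE))  # 3
print(activation(flat_patch, VERTICAL_EDGE))  # 0
```

The patch containing a vertical edge produces a high activation, while the uniform patch produces none.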

14
Q

receptive field of a unit

A

Each unit in a map corresponds to the analogous location in the input image, and each unit gets its input from a small region around that location, called its receptive field.

(The receptive fields of neighboring units typically overlap.)

15
Q

convolution

A

multiplying each value in a receptive field by its corresponding weight and summing the results

Image patches inside receptive fields are arrays of pixel values.
Each unit receives as input the pixel values in its receptive field. The unit then multiplies each input by its weight and sums the results to produce the unit’s activation

A key to the ConvNet’s success is that, again inspired by the brain, these maps are hierarchical.
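
The multiply-and-sum step in this card, repeated at every location of the image, produces one activation map. The sketch below is a minimal pure-Python version under stated assumptions: the image is a single grid of brightness values, the `convolve` name and the example data are hypothetical, and positions where the receptive field would run off the image are simply skipped.

```python
def convolve(image, weights):
    """Slide a square grid of weights over the image and return the activation map."""
    k = len(weights)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    activation_map = []
    for r in range(rows):
        map_row = []
        for c in range(cols):
            # Receptive field: the k x k patch whose top-left corner is (r, c).
            total = 0.0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * weights[i][j]
            map_row.append(total)
        activation_map.append(map_row)
    return activation_map

vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
print(convolve(image, vertical_edge))  # [[3.0, 3.0, 0.0], [3.0, 3.0, 0.0]]
```

Neighboring units read overlapping patches, and the same weights are reused at every location, so one set of weights yields one map for one feature.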

16
Q

classification module

A

layers 1 to 4 of the network are called convolutional layers because each performs convolutions on the preceding layer

Once the convolutional layers have extracted their features, the classification module uses these features to predict what object the image depicts.

The classification module is actually an entire traditional neural network

inputs to classification module are activation maps from the highest convolutional layer

output is a set of percentage values, one for each possible category, rating the network’s confidence that the input depicts an image of that category
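
A heavily simplified sketch of a classification module follows. It uses a single layer of weighted sums followed by softmax, whereas the card describes an entire traditional neural network, which would normally have more layers. All names, weights, and activation values are hypothetical.

```python
import math

def classification_module(activation_maps, weights, biases):
    """Map the top-layer activation maps to one confidence percentage per category."""
    # Flatten every activation map into one long list of features.
    features = [v for amap in activation_maps for row in amap for v in row]
    # One weighted sum per category (a single layer, for brevity).
    scores = {
        label: sum(w * f for w, f in zip(weights[label], features)) + biases[label]
        for label in weights
    }
    # Softmax (an assumption) turns the scores into percentages summing to 100.
    peak = max(scores.values())
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: 100.0 * e / total for label, e in exps.items()}

maps = [[[3.0, 0.0]], [[0.0, 1.0]]]  # two hypothetical 1x2 activation maps
weights = {"dog": [0.5, 0.0, 0.0, 0.2], "cat": [-0.5, 0.0, 0.0, 0.1]}
biases = {"dog": 0.0, "cat": 0.0}
result = classification_module(maps, weights, biases)
print(result)
```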

17
Q

explain what a convnet is

A

Inspired by Hubel and Wiesel’s findings on the brain’s visual cortex, a ConvNet takes an input image and transforms it – via convolutions – into a set of activation maps with increasingly complex features.

The features at the highest convolutional layers are fed into a traditional neural network (classification module), which outputs confidence percentages for the network’s known object categories.

The object category with the highest confidence is returned as the network’s classification of the image.
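
The final step, picking the category with the highest confidence, can be sketched in one line. The `classify` name and the example percentages are hypothetical.

```python
def classify(confidences):
    """Return the category whose confidence percentage is highest."""
    return max(confidences, key=confidences.get)

confidences = {"dog": 87.0, "cat": 13.0}
label = classify(confidences)
print(label)  # dog
```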

18
Q

Training a convnet

A

in real-world ConvNets edge detectors aren’t built in.

Instead, ConvNets learn from training examples what features should be detected at each layer and how to set the weights in the classification module

just as in traditional neural networks, all the weights can be learned from data via back-propagation

19
Q

training a convnet

A

• Collect a training set.

• Label each image in the training set.

• Your training program initially sets all the weights in the network to random values.

• One by one, each image is given as the input to the network; the network performs its layer-by-layer calculations and finally outputs confidence percentages for each of the categories.

• For each image, your training program compares these output values to the “correct” values.

• Then the training program uses the back-propagation algorithm to change the weights throughout the network just a bit, so that the next time this image is seen, the confidences will be closer to the correct values.
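
The steps above can be sketched with a deliberately tiny stand-in for a ConvNet: a single unit with one weight per input value. With only one layer, the weight update reduces to a simple error-times-input rule; real back-propagation applies the same idea through every layer. The data, learning rate, epoch count, and function names are all assumptions for illustration.

```python
import math
import random

def predict(weights, pixels):
    """Confidence (0 to 1) that the image is a dog, from a single sigmoid unit."""
    total = sum(w * p for w, p in zip(weights, pixels))
    return 1.0 / (1.0 + math.exp(-total))

def train(training_set, epochs=200, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    n_inputs = len(training_set[0][0])
    # Start with random weights.
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
    for _ in range(epochs):                      # many passes over the training set
        for pixels, target in training_set:      # one image at a time
            error = predict(weights, pixels) - target   # compare to the correct value
            for i, p in enumerate(pixels):
                weights[i] -= learning_rate * error * p  # nudge each weight a bit
    return weights

# Hypothetical labeled training set: (input values, 1.0 for dog / 0.0 for cat).
# The last input is a constant 1.0 that acts as a bias term.
training_set = [
    ([1.0, 0.0, 1.0], 1.0),
    ([0.0, 1.0, 1.0], 0.0),
]
weights = train(training_set)
print(predict(weights, [1.0, 0.0, 1.0]))  # close to 1 after training
print(predict(weights, [0.0, 1.0, 1.0]))  # close to 0 after training
```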

20
Q

epoch

A

one epoch = carrying out the procedure (input an image, then calculate the error at the output, then change the weights) once for every image in the training set

Training a ConvNet requires many epochs, during which the network processes each image over and over again