Chapter 4 Flashcards
What’s so hard about object recognition? (dogs)
If the input is simply the pixels of the image, then the program first has to figure out which are “dog” pixels and which are “non-dog” pixels
different dogs look very different
they can be facing in various directions
the lighting can vary considerably between images;
parts of the dog can be blocked by other objects
“dog pixels” might look a lot like “cat pixels” or the pixels of other animals.
Under some lighting conditions, a cloud in the sky might even look very much like a dog.
The deep-learning revolution
The ability of machines to recognize objects in images and videos underwent a quantum leap in the 2010s due to advances in the area called deep learning.
deep learning
methods for training deep neural networks
the most successful deep networks are those whose structure mimics parts of the brain’s visual system.
Fukushima - cognitron and neocognitron
inspired by David Hubel and Torsten Wiesel’s discoveries of hierarchical organization in visual systems
Fukushima reported some success training the neocognitron to recognize handwritten digits, but the specific learning methods he used did not seem to extend to more complex visual tasks.
the neocognitron was an important inspiration for later approaches to deep neural networks, including today’s most influential and widely used approach: convolutional neural networks
object recognition in the brain
- when the eyes focus on a scene ⇒ they receive light of different wavelengths that has been reflected by the objects and surfaces in the scene
- this light activates cells in each retina ⇒ these grids of neurons at the back of each eye communicate their activations through the optic nerves into the brain, eventually activating neurons in the visual cortex, which resides at the back of the head
- visual cortex is roughly organized as a hierarchical series of layers of neurons ⇒ neurons in each layer communicate their activations to neurons in the succeeding layer
object recognition
recognizing a particular group of pixels in an image as a particular object category (e.g. chair, dog, balloon etc.)
ConvNets
inspired by neocognitron
the driving force behind today’s deep learning revolution
first proposed by Yann LeCun in the 1980s
Object Recognition in ConvNets
neurons in different layers of this hierarchy act as “detectors” that respond to increasingly complex features appearing in the visual scene
there is a bottom-up (or feed-forward) flow of information, representing connections from lower to higher layers
units in each layer provide input to units in the next layer
ConvNet input
image ⇒ array of numbers corresponding to the brightness and color of the image’s pixels
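A minimal sketch of this input representation (NumPy; the pixel values and sizes are invented for illustration):

```python
import numpy as np

# A grayscale image is just a 2-D array of pixel brightness values
# (here scaled 0.0 to 1.0); a color image adds a third dimension with
# one channel each for red, green, and blue.
gray_image = np.array([
    [0.0, 0.2, 0.9],
    [0.1, 0.8, 0.9],
    [0.0, 0.7, 1.0],
])
color_image = np.zeros((224, 224, 3))   # height x width x RGB channels

print(gray_image.shape)    # (3, 3)
print(color_image.shape)   # (224, 224, 3)
```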
ConvNet output
network’s confidence (0 percent to 100 percent) for each category (“dog” and “cat”)
ConvNet goal
have the network learn to output a high confidence for the correct category and a low confidence for the other category
the network will learn what set of features of the input image are most useful for this task
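A numerical sketch of the output and goal cards above. Softmax and a one-hot “correct values” vector with a cross-entropy loss are standard choices assumed here for illustration; the cards themselves don’t specify them:

```python
import numpy as np

def softmax(scores):
    """Turn raw network scores into confidences that sum to 1."""
    exps = np.exp(scores - np.max(scores))
    return exps / exps.sum()

# Hypothetical raw scores for the two categories ("dog", "cat").
scores = np.array([2.0, -1.0])
confidences = softmax(scores)       # roughly [0.95, 0.05]
target = np.array([1.0, 0.0])       # the correct answer is "dog"

# The training goal: push the confidences toward the target, here
# measured by cross-entropy (lower loss = closer to correct).
loss = -np.sum(target * np.log(confidences))
print(confidences, loss)
```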
activation maps
in the book’s figure, each layer of the network is drawn as a set of three overlapping rectangles; these rectangles represent activation maps
units in a ConvNet act as detectors for important visual features
each unit looks for its designated feature in a specific part of the visual field
each layer in ConvNet consists of several grids of these units
each grid forms an activation map for a specific visual feature
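A sketch of what one layer’s output looks like as data: a stack of activation maps, one grid per visual feature (all sizes invented for illustration):

```python
import numpy as np

num_features, height, width = 3, 28, 28
layer_output = np.zeros((num_features, height, width))

# layer_output[k, i, j] says how strongly feature k appears in the
# receptive field near location (i, j) of the input.
print(layer_output.shape)   # (3, 28, 28)
```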
edge detectors
each neuron only looks at part of the visual scene (its receptive field)
the neuron becomes active only if its receptive field contains a particular kind of edge (e.g. horizontal or vertical)
these feed into higher-level processing regions, where neurons might detect certain shapes, objects, or faces
Each unit in each map calculates an activation value that measures the degree to which the region “matches” the unit’s preferred edge orientation
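A sketch of a single edge-detecting unit. The weights below form a vertical-edge detector, hand-set here for illustration; real ConvNets learn their weights from data:

```python
import numpy as np

weights = np.array([
    [1.0, 0.0, -1.0],
    [1.0, 0.0, -1.0],
    [1.0, 0.0, -1.0],
])

# A 3x3 patch from the unit's receptive field: bright on the left,
# dark on the right -- a vertical edge.
patch = np.array([
    [0.9, 0.5, 0.1],
    [0.9, 0.5, 0.1],
    [0.9, 0.5, 0.1],
])

activation = np.sum(weights * patch)
print(activation)   # about 2.4 -- high, because the patch matches the
                    # preferred edge; a uniform patch would give 0.0
```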
receptive field of a unit
Each unit in a map corresponds to the analogous location in the input image, and each unit gets its input from a small region around that location —its receptive field.
(The receptive fields of neighboring units typically overlap.)
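A sketch of those overlapping receptive fields: sliding a 3x3 field across a 5x5 image one pixel at a time yields a 3x3 grid of units, and neighboring units’ fields share most of their pixels (sizes invented for illustration):

```python
import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)
field = 3

patches = [
    image[i:i + field, j:j + field]    # receptive field of unit (i, j)
    for i in range(image.shape[0] - field + 1)
    for j in range(image.shape[1] - field + 1)
]
print(len(patches))                              # 9 units -> a 3x3 map
print(patches[0][:, 1:] == patches[1][:, :-1])   # neighbors overlap: all True
```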
convolution
multiplying each value in a receptive field by its corresponding weight and summing the results
Image patches inside receptive fields are arrays of pixel values.
Each unit receives as input the pixel values in its receptive field. The unit then multiplies each input by its weight and sums the results to produce the unit’s activation
A key to the ConvNet’s success is that—again, inspired by the brain—these maps are hierarchical
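Combining the last two cards, a sketch of a full convolution: one feature’s weights slide across the image, and each multiply-and-sum produces one entry of that feature’s activation map (plain NumPy loops for clarity; real libraries do this far more efficiently):

```python
import numpy as np

def convolve(image, weights):
    """Slide the weights over the image; each output value is one
    unit's activation: multiply the patch by the weights, then sum."""
    k = weights.shape[0]
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    activation_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + k, j:j + k]           # receptive field
            activation_map[i, j] = np.sum(patch * weights)
    return activation_map

weights = np.array([[1.0, 0.0, -1.0]] * 3)   # the vertical-edge weights again
image = np.random.rand(6, 6)                 # invented stand-in image
print(convolve(image, weights).shape)        # (4, 4) activation map
```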
classification module
layers 1 to 4 of the network are called convolutional layers because each performs convolutions on the output of the preceding layer
At this point, it’s time for the classification module to use these features to predict what object the image depicts.
The classification module is actually an entire traditional neural network
inputs to classification module are activation maps from the highest convolutional layer
output is a set of percentage values, one for each possible category, rating the network’s confidence that the input depicts an image of that category
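A sketch of the classification module: flatten the highest convolutional layer’s activation maps into one long vector, feed it through a fully connected layer, and read off percentages. All sizes and weights here are invented for illustration:

```python
import numpy as np

def softmax(scores):
    exps = np.exp(scores - np.max(scores))
    return exps / exps.sum()

maps = np.random.rand(8, 4, 4)            # 8 activation maps, each 4x4
features = maps.flatten()                 # 128 input values
W = np.random.randn(2, features.size)     # one row of weights per category
b = np.zeros(2)

confidences = softmax(W @ features + b)   # two values summing to 1.0
print({"dog": confidences[0], "cat": confidences[1]})
```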
explain what a ConvNet is
Inspired by Hubel and Wiesel’s findings on the brain’s visual cortex, a ConvNet takes an input image and transforms it, via convolutions, into a set of activation maps with increasingly complex features.
The features at the highest convolutional layers are fed into a traditional neural network (classification module), which outputs confidence percentages for the network’s known object categories.
The object category with the highest confidence is returned as the network’s classification of the image.
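A sketch of that whole pipeline in PyTorch (an assumed framework choice; the layer sizes are invented, not taken from the book):

```python
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    def __init__(self, num_categories=2):
        super().__init__()
        self.features = nn.Sequential(       # convolutional layers
            nn.Conv2d(3, 16, kernel_size=3), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(     # classification module
            nn.Flatten(),
            nn.Linear(32 * 6 * 6, num_categories),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

net = TinyConvNet()
image = torch.rand(1, 3, 32, 32)                 # one 32x32 RGB image
confidences = torch.softmax(net(image), dim=1)   # one percentage per category
print(confidences)
```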
Training a ConvNet
in real-world ConvNets, edge detectors aren’t built in.
Instead, ConvNets learn from training examples what features should be detected at each layer and how to set the weights in the classification module
just as in traditional neural networks, all the weights can be learned from data via back-propagation
Training a ConvNet, step by step
collect a training set
label each image in the training set
Your training program initially sets all the weights in the network to random values.
one by one, each image is given as the input to the network; the network performs its layer-by-layer calculations and finally outputs confidence percentages for each of the categories
For each image, your training program compares these output values to the “correct” values
Then the training program uses the back-propagation algorithm to change the weights throughout the network just a bit, so that the next time this image is seen, the confidences will be closer to the correct values.
epoch
one pass through the entire training set: for each image, input it to the network, calculate the error at the output, then change the weights
Training a ConvNet requires many epochs, during which the network processes each image over and over again
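A sketch of that training recipe in PyTorch (assumed framework; the “training set” is random stand-in data and the model is a minimal invented ConvNet, not the book’s network). PyTorch initializes the weights randomly by default, matching the first step above:

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 30 * 30, 2),        # two categories: "dog", "cat"
)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()       # compares outputs to the correct labels

images = torch.rand(10, 3, 32, 32)    # ten stand-in training images
labels = torch.randint(0, 2, (10,))   # 0 = "dog", 1 = "cat"

for epoch in range(5):                # one epoch = one pass over all images
    for image, label in zip(images, labels):
        output = net(image.unsqueeze(0))            # forward pass
        loss = loss_fn(output, label.unsqueeze(0))  # how wrong was it?
        optimizer.zero_grad()
        loss.backward()               # back-propagation computes gradients
        optimizer.step()              # change the weights just a bit
```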