Chapter 4 Flashcards
What’s so hard about object recognition? (dogs)
If the input is simply the pixels of the image, then the program first has to figure out which are “dog” pixels and which are “non-dog” pixels
different dogs look very different
they can be facing in various directions
the lighting can vary considerably between images
parts of the dog can be blocked by other objects
“dog pixels” might look a lot like “cat pixels” or the pixels of other animals.
Under some lighting conditions, a cloud in the sky might even look very much like a dog.
The deep-learning revolution
The ability of machines to recognize objects in images and videos underwent a quantum leap in the 2010s due to advances in the area called deep learning.
deep learning
methods for training deep neural networks
most successful deep networks are those …
whose structure mimics parts of the brain’s visual system.
Fukushima - cognitron and neocognitron
inspired by David Hubel and Torsten Wiesel’s discoveries of hierarchical organization in visual systems
Fukushima reported some success training the neocognitron to recognize handwritten digits, but the specific learning methods he used did not seem to extend to more complex visual tasks.
the neocognitron was an important inspiration for later approaches to deep neural networks, including today’s most influential and widely used approach: convolutional neural networks
object recognition in the brain
- when eyes focus on a scene ⇒ receive light of different wavelengths that has been reflected by the objects and surfaces in the scene
- this light activates cells in each retina ⇒ the grid of neurons at the back of the eye communicates its activations through the optic nerves into the brain, eventually activating neurons in the visual cortex, which resides in the back of the head
- visual cortex is roughly organized as a hierarchical series of layers of neurons ⇒ neurons in each layer communicate their activations to neurons in the succeeding layer
object recognition
recognizing a particular group of pixels in an image as a particular object category (e.g. chair, dog, balloon etc.)
ConvNets
inspired by neocognitron
the driving force behind today’s deep-learning revolution
first proposed by Yann LeCun in the 1980s
Object Recognition in ConvNets
recognizing a particular group of pixels in an image as a particular object category (e.g. chair, dog, balloon etc.)
neurons in different layers of this hierarchy act as “detectors” that respond to increasingly complex features appearing in the visual scene
there is a bottom-up (or feed-forward) flow of information, representing connections from lower to higher layers
units in each layer provide input to units in the next layer
ConvNet input
image ⇒ array of numbers corresponding to the brightness and color of the image’s pixels
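A minimal sketch of what "an array of numbers" can look like, assuming NumPy and a made-up 2x2 color image. The height x width x channel layout, the scaling to the 0-1 range, and the channel-average brightness are illustrative assumptions, not details stated in the chapter.

```python
import numpy as np

# Hypothetical 2x2 RGB image: shape (height, width, channels), values 0-255.
image = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 255]]],
    dtype=np.uint8,
)

# Scale to the range [0, 1]; an assumed, commonly used preprocessing step.
pixels = image.astype(np.float32) / 255.0

# One simple notion of brightness: the mean over the three color channels.
brightness = pixels.mean(axis=2)

print(image.shape)   # (2, 2, 3)
print(brightness)
```

The network never sees "a dog"; it only sees numbers like these, which is why it must learn which patterns of numbers matter.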
ConvNet output
network’s confidence (0 percent to 100 percent) for each category (“dog” and “cat”)
ConvNet goal
have the network learn to output a high confidence for the correct category and a low confidence for the other category
the network will learn what set of features of the input image are most useful for this task
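The output and goal cards can be illustrated with a small sketch. The chapter does not say how confidences are computed or how "high confidence for the correct category" is scored; the softmax function and cross-entropy loss below are common choices and are assumptions here, as are the raw scores.

```python
import numpy as np

def softmax(scores):
    """Convert raw scores into confidences that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def cross_entropy(confidences, correct_index):
    """Loss that is small when confidence in the correct category is high."""
    return -np.log(confidences[correct_index])

categories = ["dog", "cat"]
scores = np.array([2.0, 0.5])  # hypothetical raw network outputs for one image
confidences = softmax(scores)

loss_if_dog = cross_entropy(confidences, 0)  # image really shows a dog
loss_if_cat = cross_entropy(confidences, 1)  # image really shows a cat

for name, conf in zip(categories, confidences):
    print(f"{name}: {conf:.0%}")
```

Training would adjust the network's weights to reduce this loss, which pushes confidence toward the correct category and away from the other one.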
activation maps
each layer of the network is represented by a set of three overlapping rectangles. These rectangles represent activation maps
units in a ConvNet act as detectors for important visual features
each unit looks for its designated feature in a specific part of the visual field
each layer in a ConvNet consists of several grids of these units
each grid forms an activation map for a specific visual feature
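As a rough sketch of "several grids of units, one per feature", the code below slides two hand-written 3x3 filters over a tiny grayscale image and stacks the results. The function name `activation_maps`, the filter weights, and the clipping of negative values to zero are illustrative assumptions; a real ConvNet learns its filters. No padding is used, so each map is slightly smaller than the image.

```python
import numpy as np

def activation_maps(image, filters):
    """Slide each filter over a grayscale image; return one map per filter."""
    k = filters.shape[1]
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    maps = np.zeros((len(filters), out_h, out_w))
    for f, filt in enumerate(filters):
        for r in range(out_h):
            for c in range(out_w):
                region = image[r:r + k, c:c + k]
                # Weighted sum of the region, with negatives clipped to zero.
                maps[f, r, c] = max(0.0, np.sum(region * filt))
    return maps

image = np.zeros((5, 5))
image[:, 3:] = 1.0  # dark on the left, bright on the right

vertical = np.array([[-1, 0, 1]] * 3, dtype=float)
filters = np.stack([vertical, vertical.T])  # vertical-edge and horizontal-edge filters

layer = activation_maps(image, filters)
print(layer.shape)  # (2, 3, 3): two maps, each a 3x3 grid of units
```

Here the first map lights up where the vertical edge sits, while the second map stays at zero because the image contains no horizontal edge.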
edge detectors
each neuron only looks at part of the visual scene (its receptive field)
the neuron becomes active only if its receptive field contains a particular kind of edge (e.g. horizontal or vertical)
edge detectors feed into higher-level processing regions, where neurons might detect certain shapes, objects, or faces
Each unit in each map calculates an activation value that measures the degree to which the region “matches” the unit’s preferred edge orientation
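One way to picture a single unit's "match" score, as a sketch under assumptions: the unit multiplies the pixels in its receptive field by a small grid of weights and sums the result. The filter values and the helper name `unit_activation` are hypothetical, chosen only to make the idea concrete.

```python
import numpy as np

# Hypothetical 3x3 filters; the weights are illustrative, not learned.
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)
horizontal_edge = vertical_edge.T

def unit_activation(patch, edge_filter):
    """Match score: weighted sum of the patch, with negatives clipped to zero."""
    return max(0.0, float(np.sum(patch * edge_filter)))

# A 3x3 receptive field that is dark on the left and bright on the right.
patch = np.array([[0, 0, 1],
                  [0, 0, 1],
                  [0, 0, 1]], dtype=float)

vertical_score = unit_activation(patch, vertical_edge)      # strong match
horizontal_score = unit_activation(patch, horizontal_edge)  # no match
print(vertical_score, horizontal_score)  # 3.0 0.0
```

The same patch produces a high activation for the unit that prefers vertical edges and none for the unit that prefers horizontal edges.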
receptive field of a unit
Each unit in a map corresponds to the analogous location in the input image, and each unit gets its input from a small region around that location, called its receptive field.
(The receptive fields of neighboring units typically overlap.)
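The overlap between neighboring receptive fields can be checked with a short sketch. The 3x3 field size, the 5x5 image, and the helper name `receptive_field` are assumptions for illustration; the chapter does not give specific sizes.

```python
def receptive_field(row, col, image_shape, size=3):
    """Pixel coordinates in a size x size window centered on (row, col)."""
    half = size // 2
    height, width = image_shape
    return {
        (r, c)
        for r in range(max(0, row - half), min(height, row + half + 1))
        for c in range(max(0, col - half), min(width, col + half + 1))
    }

field_a = receptive_field(2, 2, (5, 5))
field_b = receptive_field(2, 3, (5, 5))  # the unit immediately to the right
shared = field_a & field_b

print(len(field_a), len(field_b), len(shared))  # 9 9 6
```

With these assumed sizes, two horizontally adjacent units share six of their nine input pixels, and units at the image border see a smaller, clipped field.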