Chapter 5 Flashcards
Yann LeCun
The inventor of ConvNets
He was fascinated by Rosenblatt’s perceptrons and Fukushima’s neocognitron, but noted that the latter lacked a good supervised-learning algorithm.
He combined ideas from the neocognitron with the back-propagation algorithm to create the semi-eponymous “LeNet”—one of the earliest ConvNets.
the PASCAL Visual Object Classes competition
The entries to the “classification” part of this contest were computer-vision programs that could take a photograph as input and could then output, for each of the twenty categories, whether an object of that category was present in the image.
The annual PASCAL competitions were a very big deal and did a lot to spur research in object recognition.
shortcomings of the PASCAL benchmark as a way to move computer vision forward
- Contestants were focusing too much on PASCAL’s specific twenty object categories.
- There just weren’t enough photos in the data set for the competing systems to learn the many possible variations in what the objects look like, so they could not generalize well.
Fei-Fei Li
Based on the idea of WordNet
> a database of English words, arranged in a hierarchy moving from most specific to most general, with groupings among synonyms.
new idea: ImageNet
> create an image database that is structured according to the nouns in WordNet, where each noun is linked to a large number of images containing examples of that noun.
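A minimal sketch of the structure described above, assuming a toy hierarchy: the nouns, the `hypernym` mapping, and the image file names are all hypothetical and are not taken from WordNet or ImageNet. It only illustrates the idea of nouns arranged from specific to general, each linked to example images.

```python
# Toy WordNet-style hierarchy: each noun points to its more general parent.
# All entries are made up for illustration.
hypernym = {
    "beagle": "dog",
    "poodle": "dog",
    "dog": "animal",
    "cat": "animal",
    "animal": None,  # top of this toy hierarchy
}

# Toy ImageNet-style index: each noun is linked to example images.
images = {
    "beagle": ["beagle_001.jpg"],
    "poodle": ["poodle_001.jpg"],
    "dog": ["dog_001.jpg"],
    "cat": ["cat_001.jpg"],
}


def hypernym_chain(noun):
    """Return the noun followed by its ancestors, most specific first."""
    chain = []
    while noun is not None:
        chain.append(noun)
        noun = hypernym[noun]
    return chain


def images_for(noun):
    """Collect images linked to the noun or to any more specific noun under it."""
    found = []
    for labeled_noun, files in images.items():
        if noun in hypernym_chain(labeled_noun):
            found.extend(files)
    return sorted(found)


print(hypernym_chain("beagle"))
print(images_for("dog"))
```

In this sketch, asking for images of "dog" also returns the images filed under "beagle" and "poodle", because both sit below "dog" in the hierarchy.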
Amazon Mechanical Turk
Mechanical Turk is the embodiment of Marvin Minsky’s “Easy things are hard” dictum: the human workers are hired to perform the “easy” tasks that are currently too hard for computers.
ImageNet Large Scale Visual Recognition Challenge
The competitors were given labeled training images—1.2 million of them—and a list of possible categories.
The task for the trained programs was to output the correct category of each input image.
The ImageNet competition had a thousand possible categories, compared with PASCAL’s twenty.
The thousand possible categories were a subset of WordNet terms chosen by the organizers.
the “top-5” accuracy metric
a program gets to guess five categories for each image, and if the correct one is in this list, the program is said to be correct on this image.
(ImageNet)
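The top-5 rule above can be sketched in a few lines. This is an illustration only: the function names, the category names, and the scores are hypothetical, and the sketch assumes a program outputs one confidence score per category.

```python
def top_k_correct(scores, true_label, k=5):
    """True if the correct category is among the k highest-scoring guesses."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return true_label in ranked[:k]


def top_k_accuracy(all_scores, labels, k=5):
    """Fraction of images counted as correct under the top-k rule."""
    hits = sum(top_k_correct(s, y, k) for s, y in zip(all_scores, labels))
    return hits / len(labels)


# Hypothetical scores a program might assign to one image.
scores = {
    "dog": 0.40, "cat": 0.25, "wolf": 0.15, "fox": 0.10,
    "horse": 0.05, "car": 0.03, "boat": 0.02,
}

# Reuse the same scores for three images whose true labels differ.
all_scores = [scores, scores, scores]
labels = ["dog", "horse", "boat"]

print(top_k_accuracy(all_scores, labels, k=1))  # only "dog" counts
print(top_k_accuracy(all_scores, labels, k=5))  # "dog" and "horse" count
```

The same guesses score one out of three under a top-1 rule and two out of three under top-5, which is why a top-5 figure looks better than a strict "best guess" figure.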
AlexNet
first convnet to win the ImageNet competition
eight layers, with about sixty million weights whose values were learned via back-propagation
85 percent correct (top-5 accuracy)
> Support vector machines were at most 74 percent correct
Such a jump in accuracy was a shocking development.
data snooping
Submit your program’s test-set answers to the test server and, based on the result, tweak your program. Then submit again.
If you can do this enough times, it can be very effective in improving your program’s performance on the test set.
But because you’re using information from the test set to change your program, you’ve now destroyed the ability to use the test set to see if your program generalizes well.
A cardinal rule in machine learning is “Don’t train on the test data.” Why?
If you include test data in any part of training your program, you won’t get a good measure of the program’s generalization abilities.
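A small simulation can make the data-snooping problem concrete. It is a sketch under loud assumptions: the features are pure noise and the labels are coin flips, so no program can truly do better than chance, and the "program" is a hypothetical linear classifier tuned by random tweaks. The `snoop` function stands in for repeatedly submitting to a test server and keeping whatever scores at least as well.

```python
import random


def make_data(n, dim, rng):
    """Noise features with coin-flip labels: there is nothing real to learn."""
    xs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    ys = [rng.randint(0, 1) for _ in range(n)]
    return xs, ys


def accuracy(weights, xs, ys):
    """Fraction of examples a linear classifier labels correctly."""
    hits = 0
    for x, y in zip(xs, ys):
        pred = 1 if sum(w * v for w, v in zip(weights, x)) > 0 else 0
        hits += pred == y
    return hits / len(ys)


def snoop(test_xs, test_ys, dim, rounds, rng):
    """Tweak the program, resubmit to the test set, keep tweaks that do not hurt."""
    weights = [rng.gauss(0, 1) for _ in range(dim)]
    initial = best = accuracy(weights, test_xs, test_ys)
    for _ in range(rounds):
        candidate = [w + rng.gauss(0, 0.5) for w in weights]
        score = accuracy(candidate, test_xs, test_ys)  # one "submission"
        if score >= best:
            weights, best = candidate, score
    return weights, initial, best


rng = random.Random(0)
test_xs, test_ys = make_data(50, 20, rng)        # the public test set
fresh_xs, fresh_ys = make_data(5000, 20, rng)    # data the program never saw

weights, initial_acc, snooped_acc = snoop(test_xs, test_ys, 20, 300, rng)
fresh_acc = accuracy(weights, fresh_xs, fresh_ys)

print("test accuracy before snooping:", initial_acc)
print("test accuracy after snooping: ", snooped_acc)
print("accuracy on fresh data:       ", fresh_acc)
```

Because the labels are random, any gain on the snooped test set is an artifact of fitting that particular set; the score on fresh data is the honest measure of generalization, and it should stay near chance.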
the recent success of deep learning is due less to new breakthroughs in AI than to the availability of …
huge amounts of data (internet)
very fast parallel computer hardware.
Claims that AI has surpassed humans at object recognition come with a few caveats:
- On ImageNet, correct identification means only that the correct category is in the machine’s top five categories.
- saying “humans” is not quite accurate; this result is from an experiment involving a single human, one Andrej Karpathy
- If a ConvNet correctly says “dog,” how do we know it is actually basing this classification on the dog in the image? Maybe something else in the image was often associated with dogs in the training images, and the ConvNet is recognizing that and assuming there is a dog in the photo.
Difference in errors made between humans and convnets
While both get confused by images containing multiple objects, convnets, unlike humans, tend to miss:
objects that are small in the image,
objects that have been distorted by color or contrast filters the photographer applied to the image,
and “abstract representations” of objects, such as a painting or statue of a dog, or a stuffed toy dog.