Lesson 4 - CONV: relevant architectures & components Flashcards
What is data augmentation?
Apply a set of operations to a given data sample to produce additional samples
Why do we want to do data augmentation?
To make the model generalize better and be more robust against changes in the input
–> increase training data
–> introduce variability
What operations can we do when doing data augmentation?
Mirroring, cropping, rotation, color shifting,…
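As a minimal sketch (assuming torchvision as the library, which the lesson does not prescribe), such operations are usually chained into a random pipeline; the parameter values below are arbitrary illustrations:

    import torchvision.transforms as T

    # Illustrative augmentation pipeline; each call produces a new randomized variant.
    augment = T.Compose([
        T.RandomHorizontalFlip(p=0.5),           # mirroring
        T.RandomCrop(size=28, padding=4),        # cropping (random 28x28 crop after 4px padding)
        T.RandomRotation(degrees=15),            # small rotation
        T.ColorJitter(brightness=0.2, hue=0.1),  # color shifting
    ])
    # augmented_image = augment(pil_image)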
When doing data augmentation, can I apply any random operation, or should I apply suitable operations randomly?
The latter: you should still ensure that your “new” image preserves the content and label of the old one. For example, if you have a picture of a tree and you crop it, you must be sure not to crop out the tree itself
What is dropout? What are its benefits?
With dropout we deactivate each neuron with a given probability during training
Benefits:
–> Avoid overfitting
–> Promote ensemble learning
What does it mean when you deactivate a neuron?
You set the neuron’s output to 0.
Setting a neuron’s output to 0 stops its information from being propagated further
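A minimal sketch of this mechanism (using PyTorch and the common “inverted dropout” scaling, which is an assumption beyond the card itself):

    import torch

    def dropout(x, p=0.5, training=True):
        # Deactivate each activation with probability p by multiplying it by 0;
        # scale the survivors by 1/(1-p) so the expected output is unchanged.
        if not training or p == 0.0:
            return x
        mask = (torch.rand_like(x) > p).float()
        return x * mask / (1.0 - p)

    # h = torch.randn(4, 10)   # hidden-layer activations
    # h = dropout(h, p=0.5)    # roughly half of the neurons are set to 0
    # (torch.nn.Dropout(p) is the built-in equivalent)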
How does dropout help to avoid overfitting the model?
By cancelling out some features, you force the model to learn based on other features
Why does dropout promote ensemble learning?
–> by forcing the model to learn other features, it learns to combine multiple features, and that combination is where the ensemble idea comes from
–> also, each dropout mask leaves a different sub-network that survives, so training effectively covers many sub-networks
Considering relevant architectures, what is the Neocognitron (1982)?
- goal: recognition of position-shifted / shape-distorted patterns
- proposed the cell-plane arrangement (convolution)
- hierarchical structure
- convolution/sub-sampling combination
Considering relevant architectures, what is LeNet-5 (1998)?
- 7 layers: 3 conv, 2 subsampling, 2 FC
- addressed handwritten digit recognition task
- MNIST dataset was proposed
- one of the first uses of ConvNets + Backprop
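A rough PyTorch sketch of that layer arrangement (the original paper used trainable subsampling layers and RBF output units, simplified here; treat this as an approximation, not the exact 1998 model):

    import torch
    import torch.nn as nn

    lenet5 = nn.Sequential(
        nn.Conv2d(1, 6, kernel_size=5),     # C1: 32x32x1 -> 28x28x6
        nn.Tanh(),
        nn.AvgPool2d(2),                    # S2: subsampling -> 14x14x6
        nn.Conv2d(6, 16, kernel_size=5),    # C3: -> 10x10x16
        nn.Tanh(),
        nn.AvgPool2d(2),                    # S4: -> 5x5x16
        nn.Conv2d(16, 120, kernel_size=5),  # C5: -> 1x1x120
        nn.Tanh(),
        nn.Flatten(),
        nn.Linear(120, 84),                 # F6: fully connected
        nn.Tanh(),
        nn.Linear(84, 10),                  # output: 10 digit classes
    )
    # logits = lenet5(torch.randn(1, 1, 32, 32))  # MNIST digits padded to 32x32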
Considering relevant architectures, what is AlexNet (2012)?
- 5 conv layers + 3 FC layers
- trained across 2 GPUs (model parallelism)
- 60M param., 650K neurons
- No need to pair convolutional with pooling layers
- ReLU for convolutional layers
- data augmentation and dropout
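A single-branch sketch of that layout (the original split the channels across 2 GPUs and used local response normalization, both omitted here; channel counts follow the 2012 paper):

    import torch
    import torch.nn as nn

    alexnet = nn.Sequential(
        nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(),
        nn.MaxPool2d(3, stride=2),
        nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
        nn.MaxPool2d(3, stride=2),
        nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(3, stride=2),
        nn.Flatten(),
        nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
        nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
        nn.Linear(4096, 1000),              # ImageNet classes
    )
    # logits = alexnet(torch.randn(1, 3, 224, 224))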
What were the enablers of AlexNet?
- scientific community
- hardware developments
- open-access datasets
In 2014 we are going very deep with VGG-net. What do you know about it? How is it different?
- fixed-size 3x3 kernels
- uses “same” convolutions (zero padding) to preserve spatial resolution
- trained by splitting data across 4 copies of the same model
–> data parallelism
VGG-net used stacked kernels and same convolutions. What are the benefits of that?
- smaller kernels = fewer parameters to estimate
- stacking small kernels gives a larger receptive field with fewer parameters (see the comparison below)
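A small comparison to back this up (the channel count C = 64 is an arbitrary example): two stacked 3x3 convolutions cover the same 5x5 receptive field as one 5x5 convolution, with fewer parameters (18C² + 2C vs 25C² + C, including biases).

    import torch.nn as nn

    C = 64
    one_5x5 = nn.Conv2d(C, C, kernel_size=5, padding=2)              # receptive field 5x5
    two_3x3 = nn.Sequential(nn.Conv2d(C, C, 3, padding=1),
                            nn.Conv2d(C, C, 3, padding=1))           # also covers 5x5

    count = lambda m: sum(p.numel() for p in m.parameters())
    print(count(one_5x5))  # 25*C*C + C     = 102464
    print(count(two_3x3))  # 2*(9*C*C + C)  =  73856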
In 2014, GoogLeNet went even deeper. What do you know about it?
- branching architecture
- aggregates the outputs of different branches [inception modules]
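A simplified sketch of such a branching block (channel counts are arbitrary; the real inception module also puts 1x1 bottleneck convolutions before the 3x3/5x5 branches):

    import torch
    import torch.nn as nn

    class InceptionBranches(nn.Module):
        # Run parallel branches with different kernel sizes, then concatenate
        # their outputs along the channel dimension.
        def __init__(self, in_ch):
            super().__init__()
            self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)
            self.b2 = nn.Conv2d(in_ch, 16, kernel_size=3, padding=1)
            self.b3 = nn.Conv2d(in_ch, 16, kernel_size=5, padding=2)
            self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                    nn.Conv2d(in_ch, 16, kernel_size=1))

        def forward(self, x):
            return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

    # y = InceptionBranches(64)(torch.randn(1, 64, 28, 28))  # -> (1, 64, 28, 28)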