Quiz 3 Flashcards
As you add more convolution + pooling layers, what does each pixel represent?
Each pixel of a deep layer represents a larger receptive field over the previous layers/input.
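A toy sketch of how the receptive field grows as layers stack (assuming square kernels and no dilation; the layer list below is made up for illustration):

```python
# Receptive-field growth through a stack of conv/pool layers.
# Recurrence: r_out = r_in + (k - 1) * jump_in;  jump_out = jump_in * stride
layers = [("conv 3x3 /1", 3, 1), ("conv 3x3 /1", 3, 1),
          ("maxpool 2x2 /2", 2, 2), ("conv 3x3 /1", 3, 1)]

receptive_field, jump = 1, 1   # a pixel in the input "sees" only itself
for name, k, stride in layers:
    receptive_field += (k - 1) * jump
    jump *= stride
    print(f"after {name}: each pixel sees {receptive_field}x{receptive_field} input pixels")
```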
ImageNet
1.2 million images, 1000 classes.
Type of errors: Optimization error
Failing to find good weights to model the function
(Bad optimization algorithm)
Type of errors: Estimation error
The model minimizes training error but doesn’t generalize to the test set.
(Overfitting, learning features that don’t generalize well)
Type of errors: Modeling error
Given a too-simple model, no set of weights can model the real-world task.
Type of errors: Case study of multi-class logistic regression (MCLR) vs AlexNet
Which has high modeling error?
MCLR has high modeling error because the model is very simple; it simply can’t capture the complexity of the real-world task.
Type of errors: Case study of multi-class logistic regression vs AlexNet
What kind of errors would AlexNet have, and why?
AlexNet may have smaller modeling error than MCLR, but the same degree of estimation error could still occur.
Possibly higher optimization error as well, because a more complex architecture is harder to optimize.
Key idea of transfer learning
Reuse features learned on a large dataset to learn new tasks.
Describe transfer learning in 3 steps
- Train on large-scale dataset (may be provided for you)
- Take custom data and initialize the network with weights trained in step 1
- Continue to train on the new dataset (minimal sketch below).
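A minimal PyTorch sketch of these three steps, assuming a recent torchvision with ImageNet-pretrained ResNet-18 weights available; the class count and data loader are hypothetical:

```python
import torch
import torch.nn as nn
from torchvision import models

# Step 1: start from weights trained on a large-scale dataset (ImageNet here).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Step 2: keep the pretrained weights and adapt the network to the new task
# (here: swap the 1000-way ImageNet head for a new one).
num_classes = 10  # hypothetical number of categories in the new dataset
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Step 3: continue training ("fine-tuning") on the new dataset.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()
# for images, labels in new_data_loader:   # hypothetical DataLoader over your data
#     loss = criterion(model(images), labels)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```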
Limitations of transfer learning
Won’t work well if the target task is very different (e.g. applying a model pretrained to classify natural images to sketches)
Benefit of transfer learning
Significantly reduces the amount of labeled data needed to accomplish a task
Using a larger capacity model will always reduce estimation error
False. A higher-capacity model can overfit, so estimation error can increase (especially without regularization or more data).
Transfer learning: Example of what network changes you may need to make from a pretrained model to your own
Replace the last layer with a new fully-connected layer that has one output node per new category
Transfer learning: Ways to train from pretrained model’s weights
- Update all parameters
- Freeze parts of the network (e.g. only tune the fully-connected layers; see the sketch below)
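A minimal sketch of the "freeze" option, again assuming a torchvision ResNet-18; only the new fully-connected head would be trained:

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze every pretrained parameter...
for param in model.parameters():
    param.requires_grad = False

# ...then replace the head; the new layer's parameters are trainable by default.
model.fc = nn.Linear(model.fc.in_features, 10)   # 10 = hypothetical class count

# Give the optimizer only the parameters that will actually be updated.
optimizer = torch.optim.SGD((p for p in model.parameters() if p.requires_grad), lr=1e-3)
```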
Transfer learning: Why would you want to “freeze” parts of your network
Reduces the number of parameters that you need to learn from your new data set.
(If you don’t have enough data, you may not be able to fine-tune all the features in your network)
Transfer learning: T/F - If you have a large data set for a target domain, training from random initialization may result in faster convergence
True
Transfer learning: Explain the three data regimes with respect to data set size and generalization error
- Small-data regime - not enough data; error stays high and is hard to reduce
- Power-law regime - error keeps falling predictably (as a power law) as training-set size grows
- Irreducible-error regime - more data no longer helps; error saturates at an irreducible floor (toy sketch below)
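A toy sketch of the three regimes, assuming generalization error roughly follows ε(m) ≈ α·m^(−β) + ε_irreducible once in the power-law regime; all constants below are made up:

```python
alpha, beta, irreducible = 1.0, 0.3, 0.08   # made-up constants, for illustration only
for m in [10, 100, 10_000, 1_000_000, 100_000_000]:
    error = alpha * m ** (-beta) + irreducible
    print(f"{m:>11,d} examples -> generalization error ~ {error:.3f}")
# Small-data regime: error stays high.  Power-law regime: error falls along a
# straight line on a log-log plot.  Eventually it flattens at the irreducible floor.
```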
Modern networks: What was the key innovation introduced by AlexNet that made it a breakthrough in deep learning?
ReLU activation
Modern networks: Which one of these architectures is known for its simplicity with a focus on using only 3x3 convolutional filters?
VGGNet used 3x3 convolutional filters exclusively
Modern networks: Which architecture introduced the concept of residual learning, addressing the vanishing gradient problem and allowing the training of very deep networks?
ResNet introduced the concept of residual learning, where shortcut connections (or skip connections) were added to the network, allowing the gradient to flow more directly during training, thus addressing the vanishing gradient problem.
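A minimal sketch of one residual block in PyTorch, assuming the input and output shapes match so the identity shortcut can be added directly (real ResNet blocks also handle stride/channel changes with a projection shortcut):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = F(x) + x: the shortcut lets gradients flow around the conv stack."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # skip connection: add the input back

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```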
Modern networks: Which architecture uses inception modules? Explain what they are
InceptionNet.
Uses multiple filter sizes in parallel to capture different features
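A simplified sketch of an inception-style module; the branch widths are made up, and the real GoogLeNet modules also place 1x1 "bottleneck" convolutions before the larger filters:

```python
import torch
import torch.nn as nn

class SimpleInceptionModule(nn.Module):
    """Apply several filter sizes in parallel and concatenate along channels."""
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.branch3 = nn.Conv2d(in_ch, 16, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_ch, 16, kernel_size=5, padding=2)
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 16, kernel_size=1),
        )

    def forward(self, x):
        branches = [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)]
        return torch.cat(branches, dim=1)   # stack the parallel feature maps

module = SimpleInceptionModule(32)
print(module(torch.randn(1, 32, 28, 28)).shape)  # torch.Size([1, 64, 28, 28])
```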
Modern networks: Which architecture was known for removing FC layers at the end of the network? What did it replace it with?
ResNet
Used global average pooling instead of FC layers. Global average pooling reduces overfitting and the total number of parameters in the network.
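A small sketch contrasting global average pooling with a flattened fully-connected layer (the 512x7x7 feature-map size and 1000 classes are just illustrative):

```python
import torch
import torch.nn as nn

feature_maps = torch.randn(1, 512, 7, 7)    # e.g. the last conv output of a backbone

# Global average pooling: each channel collapses to a single number, no matter the H x W.
pooled = nn.AdaptiveAvgPool2d(1)(feature_maps).flatten(1)   # shape (1, 512)
classifier = nn.Linear(512, 1000)            # ~0.5M parameters feed the class scores

# Compare with flattening straight into a fully-connected layer:
fc = nn.Linear(512 * 7 * 7, 1000)            # ~25M parameters for the same job
print(pooled.shape, classifier(pooled).shape)
```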
CNN: During forward propagation in a convolutional layer, what operation(s) is performed between the input and the kernel?
element-wise multiplication and summation
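A minimal NumPy sketch of that operation for a single channel, stride 1, no padding (what deep-learning libraries call "convolution" is technically cross-correlation):

```python
import numpy as np

def conv2d_forward(x, w):
    """At each window position: element-wise multiply by the kernel, then sum."""
    H, W = x.shape
    k, _ = w.shape
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w)   # multiply, then sum
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((3, 3))
print(conv2d_forward(x, w))   # 2x2 output ("valid" convolution)
```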
CNN: What is the purpose of backpropagation in the context of convolutional layers?
To compute the gradients for the kernel/filter
CNN: During backpropagation in a convolutional layer, what operation is performed to compute the gradients for the kernel?
Element-wise multiplication between the gradient of the loss w.r.t. the output and the corresponding input window, summed (accumulated) over all output positions.
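A minimal NumPy sketch of the kernel gradient, assuming `dout` is the gradient of the loss with respect to the layer's output and matching the forward sketch above:

```python
import numpy as np

def conv2d_backward_kernel(x, dout, k):
    """dL/dw[a, b] = sum over (i, j) of dout[i, j] * x[i + a, j + b]"""
    dw = np.zeros((k, k))
    for i in range(dout.shape[0]):
        for j in range(dout.shape[1]):
            dw += dout[i, j] * x[i:i + k, j:j + k]   # multiply by the window, accumulate
    return dw

x = np.arange(16, dtype=float).reshape(4, 4)
dout = np.ones((2, 2))          # pretend upstream gradient
print(conv2d_backward_kernel(x, dout, k=3))
```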
CNN: What is the purpose of padding in a CNN?
To preserve spatial dimensions. Otherwise the feature maps in deeper layers become smaller and smaller.
CNN: Valid padding vs same padding
Valid: No padding, window always within input image
Same: Padding added to keep output size equal to input
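A small PyTorch sketch of the difference (the string padding modes "valid"/"same" assume a reasonably recent PyTorch; "same" is only supported for stride 1):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)

valid = nn.Conv2d(1, 1, kernel_size=5, padding="valid")   # no padding
same = nn.Conv2d(1, 1, kernel_size=5, padding="same")     # pad so output size == input size

print(valid(x).shape)   # torch.Size([1, 1, 24, 24])  ->  28 - 5 + 1
print(same(x).shape)    # torch.Size([1, 1, 28, 28])  ->  (5 - 1) / 2 = 2 pixels of padding per side
```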
CNN: Why use max-pooling
Reduces spatial dimensions through downsampling. Adds invariance to translation of features.
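A small sketch of 2x2 max-pooling halving the spatial dimensions:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)           # (batch, channels, height, width)
pool = nn.MaxPool2d(kernel_size=2)      # stride defaults to the kernel size

print(pool(x).shape)   # torch.Size([1, 8, 16, 16]) -- spatial dims halved, channels unchanged
```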
CNN: Invariance
Property where a model is robust to certain transformations in the input.
Practically, this explains how a CNN may be able to classify an object in an image regardless of where in the image it is located.
CNN: Equivariance
Property where a transformation of the input (e.g. scaling, rotation, time shift) produces a corresponding, predictable transformation of the output, so the relationship between elements is maintained.
CNN: How is invariance achieved by CNNs
Shared weights and biases (convolution), combined with pooling, which discards exact feature positions.
CNN: How do CNNs exhibit equivariance?
Convolution layers maintain spatial relationships between features.
E.g. if an object translates (shifts) within the image, its pattern of activations in the feature maps shifts by the same amount.
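A small sketch of translation equivariance: shifting the input shifts the conv response by the same amount (the sizes and shift below are arbitrary):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)

x = torch.zeros(1, 1, 8, 8)
x[0, 0, 2, 2] = 1.0                                       # a lone "feature" at (2, 2)
x_shifted = torch.roll(x, shifts=(3, 3), dims=(2, 3))     # same feature moved to (5, 5)

out, out_shifted = conv(x), conv(x_shifted)

# Away from the borders, conv(shifted input) == shifted conv(input).
print(torch.allclose(torch.roll(out, shifts=(3, 3), dims=(2, 3)), out_shifted, atol=1e-6))
```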
CNN: CNN vs FC - which has higher memory usage
CNN, because the large intermediate activation/feature maps must be stored; FC layers tend to dominate the parameter count instead.