Quiz 3 Flashcards
Optimization error
Even if the hypothesis class of your neural network contains a function that perfectly models the real world, your optimization algorithm may not be able to find the weights that realize it.
Estimation error
Even if optimization finds the best hypothesis (the best set of weights or parameters for the network) on the training data, that hypothesis may not generalize to the test set.
Modeling error
Given a particular neural network architecture, the true function that describes the real world might not be contained in that hypothesis space.
Optimization error scenario
You were lazy and couldn’t/didn’t optimize to completion
Estimation error scenario
You tried to learn the model with finite data
Modeling error scenario
You approximated reality with a model
More complex models lead to ______ modeling error
Smaller
Transfer learning steps
1) Train on large-scale dataset
2) Take your custom data and initialize the network with weights trained in Step 1
3) Continue training on new dataset
Ways to apply transfer learning
Finetune, freeze
Finetune
Update all parameters
Freeze (feature layer)
Update only the last layer's weights while keeping the earlier feature layers fixed (used when there is not enough data)
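The freeze strategy can be sketched with a toy two-layer linear network: only the last-layer weights receive gradient updates, while the "pretrained" feature layer stays fixed. All shapes and names below are illustrative, not from the course.

```python
import numpy as np

# Toy 2-layer linear network: pred = (x @ W1) @ W2.
# "Freeze" updates only the task head W2 and leaves the "pretrained"
# feature layer W1 untouched; "finetune" would update both.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))        # batch of 8 inputs
target = rng.normal(size=(8, 2))   # toy regression targets

W1 = rng.normal(size=(4, 3))       # frozen feature layer
W2 = rng.normal(size=(3, 2))       # new task head (trainable)
W1_before = W1.copy()

initial_loss = np.mean((x @ W1 @ W2 - target) ** 2)
lr = 0.01
for _ in range(200):
    h = x @ W1                     # features from the frozen layer
    pred = h @ W2
    grad = 2 * h.T @ (pred - target) / len(x)   # gradient w.r.t. W2 only
    W2 -= lr * grad                # update the head; W1 never changes
final_loss = np.mean((x @ W1 @ W2 - target) ** 2)
```

In a real framework this corresponds to setting `requires_grad=False` on (or excluding from the optimizer) all layers except the new head.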
When is transfer learning less effective?
If the source dataset you train on is very different from the target dataset
As we add more data, what do we see in generalization error?
Error continues to get smaller / accuracy continues to improve
LeNet architecture
two sets of convolutional, activation, and pooling layers, followed by a fully-connected layer, activation, another fully-connected, and finally a softmax classifier
LeNet architecture shorthand
INPUT => CONV => RELU => POOL => CONV => RELU => POOL => FC => RELU => FC
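The shorthand above can be checked with simple shape arithmetic, assuming the classic LeNet-5 layout: a 32x32 input, 5x5 convolutions with no padding, and 2x2 pools with stride 2.

```python
# Spatial-size bookkeeping for the classic LeNet-5 layout
# (a sketch of the shape arithmetic, not a runnable network).

def conv_out(size, kernel, stride=1, pad=0):
    """Output spatial size of a convolution layer."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Output spatial size of a pooling layer."""
    return (size - window) // stride + 1

s = 32                  # INPUT: 28x28 MNIST digit padded to 32x32
s = conv_out(s, 5)      # CONV1 (5x5) -> 28x28
s = pool_out(s)         # POOL1       -> 14x14
s = conv_out(s, 5)      # CONV2 (5x5) -> 10x10
s = pool_out(s)         # POOL2       -> 5x5
flat = s * s * 16       # 16 feature maps flattened -> 400 inputs to FC
```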
LeNet architecture good for
Number classification / MNIST dataset
AlexNet architecture
Eight layers with learnable parameters: five convolutional layers (some followed by max pooling) and three fully connected layers. ReLU activation is used after every layer except the output layer.
AlexNet activation function
ReLU instead of sigmoid or tanh (the first major network to do so)
AlexNet data augmentation
PCA-based (principal component analysis)
AlexNet regularization
Dropout (the first to use)
AlexNet used ______ to combine predictions from multiple networks
Ensembling
VGGNet used ____ modules / blocks of layers
repeated
All convolutions in VGG are ___
3 x 3
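The motivation for using only 3x3 convolutions can be shown with quick arithmetic: two stacked 3x3 convolutions see the same 5x5 receptive field as a single 5x5 convolution but use fewer parameters (biases omitted; C is an illustrative channel count, same in and out).

```python
# Two stacked 3x3 convs vs. one 5x5 conv, C input and C output channels.
C = 64
params_two_3x3 = 2 * (3 * 3 * C * C)   # 18 * C^2
params_one_5x5 = 5 * 5 * C * C         # 25 * C^2

# Receptive field of n stacked stride-1 3x3 convs is 2n + 1.
rf_two_3x3 = 2 * 2 + 1                 # same 5x5 field, fewer parameters
```

Stacking also inserts an extra nonlinearity between the two 3x3 convolutions, which a single 5x5 layer would not have.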
Most memory usage in VGG is in
Convolution layers
Most of the parameters in VGG are
In the fully connected layer
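Rough parameter counts for VGG-16 (biases omitted) make this concrete: the first fully connected layer alone dwarfs even the heaviest 3x3 convolution layer.

```python
# VGG-16 parameter-count sketch (biases omitted).
fc1 = 7 * 7 * 512 * 4096           # flattened 7x7x512 maps -> 4096 units
fc2 = 4096 * 4096
fc3 = 4096 * 1000                  # 1000 ImageNet classes
fc_total = fc1 + fc2 + fc3         # ~124M of VGG-16's ~138M parameters

conv_heaviest = 3 * 3 * 512 * 512  # one 3x3 conv at 512 channels (~2.4M)
```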