10_advanced deep learning concepts Flashcards
By which two factors is the performance of a model limited?
- architecture-driven limitations:
limited model capacity
improper model initialization
appropriateness of architecture (inductive biases)
- data-driven limitations:
limited amount of data
data quality
What is meant by “data quality”?
1) appropriateness
(eg a highly pixelated image would not be good for an image classification task)
2) cleanliness
(how accurately was the labeling done? are there outliers in the dataset?)
3) generalizability
(are there domain shifts? eg greyscale training images are useless for training RGB models)
How can we improve the data quality?
- only use appropriate data
- clean data
- carefully check data for domain shifts
Why do we need data augmentation?
to synthetically increase the size of the training dataset
What kinds of data augmentation are there?
- original (the unchanged image, kept alongside the augmented copies)
- horizontal flip
- vertical flip
- contrast variations
- image blocking
–> can also be combined
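A minimal sketch of these augmentations on a toy greyscale image using NumPy (the image size, patch location, and contrast factor are illustrative, not prescribed by the card):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((32, 32))  # toy greyscale image, values in [0, 1]

h_flip = np.flip(img, axis=1)            # horizontal flip (mirror left-right)
v_flip = np.flip(img, axis=0)            # vertical flip (mirror top-bottom)
contrast = np.clip(img * 1.5, 0.0, 1.0)  # contrast variation (scale, then clip)

blocked = img.copy()                     # image blocking: zero out a patch
blocked[8:16, 8:16] = 0.0

# augmentations can also be combined, eg flip + contrast variation
combined = np.clip(np.flip(img, axis=1) * 1.5, 0.0, 1.0)
```

In practice a library such as torchvision applies such transforms randomly on the fly during training, so each epoch sees slightly different images.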
How can models be pre-trained?
Through transfer learning
–> initialize model parameters with those from a model of the same architecture
that was previously trained on similar data
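A sketch of this initialization step in PyTorch (assumed framework; the tiny architecture is illustrative, and `pretrained` stands in for a model already trained on similar data):

```python
import torch
import torch.nn as nn

def make_model():
    # source and target must share the same architecture
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

pretrained = make_model()  # placeholder for a previously trained model
new_model = make_model()

# transfer learning: initialize the new model with the pretrained parameters
new_model.load_state_dict(pretrained.state_dict())
```

The new model then starts training from the transferred weights instead of a random initialization.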
How can the capacity of the model be improved?
- deeper models: more layers
- wider models: more neurons per layer
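The two options can be contrasted in PyTorch (assumed framework; layer sizes are illustrative):

```python
import torch.nn as nn

# deeper: more layers at the same width
deeper = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

# wider: fewer layers, but more neurons per layer
wider = nn.Sequential(
    nn.Linear(10, 128), nn.ReLU(),
    nn.Linear(128, 1),
)

def n_params(model):
    # both routes raise capacity by adding trainable parameters
    return sum(p.numel() for p in model.parameters())
```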
What are issues when training large networks?
backpropagation gets more complicated for a large number of network layers:
- gradients can vanish (eg with the sigmoid function, whose gradient goes to zero for inputs of large magnitude)
- gradients can explode
How can we avoid vanishing gradients?
batch normalization (BatchNorm) on every layer
take the layer outputs and normalize them (zero mean, unit variance), then scale and shift them before they go through the activation function
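A sketch of this Linear → BatchNorm → activation ordering in PyTorch (assumed framework; sizes and the extreme input scale are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# normalize and scale the layer outputs before the activation function
layer = nn.Sequential(nn.Linear(20, 50), nn.BatchNorm1d(50), nn.Sigmoid())

x = 100.0 * torch.randn(64, 20)  # deliberately large-magnitude inputs
out = layer(x)

# after BatchNorm the pre-activations are roughly zero-mean / unit-variance
# per feature, so the sigmoid is not pushed into its flat zero-gradient regions
pre_act = layer[1](layer[0](x))
```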
How do we get rid of exploding gradients?
residual connections
only learn the residual F(x) = H(x) − x, which typically has less extreme gradients –> the layer learns the difference/delta between the desired output and its input, while a skip connection passes the input through unchanged (output = x + F(x))
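A minimal residual block in PyTorch (assumed framework; the fully connected form of F is illustrative, ResNets use convolutions):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """The layer only learns the residual F(x); the skip connection
    adds the input back, so the block outputs x + F(x)."""

    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        # the identity skip connection gives gradients a direct path backwards
        return x + self.f(x)
```

If F learns nothing (all weights zero), the block is simply the identity, which is why stacking many such blocks stays trainable.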
What are ResNets?
take advantage of residual connections as well as BatchNorm
–> are very deep! eg 101 layers (ResNet-101)
How do most supervised tasks work?
they are discriminative (the model discriminates between different choices/classes)
How is the U-Net built?
as an encoder-decoder (autoencoder-like) architecture
What is the Code (= bottleneck) between encoder and decoder layers?
goal: a meaningful representation of the data
What is one way to perform representation learning?
autoencoder
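A minimal autoencoder sketch in PyTorch (assumed framework; the 784-dimensional input and 8-dimensional code are illustrative choices, eg flattened 28×28 images):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 8))
        self.decoder = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 784))

    def forward(self, x):
        code = self.encoder(x)  # 8-dim bottleneck: the learned representation
        return self.decoder(code)

# training the model to reconstruct its own input (eg with nn.MSELoss)
# forces the bottleneck code to become a compact, meaningful representation
```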