10_advanced deep learning concepts Flashcards
By which two factors is the performance of a model limited?
- architecture-driven limitations:
  - limited model capacity
  - improper model initialization
  - appropriateness of the architecture (inductive biases)
- data-driven limitations:
  - limited amount of data
  - data quality
What is meant by “data quality”?
1) appropriateness
(e.g. a highly pixelated image would not be good for an image classification task)
2) cleanliness
(how accurately was the labeling done? are there outliers in the dataset?)
3) generalizability
(are there domain shifts? e.g. greyscale training images are of little use for a model that will see RGB images)
How can we improve the data quality?
- only use appropriate data
- clean data
- carefully check data for domain shifts
Why do we need data augmentation?
to synthetically increase the size of the training dataset
What kinds of data augmentation are there?
- original
- horizontal flip
- vertical flip
- contrast variations
- image blocking
–> can also be combined
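A minimal sketch of such an augmentation pipeline (assumes torchvision; the specific transforms and probabilities are illustrative, not from the card):
```python
from torchvision import transforms

# Each original image is randomly transformed in every epoch,
# effectively enlarging the training set.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),  # horizontal flip
    transforms.RandomVerticalFlip(p=0.5),    # vertical flip
    transforms.ColorJitter(contrast=0.4),    # contrast variation
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5),         # image blocking (cutout-style)
])
```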
How can models be pre-trained?
Through transfer learning
–> initialize model parameters with those from a model of the same architecture
that was previously trained on similar data
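A minimal transfer-learning sketch (assumes a recent torchvision version; ResNet-18 and the 10 target classes are illustrative):
```python
import torch.nn as nn
from torchvision import models

# Initialize with parameters learned on ImageNet instead of random values.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Replace the classification head for the new task (here: 10 classes).
model.fc = nn.Linear(model.fc.in_features, 10)
```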
How can the capacity of the model be improved?
- make the model deeper: more layers
- make the model wider: more neurons per layer
What are issues when training large networks?
backpropagation becomes harder as the number of network layers grows:
- gradients can vanish (e.g. the sigmoid function saturates for large positive or negative inputs, so its gradient goes to zero)
- gradients can explode
How can we avoid vanishing gradients?
batch normalization (BatchNorm) on every layer
take each layer's outputs, normalize and rescale them before they go through the activation function
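A minimal PyTorch sketch with BatchNorm placed before the activation (layer sizes are illustrative):
```python
import torch.nn as nn

# Linear -> BatchNorm -> activation: outputs are normalized and rescaled
# before they reach the non-linearity, which keeps gradients from vanishing.
block = nn.Sequential(
    nn.Linear(256, 256),
    nn.BatchNorm1d(256),  # normalize, then apply a learnable scale and shift
    nn.ReLU(),
)
```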
How do we get rid of exploding gradients?
residual connections
only learn the residuals, which typically have less extreme gradients –> the layer learns the difference/delta between its input and the desired output instead of the full mapping
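A minimal residual block sketch in PyTorch (channel count and layer layout are illustrative):
```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """The block only learns the residual F(x); the input x is added back unchanged."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection: output = F(x) + x
        return torch.relu(self.body(x) + x)
```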
What are ResNets?
they take advantage of residual connections as well as BatchNorm
–> they can be very deep, e.g. 101 layers in ResNet-101
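Such a model can be instantiated directly, e.g. via torchvision (the weights argument assumes a recent torchvision API):
```python
from torchvision import models

# ResNet-101: 101 layers built from residual blocks with BatchNorm.
resnet = models.resnet101(weights=None)  # or load pretrained ImageNet weights
```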
How do most supervised tasks work?
discriminatively (the model learns to discriminate between different choices/classes)
How is the U-Net built?
as an encoder-decoder (autoencoder-style) architecture
What is the Code (= bottleneck) between encoder and decoder layers?
goal: a meaningful representation of the data
What is one way to perform representation learning?
autoencoder
What are autoencoders used for?
- representation learning
- data denoising
- anomaly detection
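A minimal autoencoder sketch (dimensions are illustrative, e.g. flattened 28×28 images with a 32-dimensional code):
```python
import torch.nn as nn

# The encoder compresses the input into a low-dimensional code (bottleneck);
# the decoder reconstructs the input from that code.
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))
autoencoder = nn.Sequential(encoder, decoder)

# Trained by minimizing a reconstruction loss, e.g. nn.MSELoss()(autoencoder(x), x)
```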
What can you do with decoders if they are trained successfully?
use them to generate data from noise
–> standalone decoders can be called generators
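Continuing the autoencoder sketch above, a trained decoder could then be used as a generator (purely illustrative):
```python
import torch

# Sample random codes and decode them into new data points.
samples = decoder(torch.randn(16, 32))
```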
What are adversarial attacks?
overlay a barely visible RGB noise pattern on an image so that the model misclassifies it –> a robust model should not be confused by this
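A classic example of such an attack is the Fast Gradient Sign Method; a minimal sketch (not necessarily the variant discussed in the lecture):
```python
import torch

def fgsm_attack(model, loss_fn, x, y, eps=0.01):
    """Add a barely visible perturbation that pushes the model towards a wrong prediction."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    # Step in the direction that increases the loss; keep pixels in a valid range.
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```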
What does GAN stand for?
generative adversarial network
What is the general idea behind a GAN?
have generator G create fake samples and try to trick discriminator D into thinking they are real samples
–> two-player minimax game
How do GANs work? (steps)
1) D tries to maximize the objective function (succeed in identifying real samples) - pursues a classification task
2) G tries to minimize the objective function (succeed in generating seemingly real samples)
Training: iterate between training D and G (with backprop) until D outputs a 50% chance of a sample being real or fake
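A minimal sketch of one such training iteration (G, D, their optimizers and the noise dimension z_dim are assumed to exist; D is assumed to output a probability; the BCE formulation is one common choice):
```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def gan_step(G, D, opt_G, opt_D, real, z_dim):
    z = torch.randn(real.size(0), z_dim)
    fake = G(z)

    # 1) Train D: classify real samples as 1 and generated samples as 0.
    opt_D.zero_grad()
    loss_D = bce(D(real), torch.ones(real.size(0), 1)) + \
             bce(D(fake.detach()), torch.zeros(real.size(0), 1))
    loss_D.backward()
    opt_D.step()

    # 2) Train G: make D believe the generated samples are real.
    opt_G.zero_grad()
    loss_G = bce(D(fake), torch.ones(real.size(0), 1))
    loss_G.backward()
    opt_G.step()
```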
How do diffusion models work?
the generator is trained to make sense of (denoise) increasingly noisy data
–> it turns highly noisy latent representations step by step into realistic images
in text-to-image models, the conditioning representation of the prompt is created by a large language model
What is meant by the concept “attention” for CNNs?
which parts of the input data are important?
What does attention in NLP enable?
it enables each element of the output sequence to attend to any element of the input sequence
–> transformer models implement this attention mechanism
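The core of the mechanism is scaled dot-product attention; a minimal sketch:
```python
import torch

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V"""
    d = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d ** 0.5  # how strongly each query attends to each key
    return torch.softmax(scores, dim=-1) @ V     # weighted sum of the values
```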
What are some of the most important learning paradigms?(5)
- supervised/unsupervised learning
- transfer learning
- semi-supervised/weakly supervised learning
- self-supervised learning
- continual learning
What is transfer learning?
like supervised learning, but it carries abilities learned on earlier tasks over to the next task
What is semi-supervised learning?
combines a supervised learning process (with a small labeled dataset) with unsupervised methods
e.g. use clustering to label more data before training
What is weakly supervised learning?
related to semi-supervised learning; the model is trained on weak labels, e.g. noisy labels (a confidence score can be attached to a label)
What is self-supervised learning?
getting data is becoming less & less expensive, but labelling it is still very expensive
–> goal is to learn basic representation that can be easily transferred to a given task
instead of a task that requires labels, the model is trained on a pretext task, e.g. deciding whether two (augmented) views originate from the same sample or not
typically relies heavily on data augmentations
–> very efficient and time/money-saving to pretrain models!
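A minimal sketch of such a pretext task (the encoder is assumed to exist; this is a simplified stand-in for the contrastive losses used in practice):
```python
import torch.nn.functional as F

def same_sample_score(encoder, view_a, view_b):
    """Embed two augmented views and score whether they come from the same sample."""
    z_a = F.normalize(encoder(view_a), dim=-1)
    z_b = F.normalize(encoder(view_b), dim=-1)
    return (z_a * z_b).sum(dim=-1)  # cosine similarity: high for matching pairs
```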
What is continual learning?
some models have to mitigate domain shifts in the data
–> one risk is catastrophic forgetting
can be minimized by experience replay!
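A minimal experience-replay sketch (capacity and sampling strategy are illustrative):
```python
import random

class ReplayBuffer:
    """Keep a sample of old training examples and mix them into new batches
    so the model does not catastrophically forget earlier data."""
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.buffer = []

    def add(self, example):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(random.randrange(len(self.buffer)))
        self.buffer.append(example)

    def sample(self, k):
        return random.sample(self.buffer, min(k, len(self.buffer)))
```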