10_advanced deep learning concepts Flashcards

1
Q

By which two factors is the performance of a model limited?

A
  • architecture-driven limitations:
    limited model capacity
    improper model initialization
    appropriateness of architecture
    (inductive biases)
  • data-driven limitations:
    limited amount of data
    data quality
2
Q

What is meant by “data quality”?

A

1) appropriateness
(e.g., a highly pixelated image would not be good for an image classification task)

2) cleanliness
(how accurately was the labeling done? are there outliers in the dataset?)

3) generalizability
(are there domain shifts? e.g., greyscale training images are useless for training RGB models)

3
Q

How can we improve the data quality?

A
  • only use appropriate data
  • clean the data (fix labels, remove outliers)
  • carefully check data for domain shifts
4
Q

Why do we need data augmentation?

A

to synthetically increase the size of the training dataset

5
Q

What kinds of data augmentation are there?

A
  • original (unchanged image)
  • horizontal flip
  • vertical flip
  • contrast variations
  • image blocking

–> can also be combined
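
A minimal sketch of such a combined pipeline, assuming torchvision is available (the specific transforms and probabilities are illustrative):

```python
from torchvision import transforms

# Illustrative augmentation pipeline: flips, contrast variations,
# and random erasing (image blocking) combined in one pipeline.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.ColorJitter(contrast=0.4),  # contrast variations
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5),       # blocks out a random patch
])
```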

6
Q

How can models be pre-trained?

A

Through transfer learning

–> initialize model parameters with those from a model of the same architecture
that was previously trained on similar data
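
A minimal PyTorch sketch of this, assuming torchvision's ImageNet-pretrained ResNet-18 (the 10-class head is an illustrative placeholder):

```python
import torch.nn as nn
from torchvision import models

# Initialize parameters from a model pretrained on similar data (ImageNet).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Optionally freeze the pretrained backbone...
for p in model.parameters():
    p.requires_grad = False

# ...and replace the final layer for the new task (here: 10 classes).
model.fc = nn.Linear(model.fc.in_features, 10)
```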

7
Q

How can the capacity of the model be improved?

A
  • make the model deeper: add more layers
  • make the model wider: add more neurons per layer
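
A toy sketch contrasting the two options (all layer sizes are arbitrary):

```python
import torch.nn as nn

# Deeper: stack more layers.
deeper = nn.Sequential(
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

# Wider: more neurons in a single hidden layer.
wider = nn.Sequential(
    nn.Linear(64, 1024), nn.ReLU(),
    nn.Linear(1024, 10),
)
```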
8
Q

What are issues when training large networks?

A

backpropagation becomes problematic for a large number of network layers:

  • gradients can vanish (e.g., with the sigmoid function, gradients for large positive or negative inputs go to zero)
  • gradients can explode
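
A quick numerical illustration of the sigmoid saturation (printed values are rounded):

```python
import torch

x = torch.tensor([0.0, 5.0, 10.0], requires_grad=True)
torch.sigmoid(x).sum().backward()
print(x.grad)  # tensor([0.2500, 0.0066, 0.0000]) - the gradient vanishes for large inputs
```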
9
Q

How can we avoid vanishing gradients?

A

batch normalization (BatchNorm) on every layer

take the layer outputs, normalize and rescale them before they go through the activation function
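
A sketch of this placement, normalizing before the activation (layer sizes are illustrative):

```python
import torch.nn as nn

block = nn.Sequential(
    nn.Linear(128, 128),
    nn.BatchNorm1d(128),  # normalize and rescale the layer outputs...
    nn.ReLU(),            # ...before the activation function
)
```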

10
Q

How do we get rid of exploding gradients?

A

residual connections

only learn the residuals, which typically have less extreme gradients –> each block learns only the difference/delta between its input and the expected output, while the input itself is passed along via a skip connection
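
A minimal residual block sketch (dimensions are illustrative):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x):
        # The layers only learn the residual; the input x itself
        # bypasses them via the skip connection.
        return x + self.layers(x)
```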

11
Q

What are ResNets?

A

ResNets take advantage of residual connections as well as BatchNorm

–> they are very deep, e.g., ResNet-101 has 101 layers

12
Q

How do most supervised tasks work?

A

discriminatively: the model discriminates between different choices (e.g., classes)

13
Q

How is the U-Net built?

A

as an encoder-decoder (= autoencoder) architecture, with skip connections between encoder and decoder levels

14
Q

What is the Code (= bottleneck) between encoder and decoder layers?

A

goal: a meaningful representation of the data

15
Q

What is one way to perform representation learning?

A

autoencoder
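
A minimal autoencoder sketch with encoder, bottleneck code, and decoder (dimensions are illustrative; it would typically be trained to reconstruct its input, e.g., with an MSE loss):

```python
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(784, 128), nn.ReLU(),
            nn.Linear(128, 32),            # code / bottleneck
        )
        self.decoder = nn.Sequential(
            nn.Linear(32, 128), nn.ReLU(),
            nn.Linear(128, 784),
        )

    def forward(self, x):
        code = self.encoder(x)  # compact representation of the data
        return self.decoder(code)
```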

16
Q

What are autoencoders used for?

A
  • representation learning
  • data denoising
  • anomaly detection
17
Q

What can you do with decoders if they are trained successfully?

A

use them to generate data from noise

–> standalone decoders can be called generators

18
Q

What are adversarial attacks?

A

stacking a barely visible RGB noise image on top of another image to fool the model –> ideally, the model should not be confused by this
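
A rough FGSM-style sketch of such an attack (the epsilon and loss choice are illustrative; `model`, `image`, and `label` are hypothetical placeholders):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.01):
    # Perturb the image by a barely visible amount in the direction
    # that increases the model's loss the most.
    image = image.clone().requires_grad_(True)
    F.cross_entropy(model(image), label).backward()
    return (image + epsilon * image.grad.sign()).clamp(0, 1).detach()
```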

19
Q

What does GAN stand for?

A

generative adversarial network

20
Q

What is the general idea behind a GAN?

A

have generator G create fake samples and try to trick discriminator D into thinking they are real samples

–> two-player minimax game
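
As a reference, the standard minimax objective (in LaTeX notation, following the original GAN formulation by Goodfellow et al., 2014):

$$\min_G \max_D \; \mathbb{E}_{x \sim p_\text{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$

D tries to maximize this value, G tries to minimize it.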

21
Q

How do GANs work? (steps)

A

1) D tries to maximize the objective function (succeeds in identifying real samples) - pursues a classification task

2) G tries to minimize the objective function (succeeds in generating seemingly real samples)

Training: iterate between training D and G (with backprop) until D assigns every sample a 50% chance of being real or fake, i.e., it can no longer tell them apart
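
A schematic training loop for the two alternating steps (the toy models and 2-D "real" data are illustrative stand-ins):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, 2) + 3.0  # stand-in "real" samples
    fake = G(torch.randn(64, 16))    # G's fake samples

    # 1) Train D: classify real vs. fake (maximize the objective).
    d_loss = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Train G: make D label fakes as real (minimize the objective).
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```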

22
Q

How do diffusion models work?

A

the generator (now acting as an encoder) is trained to make sense of increasingly noisy data

–> it learns to turn highly noisy latent representations into realistic images

the latent representation is created by a large language model

23
Q

What is meant by the concept “attention” for CNNs?

A

which parts of the input data are important?

24
Q

What does attention in NLP enable?

A

enable each element of the output sequence to attend to any element of the input sequence

–> transformer models implement this attention mechanism
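
A minimal scaled dot-product attention sketch, the core operation inside transformers (shapes are illustrative):

```python
import math
import torch

def attention(q, k, v):
    # Softmax weights determine which input positions each
    # query position attends to.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v

q = torch.randn(1, 5, 64)  # 5 query positions, dimension 64
k = torch.randn(1, 7, 64)  # 7 key/value positions
v = torch.randn(1, 7, 64)
out = attention(q, k, v)   # shape (1, 5, 64)
```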

25
Q

What are some of the most important learning paradigms?(5)

A
  • supervised/unsupervised learning
  • transfer learning
  • semi-supervised/weakly supervised learning
  • self-supervised learning
  • continual learning
26
Q

What is transfer learning?

A

like supervised learning, but it carries abilities learned on earlier tasks over to the next task

27
Q

What is semi-supervised learning?

A

combines a supervised learning process (with a small labeled dataset) with unsupervised methods

e.g., use clustering to label more data before training, as sketched below
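
A rough pseudo-labeling sketch, assuming scikit-learn (the random stand-in data and the majority-vote mapping are simplifications; it assumes every cluster contains at least one labeled point):

```python
import numpy as np
from sklearn.cluster import KMeans

X_lab = np.random.randn(50, 8)     # small labeled set (stand-in data)
y_lab = np.random.randint(0, 3, 50)
X_unlab = np.random.randn(500, 8)  # large unlabeled set

# Cluster all data, then give each cluster the majority label of the
# labeled points that fall into it.
km = KMeans(n_clusters=3, n_init=10).fit(np.vstack([X_lab, X_unlab]))
cluster_to_label = {
    c: np.bincount(y_lab[km.labels_[:50] == c]).argmax() for c in range(3)
}
pseudo_labels = np.array([cluster_to_label[c] for c in km.labels_[50:]])
```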

28
Q

What is weakly supervised learning?

A

related to semi-supervised learning, in that it learns from weak labels, e.g., noisy labels (a confidence score can be attached to each label)

29
Q

What is self-supervised learning?

A

getting data is becoming less and less expensive, but labelling it is still very expensive

–> the goal is to learn a basic representation that can be easily transferred to a given task

instead of learning the actual task, the model is trained on a pretext task, e.g., deciding whether two (differently augmented) samples originate from the same image or not

this typically relies heavily on data augmentations

–> a very efficient and time/money-saving way to pretrain models!
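
A rough sketch of this contrastive idea (SimCLR-style; the toy encoder, augmentations, and similarity measure are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms

# Two random augmentations of the same image form a "positive" pair;
# the encoder is trained to map them to similar representations
# (and to dissimilar ones for views of different images).
augment = transforms.Compose([
    transforms.RandomResizedCrop(32),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(contrast=0.4),
])

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))  # toy encoder
image = torch.rand(1, 3, 32, 32)                                    # stand-in image

z1, z2 = encoder(augment(image)), encoder(augment(image))
positive_similarity = F.cosine_similarity(z1, z2)  # trained to be high
```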

30
Q

What is continual learning?

A

some models have to mitigate domain shifts in the data

–> one risk is catastrophic forgetting, which can be minimized by experience replay!
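
A minimal experience-replay sketch (purely illustrative):

```python
import random

class ReplayBuffer:
    """Keeps a sample of old training examples so the model can
    rehearse them alongside new data and forget less."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.buffer = []

    def add(self, example):
        # Evict a random old example once the buffer is full.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(random.randrange(len(self.buffer)))
        self.buffer.append(example)

    def sample(self, k):
        return random.sample(self.buffer, min(k, len(self.buffer)))

# During continual training, mix replayed old examples into each new batch:
# batch = new_examples + buffer.sample(len(new_examples))
```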