Autoencoders Flashcards

1
Q

What are the variants of autoencoders?

A
– Plain (undercomplete) AE, closely related to PCA
– Denoising
– Sparse
– Variational (VAE)
– Convolutional, recurrent
2
Q

Are autoencoders data-specific? What does that mean?

A

Yes. For example, an autoencoder trained on pictures of faces would do a rather poor job of compressing pictures of trees, because the features it learns would be face-specific.

3
Q

What are the 3 most important components of an autoencoder?

A

An encoding function, a decoding function, and a loss function.

The loss function is a distance function measuring how much information is lost between the original input and its reconstruction from the compressed representation (hence a “loss” function).
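
A minimal sketch of those three components, assuming PyTorch and flat 784-dimensional inputs (e.g., MNIST images); layer sizes are illustrative only:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, bottleneck_dim=32):
        super().__init__()
        # Encoding function: compress the input to a small bottleneck vector
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, bottleneck_dim),
        )
        # Decoding function: reconstruct the input from the bottleneck
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
loss_fn = nn.MSELoss()        # loss: distance between input and reconstruction
x = torch.rand(16, 784)       # dummy batch of inputs in [0, 1]
loss = loss_fn(model(x), x)   # "how much information was lost?"
loss.backward()
```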

4
Q

Give a few applications of an autoencoder

A

It can be used for data compression, dimensionality reduction,
visualization, anomaly detection, and data generation(!)

5
Q

How can an autoencoder be used for dimensionality reduction?

A

For 2D visualization specifically, t-SNE (pronounced “tee-snee”) is probably the best algorithm around, but it typically requires relatively low-dimensional data.

So a good strategy for visualizing similarity relationships in high-dimensional data is to start by using an autoencoder to compress your data into a low-dimensional space (e.g. 32-dimensional), then use t-SNE for mapping the compressed data to a 2D plane.
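
A sketch of that two-step strategy, assuming scikit-learn, PyTorch, and matplotlib; the random data and the untrained `encoder` below are placeholders for your real dataset and the trained encoder half of an autoencoder:

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Placeholder for high-dimensional data; use your real dataset in practice.
X = np.random.rand(1000, 784).astype("float32")

# Placeholder for the trained encoder half of an autoencoder (here untrained).
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))

# Step 1: compress the data into a low-dimensional space (e.g., 32-dimensional).
with torch.no_grad():
    codes = encoder(torch.from_numpy(X)).numpy()   # shape (1000, 32)

# Step 2: map the compressed codes to a 2D plane with t-SNE.
embedding = TSNE(n_components=2, perplexity=30).fit_transform(codes)

plt.scatter(embedding[:, 0], embedding[:, 1], s=3)
plt.show()
```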

6
Q

What are the analogues of the encoder and decoder in bigger networks?

A

The same idea applies to bigger networks, with the encoder and decoder parts being convolutional or recurrent networks (e.g., convolutional autoencoders for images, recurrent autoencoders for sequences).

7
Q

How does a denoising autoencoder work?

A

Given a set of “clean” images, add noise to them and train an autoencoder on:

– Input: a noisy image
– Output: the clean image

In this way the network “learns” what is essential in an image and what is not (see the sketch below).
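
A minimal sketch of one training step, assuming PyTorch and an autoencoder `model` like the one on card 3; the Gaussian corruption and noise level are just one possible choice:

```python
import torch
import torch.nn.functional as F

def denoising_step(model, clean_batch, optimizer, noise_std=0.3):
    """One training step of a denoising autoencoder."""
    # Corrupt the clean images (other corruptions, e.g. masking pixels, also work).
    noisy_batch = (clean_batch + noise_std * torch.randn_like(clean_batch)).clamp(0.0, 1.0)

    # Input: noisy image -> target: the CLEAN image.
    reconstruction = model(noisy_batch)
    loss = F.mse_loss(reconstruction, clean_batch)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```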

8
Q

What’s the key idea behind a sparse autoencoder?

A

The bottleneck layer computes some “essential features” needed for input reconstruction.
• Usually only a few features are really important; the rest represent noise.
• => Impose constraints on the bottleneck layer, e.g., “on average, activations are close to 0” (see the sketch below).
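
One common way to impose such a constraint, sketched here with PyTorch: add an L1 penalty on the bottleneck activations to the reconstruction loss, pushing the average activation toward 0 (the weight 1e-3 and layer sizes are arbitrary illustrative values):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 64), nn.ReLU())
decoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Sigmoid())

x = torch.rand(16, 784)                 # dummy batch of inputs
code = encoder(x)                       # bottleneck activations
reconstruction = decoder(code)

sparsity_weight = 1e-3
reconstruction_loss = F.mse_loss(reconstruction, x)
sparsity_penalty = code.abs().mean()    # "on average, activations are close to 0"
loss = reconstruction_loss + sparsity_weight * sparsity_penalty
loss.backward()
```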

9
Q

What is the Kullback-Leibler divergence?

A

The Kullback-Leibler divergence measures the distance between two probability distributions; it is used as part of the loss function (e.g., in sparse and variational autoencoders).
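
For discrete distributions, D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)). A tiny numerical sketch with NumPy (the two distributions are made up):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) for discrete distributions given as arrays that sum to 1."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.8, 0.15, 0.05])
print(kl_divergence(p, q))   # > 0; equals 0 only when P and Q are identical
# Note: D_KL(P || Q) != D_KL(Q || P), so it is not a true distance metric.
```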

10
Q

What’s the key idea behind variational autoencoders?

A

The bottleneck layer represents a latent space: vectors of features needed for reconstruction.
• “To reconstruct an output, we can slightly vary the values of these essential features.”
• Assume that the latent features are “normally distributed”
=> Extend the bottleneck layer to include “means” and “std’s” (see the sketch below).
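
A sketch of that extended bottleneck in PyTorch: the encoder outputs a mean and a log-variance per latent feature, a latent vector is sampled via the reparameterization trick, and a KL term keeps the latent distribution close to a standard normal (dimensions are illustrative):

```python
import torch
import torch.nn as nn

class VAEBottleneck(nn.Module):
    def __init__(self, hidden_dim=128, latent_dim=32):
        super().__init__()
        self.to_mean = nn.Linear(hidden_dim, latent_dim)     # the "means"
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)   # log-variance -> the "std's"

    def forward(self, h):
        mean, logvar = self.to_mean(h), self.to_logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mean + std * torch.randn_like(std)   # reparameterization trick: sample a latent vector
        # KL divergence between N(mean, std) and N(0, 1), averaged over the batch
        kl = -0.5 * torch.sum(1 + logvar - mean.pow(2) - logvar.exp(), dim=1)
        return z, kl.mean()

h = torch.randn(16, 128)             # dummy encoder features
z, kl_loss = VAEBottleneck()(h)      # total loss = reconstruction loss + weight * kl_loss
```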

11
Q

What’s a disadvantage of discriminative models?

A

Discriminative models have several key limitations:
• Can’t model P(X), i.e. the probability of seeing a certain image
• Thus, can’t sample from P(X), i.e. can’t generate new images

12
Q

What’s a key advantage of generative models over discriminative ones?

A
Generative models (in general):
• Can model P(X), the probability of seeing a certain image
• Can generate new images

13
Q

How do GANs work?

A
  • Generator: generates fake samples and tries to fool the Discriminator
  • Discriminator: tries to distinguish between real and fake samples
  • Train them against each other
  • Repeat this and we get a better Generator and Discriminator

The Discriminator tries to maximize its reward and the Generator tries to minimize the Discriminator’s reward (or, equivalently, maximize its loss); a condensed training-loop sketch follows below.
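
A sketch of one training iteration in PyTorch, assuming simple MLPs for both networks and binary cross-entropy losses; architectures and hyperparameters are placeholders:

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(32, data_dim) * 2 - 1           # dummy batch of "real" samples in [-1, 1]
ones, zeros = torch.ones(32, 1), torch.zeros(32, 1)

# 1) Discriminator step: label real samples 1 and fake samples 0.
fake = G(torch.randn(32, latent_dim)).detach()
d_loss = bce(D(real), ones) + bce(D(fake), zeros)
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# 2) Generator step: try to make D label fresh fakes as real.
fake = G(torch.randn(32, latent_dim))
g_loss = bce(D(fake), ones)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```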

14
Q

What are the key ideas behind Deep Convolutional GANs?

A

Key ideas (see the sketch below):

  • Replace fully connected hidden layers with convolutions
    – Generator: fractional-strided (transposed) convolutions
  • Use batch normalization after each layer
  • Inside the Generator:
    – Use ReLU for hidden layers
    – Use Tanh for the output layer
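
A sketch of a Generator following those guidelines, assuming PyTorch and 64x64 single-channel output images (channel counts are illustrative): fractional-strided (transposed) convolutions, batch normalization, ReLU in hidden layers, Tanh at the output.

```python
import torch
import torch.nn as nn

generator = nn.Sequential(
    # Project a 100-d noise vector (shaped 100x1x1) up to 64x64 with transposed convolutions.
    nn.ConvTranspose2d(100, 256, kernel_size=4, stride=1, padding=0), nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64,  kernel_size=4, stride=2, padding=1), nn.BatchNorm2d(64),  nn.ReLU(),
    nn.ConvTranspose2d(64,  32,  kernel_size=4, stride=2, padding=1), nn.BatchNorm2d(32),  nn.ReLU(),
    nn.ConvTranspose2d(32,  1,   kernel_size=4, stride=2, padding=1), nn.Tanh(),   # output layer: Tanh
)

z = torch.randn(8, 100, 1, 1)   # batch of noise vectors
images = generator(z)           # shape: (8, 1, 64, 64), values in (-1, 1)
```
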
15
Q

Why should we use GANs?

A

Why GANs?
• Sampling (or generation) is straightforward.
• Training doesn’t involve Maximum Likelihood estimation.
• Robust to Overfitting since Generator never sees the training data.
• Empirically, GANs are good at capturing the modes of the distribution.

16
Q

What are some problems with GANs?

A
  • Probability distribution is implicit
    – Not straightforward to compute P(X)
    – Thus vanilla GANs are only good for sampling / generation
  • Training is hard
    – Non-convergence
    – Mode collapse
17
Q

How do GANs work?

A

• GANs are generative models that are implemented using two
stochastic neural network modules: Generator and Discriminator.

• Generator tries to generate samples from random noise as input

• Discriminator tries to distinguish the samples from Generator and
samples from the real data distribution.

• Both networks are trained adversarially (in tandem), each trying to beat the
other. In this process, both models become better at their respective tasks.

18
Q

How does a Laplacian Pyramid of Adversarial Networks (LAPGAN) work?

A

Generate high-resolution images by using a hierarchical system of GANs, iteratively increasing image resolution and quality.

• One Generator generates the base (lowest-resolution) image from a random noise input z.

• The remaining Generators iteratively generate a difference (residual) image h, conditioned on the previous smaller image l.

• This difference image is added to an up-scaled version of the previous smaller image.

19
Q

Give an example of a coupled GAN (CoGAN)

A

Learning a joint distribution of multi-domain images, e.g., the same face with different features (hair color, eye color, etc.).
• Using GANs to learn the joint distribution with samples drawn only from the marginal distributions.
• Direct applications in domain adaptation and image translation.

20
Q

What are some advanced GAN extensions?

A
  • Coupled GAN
  • LAPGAN – Laplacian Pyramid of Adversarial Networks
  • Adversarially Learned Inference
21
Q

How do conditional GANs work?

A

• Differentiating feature: uses identity-preservation optimization with an auxiliary network to get a better approximation of the latent code (z*) for an input image.
• The latent code is then conditioned on a discrete (one-hot) embedding of age categories.

22
Q

Energy-based GANs

23
Q

What are 2 problems with GANs and how do we solve them?

A
Problems:
  • Non-convergence
  • Mode collapse

Solutions:
  • Mini-batch GANs
  • Supervision with labels
24
Q

Both standard deep learning and GANs use SGD; what’s the difference?

A

DL: SGD has convergence guarantees (under certain conditions). Problem: with non-convexity, we might converge to a local optimum.

GANs: SGD was not designed to find the Nash equilibrium of a game. Problem: we might not converge to the Nash equilibrium at all.