Autoencoders Flashcards
What are the variants of autoencoders?
– Plain AE (a linear AE is closely related to PCA) – Denoising – Sparse – Variational (VAE) – Convolutional, recurrent
Are autoencoders data-specific? What does that mean?
Yes. For example, an autoencoder trained on pictures of faces would do a rather poor job of compressing pictures of trees, because the features it learns would be face-specific.
What are the 3 most important components of an autoencoder?
An encoding function, a decoding function, and a loss function.
The loss function measures the distance between the original data and its reconstruction from the compressed representation, i.e. how much information is lost (hence “loss” function).
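A minimal sketch of these three components (PyTorch, not part of the original card); the 784-dimensional input, 32-dimensional code, and MSE loss are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Minimal autoencoder: an encoding function, a decoding function, and a loss."""
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        # Encoding function: compress the input into a small code
        self.encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())
        # Decoding function: reconstruct the input from the code
        self.decoder = nn.Sequential(nn.Linear(code_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.rand(16, 784)                          # dummy batch of flattened images in [0, 1]
reconstruction = model(x)
# Loss function: distance between the input and its reconstruction
loss = nn.functional.mse_loss(reconstruction, x)
loss.backward()
```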
Give a few applications of an autoencoder
It can be used for data compression, dimensionality reduction,
visualization, anomaly detection, and data generation(!)
How can an autoencoder be used for dimensionality reduction?
For 2D visualization specifically, t-SNE (pronounced “tee-snee”) is probably the best algorithm around, but it typically requires relatively low-dimensional data.
So a good strategy for visualizing similarity relationships in high-dimensional data is to start by using an autoencoder to compress your data into a low-dimensional space (e.g. 32-dimensional), then use t-SNE for mapping the compressed data to a 2D plane.
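A rough sketch of this two-step pipeline, assuming PyTorch and scikit-learn; the untrained 784→32 encoder here stands in for the trained encoder of an autoencoder:

```python
import torch
import torch.nn as nn
from sklearn.manifold import TSNE

# In practice `encoder` would be the trained encoder part of an autoencoder;
# an untrained one is used here so the snippet runs on its own.
encoder = nn.Sequential(nn.Linear(784, 32), nn.ReLU())
data = torch.rand(500, 784)                                # placeholder high-dimensional data

with torch.no_grad():
    codes = encoder(data).numpy()                          # step 1: compress to 32 dimensions
embedding_2d = TSNE(n_components=2).fit_transform(codes)   # step 2: map 32-D codes to a 2-D plane
print(embedding_2d.shape)                                  # (500, 2), ready for a scatter plot
```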
What’s the analog for decoder and encoder in bigger networks?
The same structure carries over to bigger networks, with the encoder and decoder parts being convolutional or recurrent networks.
How does a denoising autoencoder work?
Given a set of “clean” images, add noise to them and
train an autoencoder on:
– Input: a noisy image
– Output: the corresponding clean image
• In this way the network “learns” what is essential in
an image and what is not!
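An illustrative training loop for this setup (PyTorch assumed); the tiny model, the Gaussian noise level of 0.3, and the step count are placeholder choices:

```python
import torch
import torch.nn as nn

# Illustrative denoising setup: `model` is any autoencoder; `clean` is a batch of
# clean images flattened to (batch, 784) with values in [0, 1].
model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 784), nn.Sigmoid())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
clean = torch.rand(64, 784)

for _ in range(100):                                             # illustrative number of steps
    noisy = (clean + 0.3 * torch.randn_like(clean)).clamp(0, 1)  # input: a noisy image
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(noisy), clean)           # target: the clean image
    loss.backward()
    optimizer.step()
```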
What’s the key idea behind a sparse autoencoder?
The bottleneck layer computes some “essential features”
that are needed for input reconstruction.
• Usually, only a few features are really important;
the rest represent noise.
• => impose some constraints on the bottleneck layer:
e.g., “on average, activations are close to 0”.
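One possible way to impose such a constraint is an L1 activity penalty on the bottleneck activations, sketched below (PyTorch assumed; the 64-unit bottleneck and the penalty weight are illustrative, and a KL-based sparsity penalty is a common alternative):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU())
decoder = nn.Sequential(nn.Linear(64, 784), nn.Sigmoid())
sparsity_weight = 1e-3                       # illustrative coefficient

x = torch.rand(32, 784)
code = encoder(x)                            # bottleneck activations
reconstruction = decoder(code)
recon_loss = nn.functional.mse_loss(reconstruction, x)
sparsity_penalty = code.abs().mean()         # pushes activations toward 0 on average
loss = recon_loss + sparsity_weight * sparsity_penalty
loss.backward()
```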
What is the Kullback-Leibler divergence?
• => Kullback-Leibler divergence: measures how different one probability
distribution is from another (a non-symmetric “distance” between distributions);
used as a part of the loss function.
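A small numeric illustration of the discrete form KL(P ‖ Q) = Σ P(i) log(P(i)/Q(i)); the distributions here are made up:

```python
import torch

# KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)) for discrete distributions
p = torch.tensor([0.1, 0.4, 0.5])
q = torch.tensor([0.3, 0.3, 0.4])
kl_pq = torch.sum(p * torch.log(p / q))
print(kl_pq.item())                 # >= 0; equals 0 only when p == q

# KL divergence is not symmetric, so it is not a true distance metric
kl_qp = torch.sum(q * torch.log(q / p))
print(kl_qp.item())                 # generally differs from kl_pq
```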
What’s the key idea behind variational autoencoders?
The bottleneck layer represents a latent space:
vectors of features that are needed for reconstruction
• “to reconstruct an output we can slightly vary the values
of these essential features”
• Assume that the latent features are “normally distributed”
=> Extend the bottleneck layer to include “means” and “std’s”.
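A sketch of such a bottleneck with the usual reparameterization trick and the closed-form KL term toward a standard normal (PyTorch assumed; layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class VAEBottleneck(nn.Module):
    """Bottleneck that outputs means and (log-)variances for the latent features."""
    def __init__(self, hidden_dim=128, latent_dim=16):
        super().__init__()
        self.mu = nn.Linear(hidden_dim, latent_dim)        # "means"
        self.log_var = nn.Linear(hidden_dim, latent_dim)   # log-variances ("std's")

    def forward(self, h):
        mu, log_var = self.mu(h), self.log_var(h)
        std = torch.exp(0.5 * log_var)
        z = mu + std * torch.randn_like(std)               # reparameterization: sample latent features
        # KL term pushes the latent distribution toward a standard normal
        kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=1).mean()
        return z, kl

bottleneck = VAEBottleneck()
h = torch.randn(8, 128)     # stand-in for features produced by an encoder
z, kl = bottleneck(h)       # z feeds the decoder; kl is added to the reconstruction loss
```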
What’s a disadvantage of discriminative models?
Discriminative models have several key limitations
• Can’t model P(X), i.e. the probability of seeing a certain image
• Thus, can’t sample from P(X), i.e. can’t generate new images
What’s a key advantage of generative models over discriminative ones?
Generative models (in general):
• Can model P(X), the probability of seeing a certain image
• Can generate new images
How do GANs work?
- Generator: generates fake samples and tries to fool the Discriminator
- Discriminator: tries to distinguish between real and fake samples
- Train them against each other
- Repeat this and we get a better Generator and a better Discriminator
The Discriminator is trying to maximize its reward and the Generator is trying to minimize the Discriminator’s reward (i.e. maximize the Discriminator’s loss)
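A compact sketch of this adversarial training loop (PyTorch assumed; the MLP Generator/Discriminator, learning rates, and placeholder “real” batch are illustrative):

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 32, 784
generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(64, data_dim) * 2 - 1              # placeholder "real" batch in [-1, 1]
for _ in range(100):                                 # illustrative number of steps
    # Discriminator step: real -> label 1, fake -> label 0
    fake = generator(torch.randn(64, latent_dim)).detach()
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to fool the Discriminator into predicting "real"
    fake = generator(torch.randn(64, latent_dim))
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```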
What are the key ideas behind Deep Convolutional GANs?
Key ideas:
- Replace FC hidden layers with Convolutions
- Generator: Fractional-Strided convolutions
- Use Batch Normalization after each layer
- Inside the Generator:
  - Use ReLU for hidden layers
  - Use Tanh for the output layer
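A toy DCGAN-style generator illustrating these choices (PyTorch assumed; channel counts and the 16x16 output size are arbitrary):

```python
import torch
import torch.nn as nn

# DCGAN-style generator: fractional-strided (transposed) convolutions,
# BatchNorm after each hidden layer, ReLU inside, Tanh at the output.
generator = nn.Sequential(
    nn.ConvTranspose2d(100, 128, kernel_size=4, stride=1, padding=0),  # 1x1 -> 4x4
    nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 4x4 -> 8x8
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1),     # 8x8 -> 16x16
    nn.Tanh(),
)

z = torch.randn(16, 100, 1, 1)     # random noise reshaped as a 1x1 "image"
images = generator(z)              # shape: (16, 1, 16, 16)
```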
Why should we use GANs?
• Sampling (or generation) is straightforward.
• Training doesn’t involve Maximum Likelihood estimation.
• Robust to Overfitting since Generator never sees the training data.
• Empirically, GANs are good at capturing the modes of the distribution.
What are some problems with GANs?
- Probability Distribution is Implicit
  - Not straightforward to compute P(X).
  - Thus Vanilla GANs are only good for Sampling/Generation.
- Training is Hard
  - Non-Convergence
  - Mode-Collapse
How do GANs work?
• GANs are generative models that are implemented using two
stochastic neural network modules: Generator and Discriminator.
• Generator tries to generate samples from random noise as input
• Discriminator tries to distinguish the Generator’s samples from
samples of the real data distribution.
• Both networks are trained adversarially (in tandem): the Generator tries to
fool the Discriminator, which in turn tries not to be fooled. In this process,
both models become better at their respective tasks.
How does a Laplacian Pyramid of Adversarial Networks (LAPGAN) work?
Generate high-resolution images by using a hierarchical system of GANs – iteratively increase image resolution and quality.
• A top-level generator G generates the base (lowest-resolution) image I from random noise input z.
• The remaining generators G_k iteratively generate a difference image h_k conditioned on the previous, smaller image l_k.
• This difference image is added to an up-scaled version of the previous, smaller image.
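A sketch of one refinement step of the pyramid (PyTorch assumed; `generate_residual` is a hypothetical stand-in for a conditional generator G_k):

```python
import torch
import torch.nn.functional as F

# One LAPGAN refinement step (sketch): up-scale the previous, smaller image
# and add a generated difference (residual) image at the larger resolution.
def generate_residual(upscaled, noise):
    # Placeholder for a conditional generator G_k(noise, upscaled)
    return torch.zeros_like(upscaled)

small = torch.rand(1, 3, 16, 16)                            # previous-level image l_k
upscaled = F.interpolate(small, scale_factor=2, mode="bilinear", align_corners=False)
noise = torch.randn(1, 1, 32, 32)
h = generate_residual(upscaled, noise)                      # difference image h_k
larger = upscaled + h                                       # refined 32x32 image
```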
Example of a Coupled GAN (CoGAN)
Learning a joint distribution of multi-domain images, e.g. the same face with different features: hair color, eyes, etc.
• Uses GANs to learn the joint distribution with samples drawn only from
the marginal distributions.
• Direct applications in domain adaptation and image translation.
What are some advanced GAN extensions?
- Coupled GAN
- LAPGAN – Laplacian Pyramid of Adversarial Networks
- Adversarially Learned Inference
How do conditional GANs work?
• Differentiating feature: uses an identity-preservation optimization with an
auxiliary network to get a better approximation of the latent code (z*) for an
input image.
• The latent code is then conditioned on a discrete (one-hot) embedding of age
categories.
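A sketch of the conditioning step only, i.e. concatenating a latent code with a one-hot age embedding before a conditional generator (PyTorch assumed; the dimensions, `z_star`, and the toy generator are illustrative):

```python
import torch
import torch.nn as nn

# Conditioning a latent code on a discrete (one-hot) age category.
latent_dim, num_age_groups = 50, 6
z_star = torch.randn(8, latent_dim)                          # approximated latent codes z*
ages = torch.randint(0, num_age_groups, (8,))                # target age category per image
age_one_hot = nn.functional.one_hot(ages, num_age_groups).float()
conditioned = torch.cat([z_star, age_one_hot], dim=1)        # condition z* on the age embedding

generator = nn.Sequential(nn.Linear(latent_dim + num_age_groups, 784), nn.Tanh())
aged_faces = generator(conditioned)                          # generated (flattened) images
```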
What are 2 problems with GANs and how do we solve them?
Problems:
- Non-Convergence
- Mode-Collapse
Solutions:
- Mini-Batch GANs
- Supervision with labels
Both deep learning in general and GANs use SGD; what’s the difference?
DL: SGD has convergence guarantees (under certain conditions). Problem: With non-convexity, we might converge to local optima.
GANs: SGD was not designed to find the Nash equilibrium of a game. Problem: We might not converge to the Nash equilibrium at all.