VAE Flashcards
How should one optimise the ELBO and what would happen to p(x) and D_KL?
To optimise the ELBO is to maximise it. Because the ELBO is a lower bound on log p(x), maximising it tends to push the log likelihood up as well.
The KL divergence would also tend to decrease: optimising θ and φ brings q_φ(z|x) closer to the true posterior p_θ(z|x), and that KL is exactly the gap between log p(x) and the ELBO.
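For reference, the standard VI identity behind this card (not spelled out in the original answer) is:

```latex
\log p_\theta(x) \;=\; \underbrace{\mathbb{E}_{q_\phi(z|x)}\!\left[\log \frac{p_\theta(x, z)}{q_\phi(z|x)}\right]}_{\mathrm{ELBO}(\theta,\phi)} \;+\; D_{KL}\!\left(q_\phi(z|x)\,\|\,p_\theta(z|x)\right)
```

Since log p_θ(x) does not depend on φ, any increase of the ELBO with respect to φ must come out of the KL term, which is why the gap shrinks.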
What is the loss term for p(x|z)?
The loss is the expectation over the noise ε (equivalently over q_φ(z|x)) of log p(x|z). The expectation is estimated by a Monte Carlo average over samples.
What happens to the loss term for p(x|z) when we sample a minibatch?
When the minibatch is drawn randomly, the inner expectation over z can be estimated with just one sample per data point (L = 1), which reduces computation while keeping the estimator unbiased.
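Written out (M is the minibatch size and g_φ is the reparametrised sampler; the notation is mine):

```latex
\mathbb{E}_{x}\,\mathbb{E}_{q_\phi(z|x)}\!\left[\log p_\theta(x \mid z)\right]
\;\approx\; \frac{1}{M}\sum_{i=1}^{M} \log p_\theta\!\left(x^{(i)} \mid z^{(i)}\right),
\qquad z^{(i)} = g_\phi\!\left(\epsilon^{(i)}, x^{(i)}\right),\; \epsilon^{(i)} \sim \mathcal{N}(0, I)
```

That is, one ε per data point suffices as long as the minibatch is large enough.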
How would one optimise the ELBO and what problem arises?
To optimise we would take gradients with respect to the parameters and search for a maximum.
With respect to θ (the decoder parameters) there is no problem. With respect to φ (the encoder parameters) the expectation is taken over q_φ(z|x), which itself depends on φ, so the gradient involves an integral over z that is hard to compute, and the naive Monte Carlo estimator of it has very high variance.
What is the reparametrisation trick?
Rewriting z as a deterministic function of φ, x and standard-Gaussian noise ε ~ N(0, 1), so the randomness is moved outside the network. In practice the encoder parameters φ determine μ and σ; after sampling ε we get z = μ + σ·ε.
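A minimal PyTorch sketch of the trick (the function name and the toy shapes in the usage example are mine, not from the paper):

```python
import torch

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Draw z = mu + sigma * eps with eps ~ N(0, I).

    The randomness lives only in eps, so z is a deterministic, differentiable
    function of the encoder outputs mu and logvar (i.e. of phi).
    """
    std = torch.exp(0.5 * logvar)   # sigma, parametrised via log-variance for stability
    eps = torch.randn_like(std)     # external standard-Gaussian noise
    return mu + eps * std

# Usage: gradients flow from z back into mu and logvar.
mu = torch.zeros(4, 2, requires_grad=True)
logvar = torch.zeros(4, 2, requires_grad=True)
z = reparameterize(mu, logvar)
z.sum().backward()
print(mu.grad.shape, logvar.grad.shape)
```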
What are the 4 main principles that VAE is based on?
1 An autoencoder (AE) framework
2 A Gaussian sampling procedure in the latent space
3 A standard Gaussian N(0, 1) prior over the latent vector
4 A reconstruction loss plus a KL divergence loss (sketched below)
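A minimal sketch of how the two loss terms in principle 4 are combined, assuming a Bernoulli decoder and a diagonal-Gaussian encoder (these likelihood choices are assumptions for illustration, not the only option):

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_logits, x, mu, logvar):
    """Negative ELBO = reconstruction loss + KL(q(z|x) || N(0, I))."""
    # Reconstruction term: -log p(x|z) for a Bernoulli decoder over pixels.
    recon = F.binary_cross_entropy_with_logits(recon_logits, x, reduction="sum")
    # KL term in closed form for a diagonal Gaussian against the N(0, I) prior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```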
Why, by design, does a VAE either overfit or smooth the input?
Because the two loss components trade off against each other: the reconstruction term pushes towards overfitting the training examples, while the D_KL term pushes the latent vectors into the same region of latent space, so decodings behave like a blend (roughly a linear combination) of nearby inputs.
AE - Why is an AE not designed to generate new samples from the latent space?
Because there is no regularisation on the latent space, so decoding arbitrary points in it produces unpredictable results.
What change over AE makes VAE generate meaningful new samples from the latent space?
VAE assumes that x was generated by a random process involving a latent variable z. It uses the variational inference (VI) framework to express log p(x) in terms of the ELBO and D_KL. Optimising the ELBO (and with it the KL term) regularises the latent space, so samples of z decode to meaningful x.
What is the assumption about data that can be represented by latent vectors?
That the underlying structure of the data can be captured with less information (fewer dimensions) than the observed representation.
What is a VAE designed to do concerning data generation?
It is designed to generate new data points that look like the training data.
What is the definition of ‘Variational Inference’?
Approximating an intractable distribution (here, the posterior p(z|x)) with a simpler, tractable distribution, turning inference into an optimisation problem.
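In symbols (a standard formulation, not quoted from the card):

```latex
q^{*}(z) \;=\; \arg\min_{q \in \mathcal{Q}} \; D_{KL}\!\left(q(z)\,\|\,p(z \mid x)\right)
```

where Q is a tractable family of distributions, e.g. diagonal Gaussians.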
What is ‘Amortized VI’?
Sharing the parameters of the approximating distribution across all data points (via a single encoder network), rather than optimising separate variational parameters for each data point.
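A minimal sketch of what amortisation looks like in code: one network with weights φ produces the variational parameters for any x, instead of separate (μ_i, σ_i) fitted per data point (the layer sizes are arbitrary assumptions):

```python
import torch
import torch.nn as nn

class AmortizedEncoder(nn.Module):
    """One set of weights (phi) maps any x to its variational parameters."""
    def __init__(self, x_dim: int = 784, h_dim: int = 400, z_dim: int = 20):
        super().__init__()
        self.hidden = nn.Linear(x_dim, h_dim)
        self.mu = nn.Linear(h_dim, z_dim)        # mu(x; phi)
        self.logvar = nn.Linear(h_dim, z_dim)    # log sigma^2(x; phi)

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        return self.mu(h), self.logvar(h)
```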
Why was KL divergence chosen in the paper?
Because the ELBO can be rewritten as E[log p(x|z)] − D_KL(q_φ(z|x) ∥ p(z)). This form was chosen because, for a Gaussian posterior and a standard Gaussian prior, the KL term has an analytic solution that can be written down directly in terms of μ and σ.
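The closed-form expression for a diagonal Gaussian posterior against the N(0, 1) prior (J is the latent dimension):

```latex
D_{KL}\!\left(\mathcal{N}(\mu, \operatorname{diag}(\sigma^{2})) \,\big\|\, \mathcal{N}(0, I)\right)
\;=\; -\tfrac{1}{2}\sum_{j=1}^{J}\left(1 + \log \sigma_j^{2} - \mu_j^{2} - \sigma_j^{2}\right)
```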
If the loss was weighted with a high emphasis on reconstruction, what would happen?
We would revert to a sort of AE that reconstructs the examples very well, but sampling from regions of latent space between the encoded groups would make no sense.
If the loss was weighted with a high emphasis on D_KL, what would happen?
The approximate posterior would be pushed very close to N(0, 1), and we would get blurred reconstructions of the examples regardless of which sample we choose.
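One way to make the trade-off in the last two cards explicit is to put a weight on the KL term (a β-VAE-style weighting; β = 1 recovers the standard VAE objective, and the weighting itself is not part of the original paper):

```latex
\mathcal{L}(\theta, \phi; x) \;=\; \mathbb{E}_{q_\phi(z|x)}\!\left[-\log p_\theta(x \mid z)\right] \;+\; \beta \, D_{KL}\!\left(q_\phi(z|x)\,\|\,p(z)\right)
```

β ≪ 1 gives the AE-like regime from the previous card; β ≫ 1 gives the blurred, prior-dominated regime described here.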
What is the purpose of the reparameterization trick in VAEs?
To make the loss (which lower-bounds log p(x)) differentiable with respect to the encoder parameters φ, so that it can be estimated and optimised with gradient-based methods.
What assumption does VAE make about data generation?
We assume that the data are generated by some random process, involving an unobserved continuous random variable z.
Give the initial 2 abstract steps of the data generation process
(1) a value z^(i) is generated from some prior distribution p_θ*(z); (2) a value x^(i) is generated from some conditional distribution p_θ*(x|z).
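These two steps are ancestral sampling; a minimal sketch with an untrained stand-in decoder (the architecture and the Bernoulli likelihood are assumptions for illustration):

```python
import torch
import torch.nn as nn

z_dim, x_dim = 20, 784
# Stand-in for a trained decoder p_theta*(x|z); untrained here, for shapes only.
decoder = nn.Sequential(nn.Linear(z_dim, 400), nn.ReLU(),
                        nn.Linear(400, x_dim), nn.Sigmoid())

z = torch.randn(16, z_dim)      # step (1): z^(i) ~ p_theta*(z) = N(0, I)
x_probs = decoder(z)            # step (2): parameters of p_theta*(x|z)
x = torch.bernoulli(x_probs)    # sample x^(i) from the Bernoulli conditional
print(x.shape)                  # torch.Size([16, 784])
```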