VAE Flashcards

1
Q

How should one optimise the ELBO and what would happen to p(x) and D_KL?

A

To optimise the ELBO is to maximise it. Maximising it tends to push up the log-likelihood log p(x), because the ELBO is a lower bound on it.
The KL divergence would likely decrease as well: optimising θ and φ brings q_φ(z|x) closer to the true posterior p_θ(z|x), which is exactly what shrinks D_KL.
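
A sketch of the identity behind this answer (the standard VI decomposition, with notation assumed from the usual VAE setup):

    \log p_\theta(x) = \mathrm{ELBO}(\theta, \phi; x) + D_{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\big)

Since D_KL ≥ 0 and log p_θ(x) does not depend on φ, raising the ELBO in φ can only shrink the KL term, while raising it in θ pushes up the log-likelihood.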

2
Q

What is the loss of p(x|z)?

A

The loss is the expectation over η of log p(x|z) (after reparametrisation, the expectation over z ~ q_φ(z|x) becomes an expectation over η). The expectation is estimated as a Monte Carlo average.
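
Written out (a sketch in the paper's spirit, with η as the noise variable and L Monte Carlo samples):

    \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \approx \frac{1}{L} \sum_{l=1}^{L} \log p_\theta\big(x \mid z^{(l)}\big), \quad z^{(l)} = \mu + \sigma \odot \eta^{(l)}, \ \eta^{(l)} \sim \mathcal{N}(0, I)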

3
Q

What happens to the loss of p(x|z) when we sample from a minibatch?

A

When sampling from a minibatch, we can estimate the average over samples of z for the same x with just a single sample (L = 1) per datapoint, reducing computation; averaging over the batch already smooths out the sampling noise.
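
This is the paper's minibatch estimator (sketched; X^M is a minibatch of M points drawn from a dataset of N points):

    \mathcal{L}(\theta, \phi; X) \approx \tilde{\mathcal{L}}^{M}(\theta, \phi; X^{M}) = \frac{N}{M} \sum_{i=1}^{M} \tilde{\mathcal{L}}\big(\theta, \phi; x^{(i)}\big)

with the number of samples L per datapoint set to 1 as long as M is large enough (the paper suggests M around 100).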

4
Q

How would one optimise the ELBO and what problem arises?

A

To optimise, we'd differentiate and try to get to a minimum/maximum.
With respect to θ (the decoder params) there's no problem. With respect to φ (the encoder params) we get the gradient of an integral over z, which is hard to estimate or compute because the sampling distribution itself depends on φ.
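
The problematic term, written out (a sketch; f stands for the integrand of the ELBO):

    \nabla_\phi \, \mathbb{E}_{q_\phi(z \mid x)}\big[f(z)\big] = \mathbb{E}_{q_\phi(z \mid x)}\big[f(z) \, \nabla_\phi \log q_\phi(z \mid x)\big]

The gradient cannot simply be moved inside the expectation because q_φ depends on φ; the generic score-function estimator on the right-hand side does exist, but (as the paper notes) it exhibits very high variance.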

5
Q

What is the reparametrisation trick?

A

Expressing z as a deterministic function of φ, x and a noise variable η ~ N(0, 1), so the randomness no longer depends on φ. In practice, the encoder (with params φ) outputs μ and σ; after sampling η we get z = μ + σ·η.
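
A minimal PyTorch sketch of the trick (assuming the encoder outputs log σ² for numerical stability, a common convention):

    import torch

    def reparameterize(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
        # z = mu + sigma * eta, with eta ~ N(0, I).
        # All randomness lives in eta, which does not depend on phi,
        # so gradients flow deterministically through mu and sigma.
        sigma = torch.exp(0.5 * log_var)
        eta = torch.randn_like(sigma)
        return mu + sigma * eta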

6
Q

What are the 4 main principles that VAE is based on?

A

1. An AE (encoder–decoder) framework
2. A Gaussian sampling procedure for the latent vector
3. A Gaussian N(0, 1) prior on the latent vector
4. A loss combining a reconstruction term and a KL-divergence term (see the sketch below)
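
A minimal PyTorch sketch mapping each of the four principles to code (layer sizes are illustrative assumptions, not from any particular source):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VAE(nn.Module):
        def __init__(self, x_dim=784, h_dim=400, z_dim=20):
            super().__init__()
            # (1) AE framework: an encoder and a decoder
            self.enc = nn.Linear(x_dim, h_dim)
            self.mu = nn.Linear(h_dim, z_dim)
            self.log_var = nn.Linear(h_dim, z_dim)
            self.dec = nn.Sequential(
                nn.Linear(z_dim, h_dim), nn.ReLU(),
                nn.Linear(h_dim, x_dim), nn.Sigmoid())

        def forward(self, x):
            h = F.relu(self.enc(x))
            mu, log_var = self.mu(h), self.log_var(h)
            # (2) Gaussian sampling procedure (reparametrised)
            z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
            return self.dec(z), mu, log_var

    def loss_fn(x_rec, x, mu, log_var):
        # (4) reconstruction loss ...
        recon = F.binary_cross_entropy(x_rec, x, reduction='sum')
        # ... plus a KL loss, which enforces (3) the N(0, I) prior
        kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
        return recon + kl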

7
Q

Why, by design, does a VAE either overfit or smooth the input?

A

Because the two loss components, D_KL and the reconstruction loss, trade off against each other: emphasising reconstruction overfits, while emphasising D_KL places the latent vectors in the same region of space, so decodings behave like smoothed linear combinations of the inputs.

8
Q

AE: why is an AE not designed to generate new samples from the latent space?

A

Because there is no regularisation on the latent space, so decoding arbitrary points in it produces unpredictable results.

9
Q

What change over AE makes VAE generate meaningful new samples from the latent space?

A

VAE assumes that x was generated from a random process involving a latent variable z. It uses the VI framework to express log p(x) in terms of the ELBO and D_KL. Optimising the ELBO (and with it the KL term) then structures the latent space so that samples z from the random process decode to meaningful x.

10
Q

What assumption about the data allows it to be represented by latent vectors?

A

That the underlying structure of the data can be captured with less information than the observed representation contains.

11
Q

What is VAE designed to do concerning data generation?

A

It was designed to generate new data points that look like our data.

12
Q

What is the definition of ‘Variational Inference’?

A

Approximating an intractable distribution with another, simpler distribution.
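
As a formula (one standard way to write it down; Q is a chosen family of tractable distributions):

    q^{*} = \arg\min_{q \in Q} D_{KL}\big(q(z) \,\|\, p(z \mid x)\big)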

13
Q

What is ‘Amortized VI’?

A

Sharing the parameters of the approximating distribution across all the relevant data points (via a network that maps each x to its posterior parameters), rather than optimising separate parameters for each point in the data.
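
The contrast, sketched: classic VI fits separate variational parameters λ^(i) for every datapoint, whereas amortized VI fits one shared φ (the encoder network):

    \text{classic: } \min_{\lambda^{(1)}, \dots, \lambda^{(N)}} \sum_{i} D_{KL}\big(q_{\lambda^{(i)}}(z) \,\|\, p(z \mid x^{(i)})\big)
    \qquad
    \text{amortized: } \min_{\phi} \sum_{i} D_{KL}\big(q_{\phi}(z \mid x^{(i)}) \,\|\, p(z \mid x^{(i)})\big)

Inference for a new datapoint then costs just one forward pass through the encoder.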

14
Q

Why was KL divergence chosen in the paper?

A

Because the ELBO can be expressed as E[log p(x|z)] − D_KL(q_φ(z|x) || p(z)). They chose this form because the KL term has an analytic solution which can be written down directly in terms of μ and σ.
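
The analytic solution in question (from the paper's Appendix B, for a diagonal Gaussian posterior and an N(0, I) prior, with J the latent dimension):

    -D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big) = \frac{1}{2} \sum_{j=1}^{J} \big(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\big)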

15
Q

If the loss were weighted with a high emphasis on reconstruction, what would happen?

A

We'd revert to a sort of AE that reconstructs the examples very well, but any sampling in the latent space in between groups would make no sense.

16
Q

If the loss were weighted with a high emphasis on D_KL, what would happen?

A

The posterior would be pushed very close to N(0, 1), and we'd get a blurred reconstruction of the examples, whichever sample we choose.
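
Both this card and the previous one can be read off a weighted loss (β-VAE-style notation, not from the original paper):

    \mathcal{L} = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \beta \, D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)

β → 0 recovers a plain AE (sharp reconstructions, unusable latent space); a large β pushes q_φ(z|x) towards N(0, 1) for every x, giving blurred reconstructions.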

17
Q

What is the purpose of the reparameterization trick in VAEs?

A

To get the loss (which bounds log p(x)) into a form that can be computed and differentiated, so we can optimise it with gradient-based methods.

18
Q

What assumption does VAE make about data generation?

A

We assume that the data are generated by some random process, involving an unobserved continuous random variable z.

19
Q

Give the initial 2 abstract steps of the data generation process

A

(1) A value z^(i) is generated from some prior distribution p_θ*(z); (2) a value x^(i) is generated from some conditional distribution p_θ*(x|z).
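
A minimal PyTorch sketch of these two steps at generation time (`decoder` is assumed to be a trained decoder network returning the mean of p_θ(x|z)):

    import torch

    @torch.no_grad()
    def generate(decoder, z_dim=20, n=16):
        z = torch.randn(n, z_dim)   # (1) z ~ p(z) = N(0, I)
        return decoder(z)           # (2) x from p_theta(x | z)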