VQ-VAE Flashcards
What are the 3 main contributions of VQ-VAE compared with a VAE?
1 Quantisation of the latent space
2 Restrict the latent space to a set of vectors (codebook).
3 The prior over the codebook entries is learned rather than fixed
How does it change the AE framework?
The encoder output is quantised: it is replaced by the nearest codeword from a finite, learned codebook before being passed to the decoder.
How does it change the sampling procedure of a VAE?
During training it assumes a uniform prior over the codebook, and the posterior q(z|x) is deterministic (one-hot): it puts all of its mass on the codeword nearest to the encoder output, which becomes the decoder input.
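The nearest-codeword lookup and the resulting one-hot posterior can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the toy codebook, the encoder output `z_e` and the helper names `quantise` and `one_hot_posterior` are all made up for the example.

```python
# Toy codebook of K = 3 codewords in a 2-D latent space (values are made up).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def quantise(z_e, codebook):
    """Return (index, codeword) of the codeword nearest to the encoder output z_e."""
    k = min(range(len(codebook)), key=lambda j: squared_distance(z_e, codebook[j]))
    return k, codebook[k]

def one_hot_posterior(z_e, codebook):
    """Deterministic posterior q(z = j | x): 1 for the nearest codeword, 0 elsewhere."""
    k, _ = quantise(z_e, codebook)
    return [1.0 if j == k else 0.0 for j in range(len(codebook))]

z_e = [0.8, 1.3]                          # hypothetical encoder output z_e(x)
k, z_q = quantise(z_e, codebook)
print(k, z_q)                             # index and codeword passed to the decoder
print(one_hot_posterior(z_e, codebook))   # all mass on the selected codeword
```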
What happens to the KL divergence?
With a one-hot posterior and a uniform prior over K codewords it is constant (equal to log K), so it is dropped from the training objective.
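A quick numerical check of why the KL term carries no learning signal: for a one-hot posterior and a uniform prior, KL(q || p) = log K whichever codeword is selected. The codebook size `K = 512` and the helper `kl_divergence` are assumptions chosen for illustration.

```python
import math

def kl_divergence(q, p):
    """KL(q || p) for discrete distributions; terms with q_j = 0 contribute 0."""
    return sum(qj * math.log(qj / pj) for qj, pj in zip(q, p) if qj > 0.0)

K = 512                                   # illustrative codebook size
uniform_prior = [1.0 / K] * K

# Whichever codeword is selected, the KL term takes the same value.
kl_values = []
for k in (0, 17, K - 1):
    posterior = [1.0 if j == k else 0.0 for j in range(K)]
    kl_values.append(kl_divergence(posterior, uniform_prior))

print(kl_values, math.log(K))
```

Because the value depends on neither the input nor the parameters, its gradient is zero.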
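What happens to the prior?
It is kept uniform while the VQ-VAE itself is trained; afterwards an autoregressive prior (e.g. a PixelCNN for images) is fitted over the discrete latent codes and used for sampling.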
What is the loss?
The reconstruction loss, the codebook loss and the commitment loss
What is the codebook loss?
$\| \mathrm{sg}[z_e(x)] - e_k \|_2^2$
What is the commitment loss?
$\beta \, \| z_e(x) - \mathrm{sg}[e_k] \|_2^2$
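The stop-gradient operator sg leaves the loss values unchanged and only decides which parameters receive a gradient. The sketch below makes that explicit in plain Python with hand-derived gradients. It rests on several assumptions: a squared-error reconstruction term, `beta = 0.25` as an illustrative weight, and made-up vectors; the helper names are hypothetical.

```python
def sq_norm(v):
    """Squared Euclidean norm of a vector."""
    return sum(x * x for x in v)

def vq_losses(x, x_hat, z_e, e_k, beta=0.25):
    """Values of the three loss terms for a single latent vector."""
    recon = sq_norm([a - b for a, b in zip(x, x_hat)])   # assumed squared-error reconstruction
    diff = [a - b for a, b in zip(z_e, e_k)]
    codebook = sq_norm(diff)              # ||sg[z_e(x)] - e_k||^2
    commitment = beta * sq_norm(diff)     # beta * ||z_e(x) - sg[e_k]||^2
    return recon, codebook, commitment

def vq_gradients(z_e, e_k, beta=0.25):
    """Gradients of the codebook and commitment terms, treating sg[.] as a constant."""
    # The codebook loss only moves the codeword towards the encoder output.
    grad_e_k = [2.0 * (e - z) for z, e in zip(z_e, e_k)]
    # The commitment loss only moves the encoder output towards the codeword.
    grad_z_e = [2.0 * beta * (z - e) for z, e in zip(z_e, e_k)]
    return grad_z_e, grad_e_k

x, x_hat = [1.0, 0.0], [0.5, 0.5]         # made-up input and reconstruction
z_e, e_k = [0.8, 1.3], [1.0, 1.0]         # made-up encoder output and nearest codeword

recon, codebook_loss, commitment_loss = vq_losses(x, x_hat, z_e, e_k)
total_loss = recon + codebook_loss + commitment_loss
grad_z_e, grad_e_k = vq_gradients(z_e, e_k)
print(total_loss, grad_z_e, grad_e_k)
```

The two terms have the same value up to the factor beta; they differ only in which side is updated. The gradient of the reconstruction term is not modelled here.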
Why is VQ-VAE more efficient than pixel-space autoregressive models when generating images?
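Because the autoregressive model is sampled only in the compressed latent space, which has far fewer positions than the image has pixels; the decoder then maps the sampled codes to an image in a single feed-forward pass.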
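A back-of-the-envelope count of sequential sampling steps shows the size of the saving. The sizes below (a 128x128 RGB image, a 32x32 grid of latent codes) are illustrative assumptions, and the count assumes the pixel-space model samples one colour channel value per step.

```python
# Illustrative sizes (assumptions): a 128x128 RGB image and a 32x32 grid of latent codes.
image_height, image_width, channels = 128, 128, 3
latent_height, latent_width = 32, 32

pixel_steps = image_height * image_width * channels   # one sampling step per sub-pixel
latent_steps = latent_height * latent_width            # one sampling step per latent code

print(pixel_steps, latent_steps, pixel_steps // latent_steps)   # 49152 1024 48
```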
What is an autoregressive model?
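A model that generates a sequence one element at a time, where each output is conditioned on the previously generated outputs (and on any conditioning input). The joint distribution factorises as $p(x) = \prod_i p(x_i \mid x_{<i})$.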
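The sampling loop of an autoregressive model can be sketched as follows. `conditional_probs` is a hypothetical stand-in for a learned network such as a PixelCNN; its rule (favour the previous token plus one) is invented purely so the example runs.

```python
import random

VOCAB_SIZE = 4   # toy number of discrete symbols, e.g. codebook indices 0..3

def conditional_probs(prefix):
    """Hypothetical stand-in for a learned p(x_i | x_<i); a real model would be a network."""
    probs = [1.0] * VOCAB_SIZE
    if prefix:
        probs[(prefix[-1] + 1) % VOCAB_SIZE] += 4.0   # invented rule: favour previous + 1
    total = sum(probs)
    return [p / total for p in probs]

def sample_sequence(length, rng):
    """Sample one element at a time, feeding everything generated so far back in."""
    sequence = []
    for _ in range(length):
        probs = conditional_probs(sequence)
        token = rng.choices(range(VOCAB_SIZE), weights=probs, k=1)[0]
        sequence.append(token)
    return sequence

rng = random.Random(0)        # fixed seed for reproducibility
sample = sample_sequence(8, rng)
print(sample)
```

Each step has to wait for the previous one, which is why the number of positions to sample dominates generation time.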
VQ-VAE-2: what does it contribute compared with VQ-VAE?
It demonstrates that a multi-scale hierarchical organization of VQ-VAE, augmented with powerful priors over the latent codes, can generate samples whose quality rivals that of GANs.