Variational Inference Flashcards
The goal of inference is to learn about latent (unknown) variables through the posterior. Why is an analytic solution usually not an option?
The marginal likelihood p(x) = ∫ p(x, z) dz in the denominator of the posterior p(z|x) = p(x, z)/p(x) is usually intractable.
Name some options for posterior inference
MCMC sampling
Laplace approximation
Expectation propagation
Variational inference
What is the main advantage of variational inference?
It is the most scalable approach currently known: by turning inference into an optimization problem, it handles large datasets and models better than sampling-based methods such as MCMC.
What is the main idea behind variational inference?
Approximate the true posterior by defining a family of approximate distributions q_v and optimizing the variational parameters v so that q_v is close to the posterior.
What is the KL (Kullback Leibler) divergence?
KL(p(x) || q(x)) = ∫ p(x) log(p(x)/q(x)) dx = E_p[log(p(x)/q(x))]
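A minimal sketch (assuming NumPy and SciPy are available) that estimates KL(p || q) for two Gaussians by Monte Carlo and checks it against the known closed form; the particular distributions are illustrative:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# KL(p || q) = E_p[log p(x) - log q(x)], estimated with samples drawn from p.
p, q = norm(loc=0.0, scale=1.0), norm(loc=1.0, scale=2.0)
x = p.rvs(size=100_000, random_state=rng)
kl_mc = np.mean(p.logpdf(x) - q.logpdf(x))

# Closed form for Gaussians: log(s_q/s_p) + (s_p^2 + (m_p - m_q)^2)/(2 s_q^2) - 1/2.
kl_exact = np.log(2.0 / 1.0) + (1.0**2 + (0.0 - 1.0)**2) / (2 * 2.0**2) - 0.5
print(kl_mc, kl_exact)  # both approximately 0.4431
```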
What is differential entropy?
H[q(x)] = -E_q[log q(x)]
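As a quick numeric check (assuming SciPy), the differential entropy of a Gaussian N(mu, sigma^2) has the closed form 0.5 log(2 pi e sigma^2):

```python
import numpy as np
from scipy.stats import norm

sigma = 2.0
print(norm(scale=sigma).entropy())                # SciPy's closed-form entropy
print(0.5 * np.log(2 * np.pi * np.e * sigma**2))  # same value, ~2.112
```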
How is the KL divergence often used in variational inference?
Use KL(q(z) || p(z|x)) as the objective function to minimize.
What does Jensen's inequality state, and for what function do we often use it?
For concave functions f:
f(E[x]) >= E[f(x)]
We most often apply it to the logarithm, which is concave:
log E[x] >= E[log x]
This is how the ELBO is derived: log p(x) = log E_q[p(x, z)/q(z)] >= E_q[log(p(x, z)/q(z))].
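A tiny numeric demonstration (assuming NumPy; the Gamma distribution is just an arbitrary positive random variable chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.0, size=100_000)  # any positive random variable

# log is concave, so Jensen gives log E[x] >= E[log x].
print(np.log(x.mean()))  # log E[x]  ~ log 2  ≈ 0.693
print(np.log(x).mean())  # E[log x] ~ psi(2) ≈ 0.423, strictly smaller
```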
What can we do instead of minimizing KL(q(z) || p(z|x))?
We can maximize the ELBO (Evidence Lower BOund), since log p(x) = ELBO + KL(q(z) || p(z|x)) and the KL term is nonnegative:
ELBO = E_q[log p(x|z)] - KL(q(z|v) || p(z))
     = E_q[log p(x, z)] + H[q(z|v)]
     = E_q[log p(x, z)] - E_q[log q(z|v)]
     = E_q[log(p(x, z) / q(z|v))]
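A minimal sketch of a Monte Carlo ELBO estimate using the last form above. The model (z ~ N(0, 1), x | z ~ N(z, 1)) and the variational parameter values are illustrative, not from the source:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy model: z ~ N(0, 1), x | z ~ N(z, 1); one observation x = 1.5.
x_obs = 1.5
q_mean, q_std = 0.7, 0.8  # variational parameters v (illustrative values)

# ELBO = E_q[log p(x, z) - log q(z)], estimated with samples from q.
z = norm(q_mean, q_std).rvs(size=100_000, random_state=rng)
log_joint = norm(0, 1).logpdf(z) + norm(z, 1).logpdf(x_obs)
elbo = np.mean(log_joint - norm(q_mean, q_std).logpdf(z))
print(elbo)  # a lower bound on log p(x_obs) ≈ -1.828 for this toy model
```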
What is a mean field approximation?
In a mean-field approximation q(z) is fully factorized, meaning q(z) = prod_i q(z_i). With global parameters beta and local latent variables z_n, the resulting distribution is:
q(beta, z | v) = q(beta | lambda) prod_n q(z_n | phi_n), with
v = [lambda, phi_1, phi_2, ..., phi_n]
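A small sketch of what full factorization means in practice: the log-density of q is just a sum of independent factor log-densities. The choice of a Gamma global factor and Gaussian local factors is purely illustrative:

```python
import numpy as np
from scipy.stats import norm, gamma

# Mean-field q(beta, z | v) = q(beta | lambda) * prod_n q(z_n | phi_n).
lam = (2.0, 1.0)                          # lambda: shape, rate of q(beta)
phi = np.array([[0.0, 1.0], [1.5, 0.5]])  # phi_n: mean, std of each q(z_n)

def log_q(beta, z):
    # Independence means the joint log-density decomposes into a sum.
    log_global = gamma(a=lam[0], scale=1.0 / lam[1]).logpdf(beta)
    log_locals = norm(phi[:, 0], phi[:, 1]).logpdf(z).sum()
    return log_global + log_locals

print(log_q(1.0, np.array([0.2, 1.4])))
```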
In the mean field approximation the q-factors don’t depend directly on the data. How is the family of q’s connected to the data?
Through the maximization of the ELBO.
What is the algorithm for mean field approximation?
- Initialize the variational parameters randomly
- Update the local variational parameters
- Update the global variational parameters
- Repeat until the ELBO converges (a concrete sketch follows)
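As a concrete instance, here is a minimal coordinate-ascent sketch for the classic conjugate toy model x_i ~ N(mu, 1/tau) with prior mu | tau ~ N(mu0, 1/(lam0 tau)), tau ~ Gamma(a0, b0), where both factor updates are available in closed form. The hyperparameter values and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=200)  # synthetic data
N, xbar = len(x), x.mean()

# Prior hyperparameters (illustrative values).
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

# Mean-field q(mu, tau) = q(mu) q(tau); initialize E[tau] arbitrarily.
E_tau = 1.0
for _ in range(50):
    # Update q(mu) = N(m, 1/lam): depends on q(tau) only through E[tau].
    m = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam = (lam0 + N) * E_tau
    # Update q(tau) = Gamma(a, b): uses E_q[(x_i - mu)^2] = (x_i - m)^2 + 1/lam.
    a = a0 + (N + 1) / 2
    b = b0 + 0.5 * (np.sum((x - m) ** 2) + N / lam
                    + lam0 * ((m - mu0) ** 2 + 1 / lam))
    E_tau = a / b

print(m, E_tau)  # posterior mean of mu and E[tau], both near the true values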
What are the limitations of mean field approximation?
Mean-field approximations tend to be too compact: minimizing the reverse KL penalizes q for placing mass where the posterior has little, so q typically underestimates posterior variance and ignores correlations between latent variables. A richer class of approximations is needed.
The classical mean-field approximation has to evaluate all data points to update the parameters, making it unscalable to large datasets. How can we alleviate this problem?
Use stochastic variational inference (SVI): update the global parameters with noisy gradient estimates computed from a random subset (minibatch) of the data.
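A minimal SVI sketch for a conjugate toy model with only a global parameter (x_i ~ N(mu, 1), mu ~ N(0, 1)). The natural-parameter averaging, the step-size schedule rho_t = (t + tau)^(-kappa), and the N/|B| rescaling of minibatch statistics are the standard SVI ingredients; the model and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
x = rng.normal(loc=1.0, scale=1.0, size=N)

# Natural parameters (eta1, eta2) of a Gaussian q(mu) = N(m, s^2):
# eta1 = m / s^2, eta2 = -1 / (2 s^2). The prior N(0, 1) has (0, -0.5).
prior = np.array([0.0, -0.5])
lam = prior.copy()                 # global variational parameters

batch, tau, kappa = 100, 1.0, 0.7  # minibatch size and step-size schedule
for t in range(1, 501):
    idx = rng.choice(N, size=batch, replace=False)
    # Each point contributes sufficient statistics (x_i, -1/2); rescale by N/|B|.
    stats = np.array([x[idx].sum(), -0.5 * batch]) * (N / batch)
    lam_hat = prior + stats        # noisy estimate of the full-data update
    rho = (t + tau) ** (-kappa)
    lam = (1 - rho) * lam + rho * lam_hat

m, s2 = -lam[0] / (2 * lam[1]), -1 / (2 * lam[1])
print(m, s2)  # approaches the exact posterior N(sum(x)/(N+1), 1/(N+1))
```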
How do we maximize the ELBO?
By coordinate ascent: holding all other factors fixed, the ELBO is maximized with respect to each factor by setting
q*(z_i) ∝ exp(E_{-i}[log p(x, z)])
where E_{-i} denotes the expectation under all factors except q(z_i). Iterating these updates over i monotonically increases the ELBO.