6_Variational Inference Flashcards

1
Q

ELBO (evidence lower bound; marginal likelihood lower bound)

A

F(v) := E_q[log( p(x,z) / q(z) )]

F(v) = E_q[log( p(x|z) )] - KL(q(z) || p(z))
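
A minimal NumPy sketch of estimating F(v) by Monte Carlo, assuming an illustrative conjugate toy model p(z) = N(0,1), p(x|z) = N(z,1) with q(z|v) = N(m, s^2) (the model and all names are assumptions for illustration, not part of the card):

import numpy as np

def log_normal(x, mean, var):
    # log density of N(x; mean, var)
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def elbo_mc(x, m, s, n_samples=10_000, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.normal(m, s, size=n_samples)                          # z ~ q(z|v)
    log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)   # log p(x,z) = log p(z) + log p(x|z)
    log_q = log_normal(z, m, s ** 2)                              # log q(z|v)
    return np.mean(log_joint - log_q)                             # MC estimate of F(v)

# For this toy model the exact posterior is N(x/2, 1/2), so the ELBO is
# maximised (and equals log p(x)) at m = x/2, s = sqrt(1/2).
print(elbo_mc(x=1.0, m=0.5, s=np.sqrt(0.5)))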

2
Q

Score Function

A

score function (log derivative trick)

\nabla_v log(q(z|v)) = \nabla_v q(z|v) / q(z|v)

\Leftrightarrow \nabla_v q(z|v) = \nabla_v log(q(z|v)) * q(z|v)
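
A small numerical check of this identity, assuming the illustrative case q(z|v) = N(z; v, 1), for which \nabla_v log(q(z|v)) = z - v:

import numpy as np

def q(z, v):
    # density of N(z; v, 1)
    return np.exp(-0.5 * (z - v) ** 2) / np.sqrt(2 * np.pi)

z, v, eps = 0.3, 1.2, 1e-6
grad_q_fd = (q(z, v + eps) - q(z, v - eps)) / (2 * eps)   # finite-difference \nabla_v q(z|v)
grad_q_trick = q(z, v) * (z - v)                          # \nabla_v log(q(z|v)) * q(z|v)
print(grad_q_fd, grad_q_trick)                            # agree to ~6 decimal places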

3
Q

Natural Gradient

A

\tilde{\nabla}_v F(v) = F^{-1} \nabla_v F(v)

where F^{-1} is the inverse of the Fisher information matrix of q(z|v) (not to be confused with the ELBO F(v))
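
A toy worked example, assuming q(z|v) = N(mu, sigma^2) with only the mean as variational parameter: the Fisher information is then the scalar 1/sigma^2, so the natural gradient rescales the ordinary gradient by sigma^2 (this one-parameter setting is an assumption for illustration):

sigma2 = 4.0                         # variance of q, held fixed
fisher = 1.0 / sigma2                # Fisher information of N(mu, sigma^2) w.r.t. mu
grad_mu = 0.7                        # some Euclidean gradient \nabla_mu F(v)
natural_grad_mu = grad_mu / fisher   # Fisher^{-1} * gradient = sigma^2 * gradient
print(natural_grad_mu)               # 2.8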

4
Q

Noisy Updates of Variational Parameters

A

v_{t+1} = v_t + \rho_t \hat{\nabla}_v F(v_t),

where \rho_t is the step size at iteration t and \hat{\nabla}_v F(v_t) is a noisy (stochastic) estimate of the ELBO gradient

5
Q

\nabla_v ELBO using the score function

A

\nabla_v F(v) = E_q[\nabla_v log(q(z|v)) * ( log(p(x,z)) - log(q(z|v)) )]

Use Monte Carlo to compute this
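
A minimal Monte Carlo sketch of this estimator, assuming the same illustrative toy model p(z) = N(0,1), p(x|z) = N(z,1) with q(z|v) = N(m, 1) and a single variational parameter v = m:

import numpy as np

def log_normal(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def score_function_grad(x, m, n_samples=50_000, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.normal(m, 1.0, size=n_samples)                  # z ~ q(z|v)
    score = z - m                                           # \nabla_m log(q(z|v)) for q = N(m, 1)
    f = (log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)    # log p(x,z)
         - log_normal(z, m, 1.0))                           # - log q(z|v)
    return np.mean(score * f)                               # MC estimate of \nabla_m F(v)

# For this toy model the exact gradient is x - 2m, i.e. 1.0 here; the MC
# estimate is unbiased but noisy.
print(score_function_grad(x=1.0, m=0.0))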

6
Q

Change of variables

A

|q(z|v) dz| = |p(\varepsilon) d\varepsilon|

7
Q

Reparametrisation trick

A

Choose a base distribution p(\varepsilon) [e.g. standard normal or uniform] and a deterministic transformation z = t(\varepsilon, v) such that z ~ q(z|v). Then:

\nabla_v E_{q(z|v)}[f(z)] = E_{p(\varepsilon)}[\nabla_v f(t(\varepsilon, v))]

Note, we take the expectation w.r.t. base distribution now.
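
A minimal sketch for the assumed Gaussian case q(z|v) = N(mu, sigma^2), where the base distribution is p(\varepsilon) = N(0,1) and t(\varepsilon, v) = mu + sigma * \varepsilon:

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.5               # variational parameters v = (mu, sigma)
eps = rng.normal(size=10_000)      # eps ~ p(eps), does not depend on v
z = mu + sigma * eps               # z = t(eps, v), distributed as q(z|v)
print(z.mean(), z.std())           # close to mu and sigma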

8
Q

Reparametrisation ELBO gradient

A

\nabla_v F(v) = E_{p(\varepsilon)}[\nabla_v ( log(p(x, t(\varepsilon, v))) - log(q(t(\varepsilon, v) | v)) )]

\nabla_v F(v) = E_{p(\varepsilon)}[\nabla_z ( log(p(x, z)) - log(q(z|v)) ) * \nabla_v t(\varepsilon, v)]

where z = t(\varepsilon, v)
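
A minimal Monte Carlo sketch of the second form, again assuming the toy model p(z) = N(0,1), p(x|z) = N(z,1), q(z|v) = N(m, s^2) and t(\varepsilon, v) = m + s\varepsilon (model and names are illustrative):

import numpy as np

def pathwise_grad(x, m, s, n_samples=50_000, seed=0):
    rng = np.random.default_rng(seed)
    eps = rng.normal(size=n_samples)            # eps ~ p(eps)
    z = m + s * eps                             # z = t(eps, v)
    # \nabla_z ( log p(x,z) - log q(z|v) ) for this Gaussian model:
    dz = -z + (x - z) + (z - m) / s ** 2
    grad_m = np.mean(dz * 1.0)                  # \nabla_m t(eps, v) = 1
    grad_s = np.mean(dz * eps)                  # \nabla_s t(eps, v) = eps
    return grad_m, grad_s

# Exact values for this toy model are (x - 2m, 1/s - 2s) = (1.0, -1.0).
print(pathwise_grad(x=1.0, m=0.0, s=1.0))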

9
Q

What are the properties of the score-function ELBO gradient estimator?

A
+ Works for all models (continuous and discrete latent variables)
+ Works for a large class of variational approximations
- Variance can be high, which leads to slow convergence
10
Q

What are the properties of the pathwise (reparametrisation) ELBO gradient estimator?

A
- Requires differentiable models
- Requires the variational approximation to be expressed as a deterministic transformation z = t(\varepsilon, v)
+ Generally lower variance than the score-function estimator
11
Q

Amortised variational inference in hierarchical Bayesian models

A

F(v) = E_q[log(p(x, \beta, z_{1:N})) - log(q(\beta, z_{1:N} | \lambda, \phi_{1:N}))],
where v = {\lambda, \phi_{1:N}}

F(v) = E_q[log(p(x, \beta, z_{1:N}))]
- E_q[log(q(\beta | \lambda)) + \sum_n log(q(z_n | f(x_n, \theta)))],
where \phi_n = f(x_n, \theta) and f is a deep neural network (the inference network)
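
A minimal sketch of the amortisation idea, assuming an illustrative one-hidden-layer network for f(x_n, \theta) (architecture, sizes and names are assumptions):

import numpy as np

def inference_net(x_n, theta):
    # phi_n = f(x_n, theta): map a data point to its local variational parameters,
    # e.g. (mean, log_std) of q(z_n | phi_n)
    W1, b1, W2, b2 = theta
    h = np.tanh(W1 @ x_n + b1)
    return W2 @ h + b2

rng = np.random.default_rng(0)
theta = (rng.normal(size=(8, 3)), np.zeros(8),    # theta is shared across ALL data points
         rng.normal(size=(2, 8)), np.zeros(2))
phi_n = inference_net(rng.normal(size=3), theta)  # local parameters for one x_n
print(phi_n)

The pay-off is that optimisation is over the shared {\lambda, \theta} rather than one \phi_n per data point.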

12
Q

Amortised SVI (Algorithm)

A
  1. Input: data x, model p(\beta, z, x)
  2. Initialise global variational parameters \lambda randomly
  3. Repeat:
    3.1 Sample \beta ~ q(\beta | \lambda)
    3.2 Sample data point x_n uniformly at random
    3.3 Compute stochastic natural gradients
    »> \tilde\nabla_\lambda ELBO
    »> \tilde\nabla_\theta ELBO
    3.4 Update global parameters
    »> \lambda += \rho_t \tilde\nabla_\lambda ELBO [global variational parameters]
    »> \theta += \rho_t \tilde\nabla_\theta ELBO [inference network parameters]
13
Q

BBVI (Algorithm) [black box variational inference]

A
  1. Input: model p(x, z), variational approximation q(z|v)
  2. Repeat:
    2.1 Draw S samples z^{(s)} ~ q(z|v), s = 1, ..., S
    2.2 Update variational parameters [MC estimate of the score-function gradient of the ELBO]
    »> v += \rho_t * 1/S \sum_s \nabla_v log(q(z^{(s)} | v)) * ( log(p(x, z^{(s)})) - log(q(z^{(s)} | v)) )
    2.3 t += 1
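
A minimal runnable sketch of this loop, assuming the illustrative model p(z) = N(0,1), p(x|z) = N(z,1) with q(z|v) = N(v, 1) and a single variational parameter v (model and step-size schedule are assumptions):

import numpy as np

def log_normal(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

rng = np.random.default_rng(0)
x, v, S = 1.0, -3.0, 64
for t in range(1, 2001):
    z = rng.normal(v, 1.0, size=S)                         # 2.1 draw S samples z^{(s)} ~ q(z|v)
    score = z - v                                          # \nabla_v log(q(z^{(s)} | v)) for q = N(v, 1)
    f = (log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)   # log p(x, z^{(s)})
         - log_normal(z, v, 1.0))                          # - log q(z^{(s)} | v)
    rho = 0.1 / t ** 0.6                                   # decaying step size
    v += rho * np.mean(score * f)                          # 2.2 noisy gradient step
print(v)                                                   # should end up near the optimum v = x/2 = 0.5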
14
Q

SVI (Algorithm) [stochastic variational inference]

A
  1. Input: data x, model p(\beta, z, x)
  2. Initialise global variational parameters \lambda randomly
  3. Repeat:
    3.1 Sample data point x_n uniformly at random
    3.2 Update local parameter \phi_n
    3.3 Compute intermediate global parameter \hat\lambda based on noisy natural gradient
    3.4 Set global parameter
    »> \lambda = (1-\rho_t) * \lambda + \rho_t * \hat\lambda
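
A minimal runnable sketch of SVI for an assumed conjugate toy model beta ~ N(0,1), z_n | beta ~ N(beta, 1), x_n | z_n ~ N(z_n, 1) with mean-field q(beta) q(z_1)...q(z_N); the model, data, step-size schedule and parameterisation are assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)
N = 1000
x = rng.normal(2.0, 1.5, size=N)              # synthetic data (illustrative)

lam = np.array([0.0, -0.5])                   # natural params of q(beta) = (m/s^2, -1/(2 s^2)); initialised at the prior
for t in range(1, 5001):
    m = -lam[0] / (2 * lam[1])                # current E_q[beta]
    n = rng.integers(N)                       # 3.1 sample a data point x_n
    m_n = (m + x[n]) / 2.0                    # 3.2 local update: q(z_n) = N((m + x_n)/2, 1/2)
    lam_hat = np.array([N * m_n, -(1 + N) / 2.0])   # 3.3 intermediate global parameter: prior + N * sampled point's statistics
    rho = (t + 10.0) ** -0.7                  # decaying step size
    lam = (1 - rho) * lam + rho * lam_hat     # 3.4 convex-combination (natural gradient) update
print(-lam[0] / (2 * lam[1]))                 # E_q[beta]; the fixed point is sum(x)/(N+2)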
15
Q

Mean-Field Approximation (Algorithm) [CW3]

A
  1. Input: data x, model p(\beta, z, x)
  2. Initialise global variational parameters \lambda randomly
  3. While ELBO has not converged, repeat:
    3.1 For each data point x_n
      3.1.1 Update local variational parameters \phi_n
    3.2 Update global variational parameters \lambda
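
A minimal CAVI sketch for the same assumed toy model (beta ~ N(0,1), z_n | beta ~ N(beta, 1), x_n | z_n ~ N(z_n, 1)); for brevity a fixed number of sweeps replaces the ELBO-convergence check:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.5, size=1000)       # synthetic data (illustrative)
N = len(x)

m = 0.0                                   # E_q[beta], initialised at the prior mean
for sweep in range(50):                   # fixed number of sweeps instead of an ELBO check
    m_n = (m + x) / 2.0                   # 3.1 local updates: q(z_n) = N((m + x_n)/2, 1/2)
    m = m_n.sum() / (N + 1)               # 3.2 global update: q(beta) = N(sum(m_n)/(N+1), 1/(N+1))
print(m)                                  # converges to sum(x)/(N+2)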