Sampling Flashcards

1
Q

What are the two problems we often want to solve with sampling using Monte Carlo methods?

A
  1. Generate samples for simulation (generative models) or for representing distributions
  2. Compute expectations (solve integrals), e.g. mean/variance, marginal likelihood, predictions
2
Q

How can we approximate the expectation of f(x) using Monte Carlo?

A

Let x_s ~ p(x), s = 1, ..., S, be samples. Then

E[f(x)] = ∫ f(x) p(x) dx ≈ (1/S) Σ_s f(x_s)
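A minimal NumPy sketch of this estimator; the choice f(x) = x^2 with x ~ N(0, 1) (true expectation 1) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo estimate of E[f(x)] with f(x) = x**2 and x ~ N(0, 1);
# the true value is E[x^2] = Var(x) = 1.
S = 100_000
x_s = rng.standard_normal(S)      # samples x_s ~ p(x)
estimate = np.mean(x_s ** 2)      # (1/S) * sum_s f(x_s)
```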

3
Q

How can we approximate predictions using Monte Carlo sampling?

A

Let theta_s ~ p(theta), s = 1, ..., S, be samples. Then

p(y|x) = ∫ p(y|x, theta) p(theta) d theta ≈ (1/S) Σ_s p(y|x, theta_s)

4
Q

Is the Monte Carlo estimator for the expectation biased?

A

No, it is unbiased.

5
Q

What happens to the variance of the Monte Carlo estimator for the expectation when the number of samples, S, increases?

A

It decreases proportionally to 1/S: Var[(1/S) Σ_s f(x_s)] = Var[f(x)] / S

6
Q

How can we use Monte Carlo to sample discrete values?

A

Sample u ~ U(0,1), partition [0,1] into intervals whose lengths equal the outcome probabilities, and return the outcome whose interval contains u.
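A sketch of this interval trick using the cumulative sum of the probabilities; the example distribution [0.2, 0.5, 0.3] is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Partition [0, 1) into intervals of length 0.2, 0.5, 0.3; a uniform draw
# falls into interval i with probability probs[i].
probs = np.array([0.2, 0.5, 0.3])
cdf = np.cumsum(probs)                 # interval right endpoints: 0.2, 0.7, 1.0
u = rng.uniform(size=100_000)
outcomes = np.searchsorted(cdf, u)     # index of the interval containing each u

freqs = np.bincount(outcomes, minlength=3) / len(outcomes)
```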

7
Q

What is the rejection sampling algorithm?

A
  1. Assume that sampling from p(x) is difficult, but evaluating the unnormalized p_hat(x) = Z p(x) is easy.
  2. Choose a simple proposal distribution q(x) that we can sample from.
  3. Find a constant k such that k q(x) >= p_hat(x) for all x.
  4. Draw a sample x_s from q(x).
  5. Draw a sample u_0 from the uniform distribution U[0, k q(x_s)].
  6. Reject x_s if u_0 > p_hat(x_s), else retain it.
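The steps above can be sketched as follows; the target (an unnormalized standard normal), the proposal N(0, 2^2), and k = 5.1 are illustrative assumptions (for this pair the exact ratio bound is 2 * sqrt(2*pi) ≈ 5.01):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_hat(x):
    # unnormalized target: standard normal without its constant
    return np.exp(-x ** 2 / 2)

def q_pdf(x, sigma=2.0):
    # proposal density q(x) = N(0, sigma^2)
    return np.exp(-x ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

k = 5.1  # chosen so that k * q(x) >= p_hat(x) for all x

x_s = rng.normal(0.0, 2.0, size=200_000)       # step 4: x_s ~ q
u_0 = rng.uniform(0.0, k * q_pdf(x_s))         # step 5: u_0 ~ U[0, k q(x_s)]
accepted = x_s[u_0 <= p_hat(x_s)]              # step 6: retain if u_0 <= p_hat
```

The retained samples are distributed as the normalized target, here N(0, 1).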
8
Q

What are the problems of rejection sampling?

A
  1. k might be difficult to estimate
  2. We might reject most samples

9
Q

What is the idea behind importance sampling?

A

Instead of throwing away samples, we weight them by w_s = p(x_s)/q(x_s), where x_s ~ q. Then E_p[f(x)] = E_q[f(x) w(x)] ≈ (1/S) Σ_s f(x_s) w_s
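A sketch of this reweighting; the target p = N(0, 1), the test function f(x) = x^2 (true expectation 1), and the proposal q = N(0, 2^2) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_pdf(x):
    # target density p(x) = N(0, 1)
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

def q_pdf(x, sigma=2.0):
    # proposal density q(x) = N(0, sigma^2)
    return np.exp(-x ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

S = 200_000
x_s = rng.normal(0.0, 2.0, size=S)     # x_s ~ q
w_s = p_pdf(x_s) / q_pdf(x_s)          # importance weights p(x_s)/q(x_s)
estimate = np.mean(x_s ** 2 * w_s)     # (1/S) * sum_s f(x_s) * w_s
```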

10
Q

Is importance sampling unbiased?

A

Yes, provided q(x) > 0 wherever p(x) > 0.

11
Q

What are the disadvantages of importance sampling?

A
  1. Breaks down if we do not have enough samples (almost all of the weight can land on a single sample)
  2. Requires evaluating the true p
  3. Does not scale to high-dimensional problems
12
Q

What is the idea behind MCMC (Markov Chain Monte Carlo)?

A

Instead of generating independent samples, use a proposal function q that depends on the previous sample:

p(x_t+1 | x_t) = T(x_t+1 | x_t), where T is called the transition operator. Example:

T(x_t+1 | x_t) = N(x_t+1 | x_t, sigma^2 * I)

13
Q

Name a necessary property of the MCMC transition operator.

A

The Markov property:

p(x_t+1 | x_t, x_t-1, ..., x_0) = p(x_t+1 | x_t)

14
Q

What are the properties of the samples from MCMC?

A

They are not independent, but the estimator based on them is unbiased, so we can still take the average.

15
Q

What are the four different ways a Markov chain might behave?

A

1) Diverge
2) Converge to a single point
3) Converge to a limit cycle
4) Converge to an equilibrium distribution

16
Q

What kind of behaviour do we want from our MCMC Markov chain?

A

Converge to an equilibrium distribution

17
Q

How can we reduce the dependency of our samples?

A

Discard most of the samples, keeping only every M-th sample (thinning).

18
Q

What are the two conditions for converging to an equilibrium distribution?

A
  1. Invariance / stationarity:
    If the chain is in the equilibrium distribution, it stays in the equilibrium distribution.
  2. Ergodicity:
    Any state can be reached from anywhere.

These conditions ensure that there is only one equilibrium distribution.

19
Q

Name a sufficient condition for p* being invariant.

A

Detailed balance: p*(x) T(x’|x) = p*(x’) T(x|x’)

20
Q

What is the Metropolis-Hastings algorithm?

A

Assume that p_hat = Z p can be evaluated easily.

  1. Generate a proposal x’ ~ q(x’|x_t).
  2. Draw u ~ U[0,1]. If [q(x_t|x’) p_hat(x’)] / [q(x’|x_t) p_hat(x_t)] > u, accept the sample, so x_t+1 = x’; else set x_t+1 = x_t.
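A sketch of these two steps; the target (an unnormalized standard normal), the symmetric random-walk proposal N(x’|x_t, 0.5^2) (for which the q-ratio cancels), and the burn-in length are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_hat(x):
    # unnormalized target: standard normal up to its constant Z
    return np.exp(-x ** 2 / 2)

x = 0.0
chain = []
for _ in range(50_000):
    x_prop = x + rng.normal(0.0, 0.5)             # step 1: x' ~ q(x'|x_t)
    # step 2: q is symmetric, so the acceptance ratio reduces to
    # p_hat(x') / p_hat(x_t); accept when it exceeds u ~ U[0, 1]
    if p_hat(x_prop) / p_hat(x) > rng.uniform():
        x = x_prop                                # accept: x_{t+1} = x'
    chain.append(x)                               # else keep x_{t+1} = x_t

chain = np.array(chain[5_000:])   # discard burn-in
```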
21
Q

Let q(x’|x_t) = N(x’|x_t, sigma^2) and let sigma be the step size. What happens with

1) Large step sizes
2) Small step sizes

A
  1. Large step sizes: most proposals are rejected, so the method may not work well.
  2. Small step sizes: we might not explore the distribution enough.

22
Q

What distributions do we need for Gibbs sampling?

A

We need the conditional distribution of each parameter given all the others (the full conditionals).
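A sketch of Gibbs sampling where the full conditionals are known in closed form; the example target, a standard bivariate normal with correlation rho = 0.5 (whose conditionals are x1 | x2 ~ N(rho * x2, 1 - rho^2) and symmetrically for x2 | x1), is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

rho = 0.5
cond_sd = np.sqrt(1 - rho ** 2)   # standard deviation of each conditional
x1, x2 = 0.0, 0.0
samples = []
for _ in range(50_000):
    x1 = rng.normal(rho * x2, cond_sd)   # sample x1 from p(x1 | x2)
    x2 = rng.normal(rho * x1, cond_sd)   # sample x2 from p(x2 | x1)
    samples.append((x1, x2))

samples = np.array(samples[5_000:])      # discard burn-in
```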

23
Q

What is the requirement for ergodicity (any state can be reached from any point) in Gibbs sampling?

A

All conditional densities are greater than 0.

24
Q

What are the steps for finding the conditional distribution?

A
  1. Write down the log-joint distribution p(x_1, x_2, ..., x_n).
    Then, for each x_i:
  2. Remove terms that do not depend on x_i.
  3. Match the remaining terms to the closest “known” distribution and use it as the conditional.
25
Q

Why isn’t Gibbs sampling suited for highly correlated variables?

A

Gibbs sampling can only move along the axis directions; this is bad for correlated distributions, which are spread out along diagonals.

26
Q

What is block and collapsed Gibbs sampling?

A

Collapsed:
Analytically integrate out some parameters and sample the rest.
Block:
Sample groups of variables jointly at a time.

27
Q

What are the basic steps behind slice sampling?

A

Draw u ~ U[0, p_hat(x_t)] (vertically, from 0 up to the unnormalized density at x_t)
Draw x_t+1 ~ U{x : p_hat(x) > u} (uniformly over the horizontal slice)
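A sketch of these two draws; the target, an unnormalized standard normal, is an illustrative assumption chosen because its slice {x : p_hat(x) > u} is exactly the interval (-a, a) with a = sqrt(-2 * ln(u)), so the horizontal draw can be done directly (in general a stepping-out procedure is needed):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_hat(x):
    # unnormalized target: standard normal up to its constant Z
    return np.exp(-x ** 2 / 2)

x = 0.0
chain = []
for _ in range(50_000):
    u = rng.uniform(0.0, p_hat(x))    # vertical draw: u ~ U[0, p_hat(x_t)]
    a = np.sqrt(-2.0 * np.log(u))     # slice boundary for this target
    x = rng.uniform(-a, a)            # horizontal draw, uniform on the slice
    chain.append(x)

chain = np.array(chain)
```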