Chapter 4 Flashcards
s^2=?
s^2 = (1/n) Σ_{i=1}^n (xi − x̄)^2
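A quick check of this formula in R (a sketch with illustrative data; note R's built-in var() uses divisor n − 1, so it is rescaled here):
x <- c(1, 2, 4, 7)                                  # illustrative data
s2 <- sum((x - mean(x))^2) / length(x)              # divisor n, as in the card
s2
var(x) * (length(x) - 1) / length(x)                # rescaled var() agrees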
Why is inference not straightforward in non-conjugate problems? Why are non-conjugate priors then used?(2,2)
Not using a conjugate prior distribution means the posterior density is typically only known up to a constant of proportionality, which makes even basic tasks problematic, such as plotting the posterior density or determining posterior moments. But having to use conjugate priors is far too restrictive for many real data analyses: (i) our prior beliefs may not be captured by a conjugate prior; (ii) most models for complex data do not have conjugate priors.
Lags and correlations in MCMC.(2)
We know that the simulator rnorm() produces independent realisations, and so the (sample) correlation between, say, consecutive values, corr(xi, xi+1), will be almost zero. This is also the case for correlations at all positive lags. Finally, the lag-0 autocorrelation corr(xi, xi) must be one (by definition).
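A quick illustration in R (a sketch; the run length of 1000 is arbitrary):
x <- rnorm(1000)         # independent realisations
acf(x)                   # lag-0 bar equals 1; bars at positive lags are near zero
cor(x[-1000], x[-1])     # sample corr(xi, xi+1), close to zero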
What is key to MCMC? What is this?(2)
Alternate sampling from conditional distributions defines a bivariate Markov chain, and the argument below is an intuitive explanation for why f(x,y) is its stationary distribution. Thus being able to simulate easily from conditional distributions is key to this methodology.
Further: we have already seen that dealing with conditional posterior distributions is straightforward when the prior is semi-conjugate, so let's assume that simulating from f(y|x) and f(x|y) is straightforward. The key problem is that, in general, we can't simulate from the marginal distributions f(x) and f(y).
For the moment, suppose we can simulate from the marginal distribution for X, that is, we have an X = x from f(x). We can now simulate a Y = y from f(y|x) to give a pair (x,y) from the bivariate density. Given that this pair is from the bivariate density, the y value must be from the marginal f(y), and so we can simulate an X = x′ from f(x|y) to give a new pair (x′,y) also from the joint density. But now x′ is from the marginal f(x), and so we can simulate a Y = y′ from f(y|x′) to give a new pair (x′,y′) also from the joint density. And we can keep going (see the sketch after this card).
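A minimal sketch of this alternating scheme for a standard bivariate normal with correlation ρ, where both conditionals are known: f(x|y) is N(ρy, 1 − ρ^2) and f(y|x) is N(ρx, 1 − ρ^2); the value ρ = 0.8, the run length and the zero starting values are illustrative choices:
rho <- 0.8; N <- 5000
x <- y <- numeric(N)                                 # start from x = y = 0
for (j in 2:N) {
  x[j] <- rnorm(1, rho * y[j - 1], sqrt(1 - rho^2))  # X = x' from f(x|y)
  y[j] <- rnorm(1, rho * x[j], sqrt(1 - rho^2))      # Y = y' from f(y|x')
}
cor(x, y)                                            # close to rho = 0.8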
Outline the Gibbs sampler.(4)
Suppose we want to generate realisations from the posterior density π(θ|x), where θ = (θ1, θ2, …, θp)^T, and that we can simulate from the full conditional distributions (FCDs) π(θi|θ1, …, θi−1, θi+1, …, θp, x) = π(θi|·), i = 1, 2, …, p.
The Gibbs sampler follows the following algorithm:
1. Initialise the iteration counter to j = 1. Initialise the state of the chain to θ^(0) = (θ1^(0), …, θp^(0))^T.
2. Obtain a new value θ^(j) from θ^(j−1) by successive generation of values
θ1^(j) ∼ π(θ1|θ2^(j−1), θ3^(j−1), …, θp^(j−1), x)
θ2^(j) ∼ π(θ2|θ1^(j), θ3^(j−1), …, θp^(j−1), x)
⋮
θp^(j) ∼ π(θp|θ1^(j), θ2^(j), …, θp−1^(j), x)
3. Change counter j to j + 1, and return to step 2.
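A minimal sketch of this algorithm in R for the semi-conjugate normal model xi ∼ N(µ, 1/τ) with µ ∼ N(b, 1/c) and τ ∼ Ga(g, h) a priori; the prior values, run length, initial values and simulated data are illustrative assumptions, and the FCDs are the standard semi-conjugate ones (normal for µ|·, gamma for τ|·):
gibbs <- function(x, N = 10000, b = 0, c = 0.01, g = 1, h = 1,
                  mu0 = mean(x), tau0 = 1 / var(x)) {
  n <- length(x); xbar <- mean(x)
  mu <- mu0; tau <- tau0                       # initial state theta^(0)
  out <- matrix(NA, N, 2, dimnames = list(NULL, c("mu", "tau")))
  for (j in 1:N) {
    prec <- c + n * tau                        # FCD: mu | tau, x is normal
    mu <- rnorm(1, (c * b + n * tau * xbar) / prec, sqrt(1 / prec))
    tau <- rgamma(1, g + n / 2, h + sum((x - mu)^2) / 2)  # FCD: tau | mu, x is gamma
    out[j, ] <- c(mu, tau)
  }
  out
}
x <- rnorm(50, mean = 5, sd = 2)               # illustrative simulated data
post <- gibbs(x)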
What is the burn-in period and how would you determine one?(2)
- The initial period before the realisations appear to come from one common (stationary) distribution; these early realisations are discarded.
- The most effective method is simply to look at a trace plot of the posterior sample and detect the point after which the realisations look to be from the same distribution.
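For example, using the Gibbs output post from the sketch above (the candidate burn-in of 1000 iterations is an illustrative choice, to be judged by eye from the plot):
plot(post[, "mu"], type = "l", xlab = "iteration", ylab = "mu")  # trace plot
abline(v = 1000, col = "red")   # candidate burn-in point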
What are two major issues that arise with Gibbs sampler output? Provide a strategy to ensure sampling really is from the stationary distribution.(2,3)
- Convergence: thus the need to determine a burn-in period.
- Autocorrelation: hence the output may require thinning.
Strategy
1. Determine the burn-in period, after which the Gibbs sampler has reached its stationary distribution. This may involve thinning the posterior sample, as slowly snaking trace plots may be due to high autocorrelations rather than a lack of convergence.
2. After this, determine the level of thinning needed to obtain a posterior sample whose autocorrelations are roughly zero.
3. Repeat steps 1 and 2 several times using different initial values to make sure that the sample really is from the stationary distribution of the chain, that is, from the posterior distribution (see the sketch below).
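A sketch of step 3 using the gibbs() function from the earlier card, re-running from dispersed initial values (the values −10, 0 and 10 are illustrative) and overlaying the traces:
chains <- lapply(c(-10, 0, 10), function(m0) gibbs(x, mu0 = m0))
matplot(sapply(chains, function(ch) ch[, "mu"]), type = "l",
        lty = 1, ylab = "mu")   # traces should agree after burn-in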
What is thinning and how would you determine this? What happens if you thin too much?(3)
Thinning here means not taking every realisation, but instead taking, say, every m-th realisation.
In general, an appropriate level of thinning is determined by the largest lag m at which any of the variables have a non-negligible autocorrelation.
If doing this leaves a (thinned) posterior sample which is too small, then the original Gibbs sampler should be re-run (after convergence) for a sufficiently large number of iterations until the thinned sample is of the required size.
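A rough sketch of choosing m from the Gibbs output post above; the burn-in of 1000 iterations and the 0.1 threshold for "non-negligible" are illustrative assumptions:
keep <- post[-(1:1000), ]                      # discard burn-in
lag_needed <- function(v, cut = 0.1) {
  a <- acf(v, plot = FALSE)$acf[-1]            # autocorrelations at lags 1, 2, ...
  if (any(abs(a) > cut)) max(which(abs(a) > cut)) else 1
}
m <- max(apply(keep, 2, lag_needed))           # largest non-negligible lag
thinned <- keep[seq(1, nrow(keep), by = m), ]  # every m-th realisation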
Accuracy of µ̄ in un-autocorrelated MCMC output for a moving average process? What about for an autoregressive process?(3)
The moving average (MA) model does not use past values to predict future values; it uses the errors from past forecasts, whereas the autoregressive (AR) model uses past values to predict future values.
Note r(k) = Corr(µ^(j), µ^(j+k)).
If the MCMC output is un-autocorrelated then the accuracy of µ̄ is roughly ±2sµ/√N.
Due to autocorrelation, the accuracy of µ̄ is roughly ±2sµ/√[N{1 − r(1)}^2]; the amount of information in the output is then equivalent to that of a random sample of size Neff = N{1 − r(1)}^2. This is more complicated for higher-order autocorrelations, hence the use of thinning.
It's worth noting that, in general, MCMC output with positive autocorrelations has Neff < N; also, MCMC output with some negative autocorrelations can sometimes have Neff > N.
The asymptotic posterior distribution of the mean and precision (µ, τ)^T using a random sample from a normal N(µ, 1/τ) distribution is…
µ|x ∼ N(x̄, s^2/n), τ|x ∼ N{1/s^2, 2/(ns^4)}, independently
Posterior mean and sd distributions for MCMC sample.(2)
M ∼ N(µ̄, sµ^2/N), Σ^−2 ∼ N{1/sµ^2, 2/(Nsµ^4)}, independently.
Approx 95% HDI for M. What about sigma?(2)
µ̄ ± z0.025 sµ/√N ≈ µ̄ ± 2sµ/√N
sµ ± sµ√(2/N)
These are fairly accurate even for non-normal-looking posterior distributions: using a large enough N for the MCMC run gives the asymptotic properties, hence these results.
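A sketch of both intervals computed from the thinned sample thinned obtained earlier:
v <- thinned[, "mu"]; N <- length(v)
mu_bar <- mean(v); s_mu <- sd(v)
mu_bar + c(-1, 1) * 2 * s_mu / sqrt(N)      # approx 95% HDI for M
s_mu * (1 + c(-1, 1) * sqrt(2 / N))         # approx 95% HDI for Sigma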
What is semi-conjugacy?(1)
Notice that, since µ and τ are independent a priori, µ|τ ∼ N(b, 1/c). Therefore, given τ, the normal prior for µ is conjugate. Similarly, τ|µ ∼ Ga(g, h), and so, given µ, the gamma prior for τ is conjugate. Therefore, both conditional priors (for µ|τ and τ|µ) are conjugate. Such priors are called semi-conjugate.
What can you use a converged and thinned MCMC sample to do?(4)
- Obtain the posterior distribution for any (joint) functions of the parameters, such as σ = 1/√τ or (θ1, θ2)^T = (µ − τ, e^(µ+τ/2))^T
- Look at bivariate posterior distributions via scatter plots
- Look at univariate marginal posterior distributions via histograms or boxplots
- Obtain numerical summaries such as the mean, standard deviation and confidence intervals for single variables and correlations between variables.
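A sketch of these four uses with the thinned sample thinned and column names mu and tau from the earlier Gibbs sketch:
sigma <- 1 / sqrt(thinned[, "tau"])            # posterior sample for a function of tau
plot(thinned[, "mu"], thinned[, "tau"])        # bivariate posterior via scatter plot
hist(thinned[, "mu"])                          # univariate marginal via histogram
mean(thinned[, "mu"]); sd(thinned[, "mu"])     # numerical summaries
quantile(sigma, c(0.025, 0.975))               # 95% equi-tailed interval
cor(thinned[, "mu"], thinned[, "tau"])         # correlation between variables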