Questions #3 Flashcards by Jean-Philippe Chagnon

What is the main shortcoming of grid approximation?

It scales very poorly in high dimensions.
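
As a rough illustration of that scaling problem, the sketch below grid-approximates a one-parameter posterior and then counts how many evaluations a full grid would need as parameters are added. The binomial example, the flat prior, and the function names are assumptions made for illustration; they are not taken from the cards.

```python
def grid_posterior(successes, trials, n_points=101):
    """Grid-approximate the posterior of a binomial proportion under a flat prior."""
    grid = [i / (n_points - 1) for i in range(n_points)]
    # Unnormalized posterior = likelihood x flat prior, evaluated at each grid point.
    unnorm = [p ** successes * (1 - p) ** (trials - successes) for p in grid]
    total = sum(unnorm)
    return grid, [u / total for u in unnorm]


def grid_size(n_points, n_params):
    """Number of posterior evaluations needed by a full grid over n_params parameters."""
    return n_points ** n_params


grid, post = grid_posterior(successes=6, trials=9)
best = grid[post.index(max(post))]
print(f"posterior mode on the grid: {best:.2f}")
for d in (1, 2, 5, 10):
    print(d, "parameters ->", grid_size(101, d), "evaluations")
```

The evaluation count grows exponentially with the number of parameters, which is why the grid becomes impractical beyond a handful of dimensions.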

What is the main shortcoming of quadratic approximation?

Scales better than grid, but it can struggle in the face of complex, hierarchical models.

It also does not fare well when the posterior distribution cannot be well approximated by a Gaussian distribution.

What are the 3 things we must be able to do to perform MCMC sampling with the Metropolis algorithm?

We must be able to generate a random value from the proposal distribution, Normal(theta_current, sigma^2).
We must be able to calculate the unnormalized posterior densities.
We must be able to generate a uniform random value from 0 to 1 to accept or reject the proposed parameter value.
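
The three requirements above can be sketched as a minimal random-walk Metropolis sampler for one parameter. The function name `metropolis`, its arguments, and the standard-normal target are assumptions chosen for illustration, not something the cards specify.

```python
import math
import random


def metropolis(log_unnorm_post, theta0, proposal_sd, n_iter, seed=0):
    """Random-walk Metropolis sampler for a single parameter."""
    rng = random.Random(seed)
    theta = theta0
    samples = []
    for _ in range(n_iter):
        # 1. Draw a proposal from Normal(theta_current, proposal_sd^2).
        proposal = rng.gauss(theta, proposal_sd)
        # 2. Compare unnormalized posterior densities (on the log scale).
        log_ratio = log_unnorm_post(proposal) - log_unnorm_post(theta)
        # 3. Draw u ~ Uniform(0, 1) and accept if u < min(1, ratio).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal
        # A rejected proposal repeats the current value in the chain.
        samples.append(theta)
    return samples


def log_target(theta):
    """Assumed example target: standard normal, up to an additive constant."""
    return -0.5 * theta ** 2


draws = metropolis(log_target, theta0=3.0, proposal_sd=1.0, n_iter=5000)
print("posterior mean estimate:", sum(draws) / len(draws))
```

Working on the log scale avoids numerical underflow when the densities are very small; only the ratio of densities matters, so the normalizing constant is never needed.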

In the trace plot, if the standard deviation of the proposal distribution is too low, what will happen?

The chain will move in small steps, so it will take longer to get to the right values.

In the trace plot, if the standard deviation of the proposal distribution is too high, what will happen?

It will generate far-away proposals that will usually get rejected and won’t explore the posterior distribution well. (robot-like graph: the trace plot looks blocky, with long flat stretches)

What is the difference between Metropolis and Metropolis-Hastings?

Metropolis-Hastings generalizes the Metropolis algorithm by allowing asymmetric proposals.
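
For reference, the standard Metropolis-Hastings acceptance probability can be written as below. The notation is an assumption, not the deck's own: $p$ is the unnormalized posterior, $q(\cdot \mid \cdot)$ is the proposal density, $\theta$ is the current value and $\theta^{*}$ the proposal.

```latex
\alpha(\theta \to \theta^{*})
  = \min\!\left(1,\;
      \frac{p(\theta^{*})\, q(\theta \mid \theta^{*})}
           {p(\theta)\, q(\theta^{*} \mid \theta)}
    \right)
```

When the proposal is symmetric, $q(\theta \mid \theta^{*}) = q(\theta^{*} \mid \theta)$, the two $q$ terms cancel and the expression reduces to the plain Metropolis ratio $p(\theta^{*}) / p(\theta)$.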

True or false : We need a symmetric proposal function in the Metropolis algorithm

True

True or false : In the Gibbs Algorithm, we always accept the proposal

True
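
A minimal sketch of why there is no accept/reject step: each update is a direct draw from a full conditional distribution. The target here, a standard bivariate normal with correlation `rho`, is an assumed example picked because its conditionals are known in closed form.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho ** 2)  # sd of each full conditional
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_iter):
        # x | y ~ Normal(rho * y, 1 - rho^2); the draw is always kept.
        x = rng.gauss(rho * y, cond_sd)
        # y | x ~ Normal(rho * x, 1 - rho^2), using the freshly updated x.
        y = rng.gauss(rho * x, cond_sd)
        samples.append((x, y))
    return samples


draws = gibbs_bivariate_normal(rho=0.5, n_iter=5000)
print("mean of x:", sum(x for x, _ in draws) / len(draws))
```

Each update changes one coordinate while holding the other fixed, so every move is parallel to an axis.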

What are the advantages of Gibbs?

Efficiency in sampling from the posterior and no tuning of the proposal distribution

What are the disadvantages of Gibbs?

It requires the ability to compute and sample from the conditional posterior distributions.
Sampling is inefficient in models with correlated parameters.

True or false : If the parameters are graphed in n-dimensional space, the Metropolis algorithm's movements can be in any direction. The Gibbs movements are always parallel to the axes

True

Write the formulas useful for HMC

Formula for the momentum
Formula for theta
Formula for the probability of acceptance
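
The formulas themselves do not appear in the card text. As a hedged reference only, one common way to write the leapfrog updates and the acceptance probability is given below, assuming a step size $\epsilon$, a momentum $\phi$ drawn from a standard normal (unit mass), and data $D$. The deck's own notation may differ.

```latex
% Half step for the momentum
\phi_{t+\epsilon/2} = \phi_{t} + \frac{\epsilon}{2}\,\nabla_{\theta}\log p(\theta_{t} \mid D)

% Full step for theta
\theta_{t+\epsilon} = \theta_{t} + \epsilon\,\phi_{t+\epsilon/2}

% Second half step for the momentum
\phi_{t+\epsilon} = \phi_{t+\epsilon/2} + \frac{\epsilon}{2}\,\nabla_{\theta}\log p(\theta_{t+\epsilon} \mid D)

% Probability of accepting the end point (\theta^{*}, \phi^{*})
p_{\text{accept}} = \min\!\left(1,\;
  \frac{p(\theta^{*} \mid D)\, p(\phi^{*})}{p(\theta \mid D)\, p(\phi)}\right)
```

Here $(\theta, \phi)$ is the state at the start of the trajectory and $(\theta^{*}, \phi^{*})$ is the state after the final leapfrog step.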

What is the consequence in HMC if s is too low?

The proposal distribution is too concentrated and does not have sufficient time to move into the region of high posterior mass

What is the consequence in HMC if s is too high?

Lower acceptance rates, since the proposals are too far away from the mode of the distribution.
It can also result in the U-turn problem.
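
The cards do not define s. The sketch below assumes only that it controls how far each proposal travels, and exposes a step size and a number of leapfrog steps as two separate, hypothetical tuning parameters. It is a minimal one-parameter HMC with unit mass on an assumed standard-normal target, meant to show how the acceptance rate reacts to tuning, not a production sampler.

```python
import math
import random


def hmc(log_post, grad_log_post, theta0, step_size, n_steps, n_iter, seed=0):
    """One-parameter HMC with a leapfrog integrator and unit mass."""
    rng = random.Random(seed)
    theta = theta0
    samples, accepted = [], 0
    for _ in range(n_iter):
        phi0 = rng.gauss(0.0, 1.0)  # fresh momentum for each trajectory
        theta_new, phi = theta, phi0
        # Leapfrog: half momentum step, alternating full steps, half momentum step.
        phi += 0.5 * step_size * grad_log_post(theta_new)
        for step in range(n_steps):
            theta_new += step_size * phi
            if step < n_steps - 1:
                phi += step_size * grad_log_post(theta_new)
        phi += 0.5 * step_size * grad_log_post(theta_new)
        # Accept with probability min(1, p(theta*) p(phi*) / (p(theta) p(phi))).
        log_new = log_post(theta_new) - 0.5 * phi ** 2
        log_old = log_post(theta) - 0.5 * phi0 ** 2
        if rng.random() < math.exp(min(0.0, log_new - log_old)):
            theta = theta_new
            accepted += 1
        samples.append(theta)
    return samples, accepted / n_iter


def log_post(theta):
    return -0.5 * theta ** 2  # assumed target: standard normal


def grad_log_post(theta):
    return -theta


for eps in (0.01, 0.2, 2.5):
    draws, rate = hmc(log_post, grad_log_post, 0.0, eps, 10, 2000)
    print(f"step_size={eps}: acceptance rate {rate:.2f}")
```

With a very small step size almost every proposal is accepted but each one barely moves; with a very large one the leapfrog trajectory overshoots and nearly every proposal is rejected.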

True or false : In MCMC, a small number of samples is enough if we only want the posterior mean

True

True or false : In MCMC, a small number of samples is enough if we want the posterior variance and the extreme percentiles

False. We need a lot of samples

In a trace plot, if the chains are all representative of the posterior, they should … (2)

They should overlap each other and be unrelated to their randomly set starting positions
They should also be stationary around the same modal value

Can you give a reason why the chains in a trace plot are not mixing well?

The prior is too flat

What do we need to do if a chain in the trace plot is isolated from the others?

1. Try to run more samples.
2. Check the model definition, implementation method, or input data for potential issues.

How can we know how correlated the proposed parameters are through the sampling iterations in MCMC?

With an autocorrelation plot
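
The quantity shown in an autocorrelation plot can be computed directly. The sketch below uses the usual sample autocorrelation estimator; the function name and the AR(1) test chain standing in for MCMC output are assumptions for illustration.

```python
import random


def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain at a given lag (lag 0 gives 1.0)."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain)
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean) for i in range(n - lag))
    return cov / var


# Assumed stand-in for MCMC output: an AR(1) series with strong dependence.
rng = random.Random(0)
chain, value = [], 0.0
for _ in range(5000):
    value = 0.9 * value + rng.gauss(0.0, 1.0)
    chain.append(value)

for lag in (1, 5, 20):
    print("lag", lag, "->", round(autocorrelation(chain, lag), 2))
```

Plotting these values against the lag gives the correlogram; a slow decay toward zero signals that successive draws carry little new information.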

True or false : A correlogram doesn’t necessarily indicate whether the chain is representative of the posterior distribution, but it gives us a sense of the efficiency of our MCMC algorithm

True

True or false : HMC has low autocorrelation if the parameters are well-tuned

True

True or false : A greater number of iterations will be needed to explore the posterior distribution if there is a lot of autocorrelation

True

What are the 2 forms of binomial regression?

Logistic regression. Each record in the data set indicates whether an event occurred or didn’t. We are estimating the probability of an event.
Aggregated binomial regression. Each record in the data set states the size of the population and the number of events that occurred.
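
The two data layouts carry the same information about the event probability. As a small sketch, the function below collapses one-row-per-trial records into per-group counts; the group labels and the toy rows are made up for illustration.

```python
from collections import defaultdict


def aggregate(records):
    """Collapse (group, outcome) rows, outcome in {0, 1}, into (trials, events) per group."""
    counts = defaultdict(lambda: [0, 0])
    for group, outcome in records:
        counts[group][0] += 1        # one more trial for this group
        counts[group][1] += outcome  # one more event if outcome == 1
    return {g: tuple(c) for g, c in counts.items()}


# Logistic-regression layout: one row per trial.
rows = [("A", 1), ("A", 0), ("A", 1), ("B", 0), ("B", 0), ("B", 1), ("B", 0)]
# Aggregated-binomial layout: one row per group.
print(aggregate(rows))
```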

True or false : Mixtures are a way to handle overdispersion

True

True or false : WAIC and PSIS can be used for beta-binomial and Poisson-gamma models

False.

True or false : Cross-validation is the only tool that we can use for beta-binomial and Poisson-gamma models

True

What are the advantages of multilevel models?

1. Better estimates when there are repeated observations from the same cluster.
2. Better estimates for unbalanced data.
3. We get estimates of variation between clusters.
4. We avoid averaging of data.

What are the disadvantages of multilevel regression?

1. We need assumptions for how clusters vary.
2. The posteriors are nowhere near normally distributed, so MCMC, which is slow, is required.
3. Models are harder to understand.

True or false : In a multilevel model, the parameter in the linear expression for the varying effects is indexed

True

Which approximation method needs to be used for multilevel models?

MCMC

Multilevel models produce shrinkage estimates. The properties of these estimates are

1. Estimates are between the no pooling and complete pooling estimates; they shrink from the individual estimate towards the overall estimate.
2. Shrinkage is greater when there is less data for a group.
3. Shrinkage is greater when the individual estimate is further from the overall estimate.
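
These three properties can be seen in a simplified setting. The sketch below assumes a normal-normal model with a known within-group standard deviation `sigma` and a known between-group standard deviation `tau`, in which case the partially pooled estimate is a precision-weighted average. A real multilevel model estimates these quantities; the closed form is used only to make the shrinkage visible.

```python
def partial_pooling_mean(group_mean, n_obs, overall_mean, sigma, tau):
    """Posterior mean of a group effect in a normal-normal model with known sigma and tau."""
    data_precision = n_obs / sigma ** 2  # weight earned by the group's own data
    prior_precision = 1.0 / tau ** 2     # weight given to the overall mean
    weight = data_precision / (data_precision + prior_precision)
    return weight * group_mean + (1.0 - weight) * overall_mean


# Same raw group mean, different amounts of data.
for n in (1, 9, 99):
    est = partial_pooling_mean(group_mean=2.0, n_obs=n, overall_mean=0.0, sigma=1.0, tau=1.0)
    print(f"n={n}: estimate {est:.2f}")

# Same amount of data, raw mean further from the overall mean.
for raw in (2.0, 4.0):
    est = partial_pooling_mean(group_mean=raw, n_obs=1, overall_mean=0.0, sigma=1.0, tau=1.0)
    print(f"raw mean {raw}: pulled in by {raw - est:.2f}")
```

The estimate always lies between the group's own mean and the overall mean, moves closer to the overall mean when `n_obs` is small, and shifts by a larger absolute amount when the raw mean is more extreme.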

True or false : The lower the variance of the varying intercept, the lower the effective number of parameters

True

True or false : Multilevel models control overfitting by limiting variation of less important varying effects

True

When can post-stratification be used?

When the sample is biased because the presence of the data may be a function of the cluster.
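
A minimal sketch of the reweighting step, assuming per-cluster estimates are already available (for example from a multilevel model) and that the true population size of each cluster is known. The cluster names and numbers are invented for illustration.

```python
def poststratify(cluster_estimates, population_sizes):
    """Weight each cluster's estimate by its share of the target population."""
    total = sum(population_sizes[c] for c in cluster_estimates)
    return sum(
        cluster_estimates[c] * population_sizes[c] / total
        for c in cluster_estimates
    )


# Hypothetical per-cluster estimates and population counts.
estimates = {"young": 0.2, "old": 0.6}
population = {"young": 700, "old": 300}
print("post-stratified estimate:", poststratify(estimates, population))
```

The weights come from the population, not from the sample, so a cluster that is over-represented in the data no longer dominates the overall estimate.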

When is it useful to use a non-centered parametrization?

When there are divergent transitions, or when the target acceptance rate is not achieved (HMC).

What are the advantages of non-centered parametrization?

1. Better for clusters with low variation, or clusters with a lot of units but not much data in each unit.
2. It improves efficiency, and the resulting model is equivalent to the original model.
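
The equivalence can be checked numerically. In the centered form a varying effect is drawn as alpha ~ Normal(mu, sigma); in the non-centered form a standardized z ~ Normal(0, 1) is drawn and alpha is rebuilt as mu + sigma * z. The function names and the values of `mu` and `sigma` below are assumptions for illustration.

```python
import random
import statistics


def centered_draws(mu, sigma, n, rng):
    """Centered form: sample alpha directly from Normal(mu, sigma)."""
    return [rng.gauss(mu, sigma) for _ in range(n)]


def noncentered_draws(mu, sigma, n, rng):
    """Non-centered form: sample standardized z, then shift and scale."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [mu + sigma * zi for zi in z]


rng = random.Random(0)
a = centered_draws(1.5, 0.5, 20000, rng)
b = noncentered_draws(1.5, 0.5, 20000, rng)
print("centered:    ", round(statistics.mean(a), 2), round(statistics.stdev(a), 2))
print("non-centered:", round(statistics.mean(b), 2), round(statistics.stdev(b), 2))
```

Both forms imply the same distribution for alpha. The difference is that the sampler works with z, whose scale no longer depends on sigma.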

What are the disadvantages of non-centered parametrization?

1. Model is harder to understand.
2. The fitted varying effect parameters are standardized.
3. Sometimes non-centered parametrization doesn't help.
