Questions #3 Flashcards

1
Q

What is the main shortcoming of grid approximation?

A

It scales very poorly in high dimensions.
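
As a rough illustration of why the grid scales poorly, the sketch below counts how many posterior evaluations a full grid needs. The function name `grid_size` and the choice of 100 points per parameter are assumptions made for this example only.

```python
def grid_size(points_per_dim: int, n_params: int) -> int:
    """Number of posterior evaluations needed by a full grid."""
    return points_per_dim ** n_params


# The grid grows exponentially with the number of parameters.
for n_params in (1, 2, 5, 10):
    print(n_params, grid_size(100, n_params))
```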

2
Q

What is the main shortcoming of quadratic approximation?

A

Scales better than grid, but it can struggle in the face of complex, hierarchical models.

It also does not fare well in the presence of posterior distributions that cannot be well approximated by a Gaussian distribution

3
Q

What are the three things we must be able to do to perform MCMC sampling with the Metropolis algorithm?

A
  1. We must be able to generate a random value from the proposal distribution, Normal(theta_current, sigma^2)
  2. We must be able to calculate the unnormalized posterior densities
  3. We must be able to generate a uniform random value from 0 to 1 to accept or reject the proposed parameter value
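
The three steps above can be sketched as a minimal one-parameter Metropolis sampler. This is only an illustration: the target `log_unnorm_posterior` (a standard normal up to a constant), the function names, and the default values are all assumptions, not part of the cards.

```python
import math
import random


def log_unnorm_posterior(theta: float) -> float:
    # Hypothetical target: a standard normal, known only up to a constant.
    return -0.5 * theta ** 2


def metropolis(n_iter, theta0=0.0, sigma=1.0, seed=0):
    """Return (samples, acceptance_rate) for a random-walk Metropolis chain."""
    rng = random.Random(seed)
    theta = theta0
    samples = []
    accepted = 0
    for _ in range(n_iter):
        # Step 1: draw a proposal from Normal(theta_current, sigma^2).
        proposal = rng.gauss(theta, sigma)
        # Step 2: compare unnormalized posterior densities (on the log scale).
        log_ratio = log_unnorm_posterior(proposal) - log_unnorm_posterior(theta)
        # Step 3: draw a uniform value on [0, 1) to accept or reject.
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal
            accepted += 1
        samples.append(theta)
    return samples, accepted / n_iter
```

Running it with a very small `sigma` gives a high acceptance rate but tiny steps, while a very large `sigma` gives mostly rejected proposals.
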
4
Q

In the trace plot, if the standard deviation of the proposal distribution is too low, what will happen?

A

The chain moves in very small steps, so it takes longer to reach the right values and explores the posterior slowly.

5
Q

In the trace plot, if the standard deviation of the proposal distribution is too high, what will happen?

A

It will generate far-away proposals that will usually get rejected, so the chain won’t explore the posterior distribution well. (robot-like trace plot)

6
Q

What is the difference between Metropolis and Metropolis-Hastings?

A

Metropolis-Hastings generalizes the Metropolis algorithm by allowing asymmetric proposals

7
Q

True or false : We need a symmetric proposal function in the Metropolis algorithm

A

True

8
Q

True or false : In the Gibbs Algorithm, we always accept the proposal

A

True

9
Q

What are the advantages of Gibbs sampling?

A

Efficiency in sampling from the posterior and no tuning of the proposal distribution

10
Q

What are the disadvantages of Gibbs?

A
  1. We must be able to compute and sample from the conditional posterior distributions
  2. Sampling is inefficient in models with correlated parameters
11
Q

True or false : If the parameters are graphed in n-dimensional space, the Metropolis algorithm's movements can be in any direction. The Gibbs movements are always parallel to the axes

A

True
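
To make the axis-parallel movement concrete, here is a minimal Gibbs sketch for a hypothetical target: a standard bivariate normal with correlation `rho`, whose conditionals are Normal(rho * other, 1 - rho^2). The target, the function name, and the defaults are assumptions chosen for illustration.

```python
import math
import random


def gibbs_bivariate_normal(n_iter, rho=0.8, seed=0):
    """Record every single-coordinate update of a two-parameter Gibbs chain."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    cond_sd = math.sqrt(1.0 - rho ** 2)
    path = [(x, y)]
    for _ in range(n_iter):
        # Update x from its conditional given y; y is held fixed.
        x = rng.gauss(rho * y, cond_sd)
        path.append((x, y))
        # Update y from its conditional given x; x is held fixed.
        y = rng.gauss(rho * x, cond_sd)
        path.append((x, y))
    return path
```

Each draw comes straight from a conditional posterior, so nothing is rejected, and each recorded move changes only one coordinate. With `rho` close to 1 the conditional standard deviation shrinks, which is why correlated parameters slow the sampler down.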

12
Q

Write the formulas useful for HMC

A
  1. Formula for momentum
  2. Formula for Theta
  3. Formula for prob of accept
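
The formulas themselves are not shown on this card. For reference only, one common textbook form of the HMC updates is sketched below. The symbols are assumptions, not the deck's notation: `\phi` is the momentum, `\epsilon` is the step size, the mass matrix is taken to be the identity, and each momentum half-step uses the gradient at the current value of `\theta`.

```latex
% Leapfrog step (unit mass matrix assumed)
\begin{align*}
\phi   &\leftarrow \phi + \tfrac{\epsilon}{2}\,\nabla_\theta \log p(\theta \mid y) \\
\theta &\leftarrow \theta + \epsilon\,\phi \\
\phi   &\leftarrow \phi + \tfrac{\epsilon}{2}\,\nabla_\theta \log p(\theta \mid y)
\end{align*}

% Probability of accepting the proposal (\theta^*, \phi^*)
\[
\alpha = \min\!\left(1,\;
  \frac{p(\theta^* \mid y)\,p(\phi^*)}{p(\theta^{t-1} \mid y)\,p(\phi^{t-1})}\right)
\]
```

Only the unnormalized posterior is needed, since the normalizing constant cancels in the ratio.
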
13
Q

What is the consequence in HMC if s is too low?

A

The proposal distribution is too concentrated and does not have sufficient time to move into the region of high posterior mass

14
Q

What is the consequence in HMC if s is too high?

A

Lower acceptance rates, since the proposals are too far away from the mode of the distribution.
It can also result in the U-turn problem.

15
Q

True or false : In MCMC, a small number of samples is enough if we only want the posterior mean

A

True

16
Q

True or false : In MCMC, a small number of samples is enough if we want the posterior variance and the extreme percentiles

A

False. We need a lot of samples

17
Q

In a trace plot, if the chains are all representative of the posterior, they should … (2)

A
  1. They should overlap each other and be unrelated to their randomly set starting positions
  2. They should also be stationary around the same modal value
18
Q

Give one reason why the chains in a trace plot might not be mixing well.

A

The prior is too flat

19
Q

What should we do if a chain in the trace plot is isolated from the others?

A
  1. Try to run more samples
  2. Check our model definition, implementation method, or input data for potential issues

20
Q

How can we know how correlated the proposed parameters are through the sampling iterations in MCMC?

A

With an autocorrelation plot
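
An autocorrelation plot shows the sample autocorrelation of a chain at each lag. A minimal sketch of that calculation follows; the function name `autocorrelation` is an assumption, and the chain is assumed to be a plain list of numbers that are not all equal.

```python
def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain at the given non-negative lag."""
    n = len(chain)
    mean = sum(chain) / n
    # Sum of squared deviations (the lag-0 term used for normalization).
    var = sum((x - mean) ** 2 for x in chain)
    # Sum of products of deviations that are `lag` iterations apart.
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean) for i in range(n - lag))
    return cov / var
```

Values that stay near 1 over many lags indicate a chain that moves slowly, while values that drop quickly toward 0 indicate nearly independent draws.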

21
Q

True or false : A correlogram doesn’t necessarily indicate whether the chain is representative of the posterior distribution, but it does give us a sense of the efficiency of our MCMC algorithm

A

True

22
Q

True or false : HMC has low autocorrelation if the parameters are well-tuned

A

True

23
Q

True or false : A greater number of iterations will be needed to explore the posterior distribution if there is a lot of autocorrelation

A

True

24
Q

What are the 2 forms of binomial regression?

A
  1. Logistic regression. Each record in the data set indicates whether an event occurred or didn’t. We are estimating the probability of an event
  2. Aggregated binomial regression. Each record in the data set states the size of the population and the number of events that occurred
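
The two forms describe the same information in different data layouts. As a small sketch, the hypothetical helper `aggregate` below collapses one-row-per-trial records (the logistic layout) into the trials-and-events layout used by aggregated binomial regression. The group labels and data are made up for illustration.

```python
from collections import defaultdict


def aggregate(records):
    """Collapse (group, outcome) rows, outcome in {0, 1}, into
    {group: (number_of_trials, number_of_events)}."""
    totals = defaultdict(lambda: [0, 0])
    for group, outcome in records:
        totals[group][0] += 1        # one more trial for this group
        totals[group][1] += outcome  # one more event if outcome == 1
    return {group: (n, k) for group, (n, k) in totals.items()}


rows = [("A", 1), ("A", 0), ("A", 1), ("B", 0), ("B", 0)]
print(aggregate(rows))  # {'A': (3, 2), 'B': (2, 0)}
```
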
25
Q

True or false : Mixtures are a way to handle overdispersion

A

True

26
Q

True or false : WAIC and PSIS can be used for beta-binomial and Poisson-gamma models

A

False.

27
Q

True or false : Cross-validation is the only tool that we can use for beta-binomial and Poisson-gamma models

A

True

28
Q

What are the advantages of multilevel models?

A
  1. Better estimates when there are repeated observations from the same cluster
  2. Better estimates for unbalanced data
  3. We get estimates of variation between clusters
  4. We avoid averaging of data.
29
Q

What are the disadvantages of multilevel regression?

A
  1. We need assumptions for how clusters vary
  2. The posteriors are nowhere near normally distributed. MCMC, which is slow, is required
  3. Models are harder to understand
30
Q

True or false : In a multilevel model, the parameter in the linear expression for the varying effects is indexed

A

True

31
Q

Which approximation method needs to be used for multilevel models?

A

MCMC

32
Q

Multilevel models produce shrinkage estimates. The properties of these estimates are

A
  1. Estimates are between the no pooling and complete pooling estimates; they shrink from the individual estimate towards the overall estimate
  2. Shrinkage is greater when there is less data for a group
  3. Shrinkage is greater when the individual estimate is further from the overall estimate
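
These properties can be seen in a simplified special case: a normal model with known within-group and between-group standard deviations, where the partially pooled estimate is a precision-weighted average. This closed form is an assumption made for illustration (a real multilevel model estimates the variances too), and the name `partial_pooling` and its parameters are hypothetical.

```python
def partial_pooling(group_mean, n_obs, overall_mean,
                    sigma_within=1.0, tau_between=1.0):
    """Precision-weighted average of a group's own mean and the overall mean."""
    weight_data = n_obs / sigma_within ** 2   # grows with the group's data
    weight_pool = 1.0 / tau_between ** 2      # pull toward the overall mean
    return ((weight_data * group_mean + weight_pool * overall_mean)
            / (weight_data + weight_pool))
```

With an overall mean of 0 and a group mean of 2, a group with 2 observations is pulled further toward 0 than a group with 50 observations, and both estimates stay between 0 and 2.
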
33
Q

True or false : The lower the variance of the varying intercept, the lower the effective number of parameters

A

True

34
Q

True or false : Multilevel models control overfitting by limiting variation of less important varying effects

A

True

35
Q

When can post-stratification be used?

A

When the sample is biased because the presence of the data may be a function of the cluster

36
Q

When is it useful to use a non-centered parametrization?

A

When there are divergent transitions, or when the target acceptance rate is not achieved (HMC)

37
Q

What are the advantages of non-centered parametrization?

A
  1. Better for clusters with low variation, or clusters with a lot of units but not much data in each unit
  2. It improves efficiency, and the resulting model is equivalent to the original model
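
The equivalence in point 2 can be sketched by simulating a varying effect both ways: directly from Normal(mu, sigma), or as mu + sigma * z with z drawn from Normal(0, 1). The function names and values here are assumptions for illustration. In a real model, a sampler would work on the standardized z rather than on the effect itself.

```python
import random
import statistics


def centered(mu, sigma, n, seed=0):
    """Draw the varying effect directly: alpha ~ Normal(mu, sigma)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def non_centered(mu, sigma, n, seed=0):
    """Draw a standardized z ~ Normal(0, 1), then rescale: alpha = mu + sigma * z."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [mu + sigma * zi for zi in z]
```

Both functions produce draws with the same mean and standard deviation, which is why the two parametrizations define the same model.
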
38
Q

What are the disadvantages of non-centered parametrization?

A
  1. Model is harder to understand
  2. The fitted varying effect parameters are standardized
  3. Sometimes non-centered parametrization doesn’t help