L.9 - Bayesian ANOVA chapters Flashcards

1
Q

Predictive quality

A
  • how well did a model, or a parameter value, predict the observed data?
  • we use this predictive quality to update our knowledge about the world
    > we use updated knowledge to make predictions about tomorrow’s world
  • see Bayesian learning circle in image 1
2
Q

what are the parameters of the different models (tests)?

A
  • θ → binomial test
  • δ → t-test
  • ρ → correlation
3
Q

Cauchy distribution
- when do we use it?
- why?

A
  • the difference in means is on a continuous scale without hard bounds
    > as opposed to a correlation or a proportion, which are bounded
  • we use this distribution to characterize each model’s prediction
4
Q

how can we write a t-test model as a linear regression?

A

yi = b0 + b1*xi
- do the taste ratings differ meaningfully between alcoholic and non-alcoholic beer?
- is b1 = 0?
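A minimal sketch of this equivalence in Python (the ratings are made up; statsmodels is used for the fit): the test of b1 = 0 in the regression is the same as the two-sample t-test on the group means.

  import numpy as np
  import statsmodels.api as sm

  # x codes the beer: 0 = non-alcoholic, 1 = alcoholic
  x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
  y = np.array([40, 42, 38, 41, 55, 52, 58, 54])  # hypothetical taste ratings

  # yi = b0 + b1*xi : b0 is the mean of group 0, b1 is the difference in means
  fit = sm.OLS(y, sm.add_constant(x)).fit()
  print(fit.params)   # [b0, b1]
  print(fit.pvalues)  # the p-value for b1 equals the two-sample t-test's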

5
Q

how can we write down the hypotheses using the Cauchy distribution?

A

H0: b1 = 0
H1: b1 ~ Cauchy(0.707)
- 0.707 is the default scale of the Cauchy distribution

6
Q

what is the 0.707 in the Cauchy distribution?

A
  • the alternative model bets 50% of its prior mass on values between -0.707 and 0.707
  • this value is the conventional default; we rarely change it
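A quick check of that bet in Python: the scale of a Cauchy distribution is its quartile, so exactly half of the prior mass falls between -0.707 and 0.707.

  from scipy.stats import cauchy

  prior = cauchy(loc=0, scale=0.707)            # the H1 prior on b1
  print(prior.cdf(0.707) - prior.cdf(-0.707))   # -> 0.5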
7
Q

how can we interpret the hypotheses under the Cauchy prior?

A
  • H0 goes all-in on 0 being the true value
  • H1 spreads its bets across a range of values of b1
    > how likely are the data under each hypothesis?
8
Q

priors in ANOVA

A
  • when expanding the ANOVA (more than one effect, or more than 2 groups), we add a “b” coefficient for each effect in the model
  • each parameter will need a prior distribution, to make concrete what the model is predicting
    > average quality of a model’s prediction is its marginal likelihood
  • see images 2 & 3
9
Q

how can we use JASP for the example above?

A
  • repeated measures ANOVA
  • between-subjects factor (correctness) and within-subjects factor (alcoholic & non-alcoholic)
  • see image 4
10
Q

P(M)

A
  • P(M): prior model probability
    > how likely is each model, before seeing the data?
    > usually we divide the probability equally between the models (here 0.25 each)
    → BeerType model: before looking at the data, there is a 25% probability that this model is the true model, out of these four models
  • see image 5
11
Q

P(M|data)

A
  • P(M|data): posterior model probability
    > how likely is each model, after seeing the data?
    > all values sum to 1
    → BeerType model: after looking at the data, there is a 44.6% probability that this model is the true model, out of all these models
  • see image 5
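A minimal sketch of this update in Python (the marginal likelihoods are illustrative, not JASP output): multiply each prior model probability by how well that model predicted the data, then normalize.

  import numpy as np

  p_m = np.array([0.25, 0.25, 0.25, 0.25])                # P(M)
  marg_lik = np.array([1e-10, 4.0e-6, 1.2e-10, 4.9e-6])   # p(data | M), made up

  p_m_data = p_m * marg_lik
  p_m_data /= p_m_data.sum()   # P(M | data): these sum to 1
  print(p_m_data)              # the well-predicting models get most of the mass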
12
Q

BFM

A
  • model Bayes factor
    > the updating factor from prior model odds to posterior model odds
    → for the BeerType model, we calculate it from the posterior and prior odds
    → the data are 2.42 times more likely under this model than under all the other models combined
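The calculation for the BeerType model, using the P(M) and P(M|data) values from the cards above:

  p_prior, p_post = 0.25, 0.446

  prior_odds = p_prior / (1 - p_prior)   # 0.25 / 0.75 ≈ 0.33
  post_odds = p_post / (1 - p_post)      # 0.446 / 0.554 ≈ 0.81
  print(post_odds / prior_odds)          # BFM ≈ 2.42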
13
Q

BF10

A
  • pairwise Bayes factor
    > how likely are the data under this model, compared to another model?
    > to compute BF10, we take the ratio of the posterior model probability of a specific model to that of the best model (with equal prior model probabilities, this ratio equals the Bayes factor)
    → BeerType model: the data are 0.81 times as likely under this model as under the best model
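A worked sketch, assuming the best model has a posterior model probability of about 0.551 (that value is not on these cards; it is inferred here for illustration):

  p_beertype = 0.446           # P(M | data) for the BeerType model
  p_best = 0.551               # assumed P(M | data) for the best model
  print(p_beertype / p_best)   # BF10 ≈ 0.81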
14
Q

BFM vs BF10

A
  • BFM: comparison between a single model and all other models combined
  • BF10: pairwise comparison between two single models
15
Q

Bayes factor transitivity

A
  • we can use the BF10 column to conduct additional model comparisons
  • e.g. we can compare the BeerType model directly to the Null model
    → we take those two models’ BF10 values and divide one by the other (BeerType BF10 / Null BF10)
    → the data are 97296 times more likely under the BeerType model than under the Null model
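The same division as a sketch (the Null model’s BF10 is an illustrative value chosen to be consistent with the numbers above, not a value from the cards):

  bf10_beertype = 0.81    # BeerType vs the best model
  bf10_null = 8.325e-6    # Null vs the best model (illustrative)
  print(bf10_beertype / bf10_null)   # ≈ 97296: BeerType vs Null directly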
16
Q

analysis of effects

A
  • we can compare groups of models, instead of comparing single models with each other
  • we can look at each predictor variable and how well the models including that predictor predicted the data
    → e.g. (BeerType & BeerType + Correctness) vs (Null & Correctness)
  • see image 6
17
Q

P(incl)

A
  • prior inclusion probability
    > we sum all the prior model probabilities of the models that included the predictor
    → correctness is included in two models that each have a prior model probability of 0.25, so its prior inclusion probability is 0.5
18
Q

P(incl|data)

A
  • posterior inclusion probability
    > we sum all the posterior model probabilities of the models that include this predictor
    → correctness is included in two models whose posterior model probabilities are 0.563 and something very small, so its posterior inclusion probability is approximately 0.563
19
Q

BFincl

A
  • inclusion Bayes factor
    > quantifies the change from prior inclusion probability to posterior inclusion probability for each component
    → for correctness, it is around 1.24
    ! this is the most important column
    ! it quantifies how well the models with a certain predictor do, compared to models without it
    → much evidence for models with BeerType, less for models with correctness
    → this is absence of evidence (for correctness)
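A sketch of the inclusion calculations for correctness, using the rounded values from these cards (JASP works with unrounded numbers, which is why the card reports 1.24):

  p_incl = 0.25 + 0.25   # prior inclusion probability = 0.5
  p_incl_data = 0.563    # posterior inclusion probability (0.563 + ~0)

  # inclusion BF = posterior inclusion odds / prior inclusion odds
  bf_incl = (p_incl_data / (1 - p_incl_data)) / (p_incl / (1 - p_incl))
  print(bf_incl)         # ≈ 1.29 with these rounded inputs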
20
Q

single model inference

A
  • we can specify a specific model and get estimates of its various parameters (its b’s)
  • see image 7
21
Q

what does the table of Single Model Inference show?

A
  • mean estimates (and credible intervals) for the group differences
  • it characterizes what this specific model predicts for certain participants in certain situations
  • we sum the intercept and the values of the conditions we are interested in
22
Q

Person tasting a non-alcoholic beer, who correctly identified it

A

48.34 + (-9.64) + (-4.15) = 34.55

23
Q

person tasting a non-alcoholic beer, who incorrectly identified it

A

48.34 + (-9.64) + 4.15 = 42.85
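Both predictions in one sketch, using the estimates from the two cards above; flipping the sign of an effect switches to the other level of that factor:

  intercept = 48.34   # grand mean
  beer = -9.64        # effect of tasting non-alcoholic beer
  correct = -4.15     # effect of a correct identification

  print(intercept + beer + correct)   # 34.55 : non-alcoholic, identified correctly
  print(intercept + beer - correct)   # 42.85 : non-alcoholic, identified incorrectly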

24
Q

Model Averaging

A
  • we can combine all the models into a single prediction by model averaging
  • this method weighs each model’s prediction by its posterior model probability
    ! these estimates do not differ much from the previous estimates because they are mostly dictated by models with two main effects (best models)
  • see image 8
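A minimal sketch of the weighting (all numbers are illustrative): each model’s estimate of a parameter is averaged, with the posterior model probabilities as weights.

  import numpy as np

  estimates = np.array([0.0, -9.8, -0.2, -9.6])       # each model’s estimate of b
  p_m_data = np.array([0.001, 0.446, 0.012, 0.541])   # posterior model probabilities

  # the model-averaged estimate is dominated by the models that predict well
  print(np.average(estimates, weights=p_m_data))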
25
Q

Error Percentage

A
  • running the same Bayesian ANOVA twice will lead to slight fluctuations in the BFs
    → the fluctuation is indicated by “error %” column in JASP
  • these percentages indicate how much the BF will fluctuate from run to run
  • usually an error percentage below 20% is acceptable (a BF of 10 would then fluctuate roughly between 8 and 12)
26
Q

how does the Bayesian paradigm differ from the frequentist paradigm?
(part 1)

A
  • evidence in favour of a particular model is a continuous measure of support
    > no need for all-or-none Bayes factor cut-off points to accept/reject a particular model
  • Bayes factor can discriminate between absence of evidence (data predicted equally under both models) and evidence of absence (data support the null hypothesis)
27
Q

how does the Bayesian paradigm differ from the frequentist paradigm?
(part 2)

A
  • in the Bayesian paradigm, the knowledge about the models M and the parameters (the b’s) is updated simultaneously
    > we account for model uncertainty by considering all models, but giving more weight to the ones that predict the data well
    > known as “Bayesian model averaging”
  • frequentists first select a ‘best’ model and then estimate its parameters
    → neglect model uncertainty and produce overconfident conclusions
28
Q

how does the Bayesian paradigm differ from the frequentist paradigm?
(part 3)

A
  • Bayesian posterior distributions allow for direct probabilistic statements about parameters
  • a range of parameter values is called a credible interval (usually 95%)
    > we can consider any interval from a to b and quantify our confidence that the parameter falls in that specific range
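A sketch of both statements from posterior draws (the draws below are simulated stand-ins for MCMC output):

  import numpy as np

  rng = np.random.default_rng(1)
  b1_draws = rng.normal(-9.6, 2.0, 10_000)   # stand-in posterior samples of b1

  print(np.percentile(b1_draws, [2.5, 97.5]))       # 95% credible interval
  a, b = -12, -8
  print(np.mean((b1_draws > a) & (b1_draws < b)))   # P(a < b1 < b | data)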
29
Q

how does the Bayesian paradigm differ from the frequentist paradigm?
(part 4)

A
  • Bayesian inference automatically penalizes for complexity and favours parsimony
    > e.g. a model with a redundant covariate will make poor predictions
    → BF will favour model without the redundant predictor
30
Q

Important summary!

A

the predictive performance is assessed using parameter values that are drawn from the prior distributions
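A minimal Monte Carlo sketch of that idea (made-up standardized data; a unit-variance normal likelihood is assumed): H1’s predictive quality is the likelihood of the data averaged over effect sizes drawn from its Cauchy prior, H0 fixes the effect at 0, and their ratio is BF10.

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(0)
  y = np.array([0.8, 1.1, 0.4, 1.3, 0.9])   # hypothetical standardized observations

  # draw effect sizes from the H1 prior: Cauchy(0, 0.707)
  deltas = 0.707 * rng.standard_cauchy(50_000)

  # average the likelihood of the data over the prior draws: marginal likelihood of H1
  lik_h1 = stats.norm.pdf(y[None, :], loc=deltas[:, None]).prod(axis=1)
  marg_h1 = lik_h1.mean()

  marg_h0 = stats.norm.pdf(y).prod()   # H0: effect fixed at 0
  print(marg_h1 / marg_h0)             # BF10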

31
Q

mega summary of frequentist vs Bayesian

A
  • In frequentist ANOVA, the comparison of variances between and within groups is key, and the result is evaluated using a p-value based on the F-distribution. If the null hypothesis is true, variances between and within groups should be similar.
  • In Bayesian ANOVA, instead of comparing variances, the models’ predictions, generated from their prior distributions, are evaluated against the observed data
32
Q

Summary of Bayesian Model Averaging

A
  • Bayesian Model Averaging (BMA) is a statistical approach that deals with uncertainty in model selection
  • Instead of picking just one “best” model, BMA looks at all possible models and combines their results.
  • It does this by weighing each model’s estimates according to how likely that model is (its “posterior probability”).
  • This way, BMA accounts for uncertainty and gives more reliable results compared to traditional methods, which only focus on a single model and ignore the possibility that another model might also be valid.