L.9 - Bayesian ANOVA chapters Flashcards
Predictive quality
- how well did a model, or a parameter value, predict the observed data?
- we use this predictive quality to update our knowledge about the world
> we use updated knowledge to make predictions about tomorrow’s world - see Bayesian learning circle in image 1
what are the values of parameters across different models?
- θ → binomial test
- δ → t-test
- ρ → correlation
Cauchy distribution
- when do we use it?
- why?
- a difference in means lies on a continuous scale without hard bounds
> as opposed to a correlation or a proportion, which are bounded
- we use this distribution to characterize each model’s prediction
how can we write a t-test model as a linear regression?
yi = b0 + b1*xi
- do the beer-tasting ratings differ depending on whether the beer is alcoholic or not?
- is b1=0?
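- a minimal sketch of this equivalence (with simulated, hypothetical ratings): the t-statistic for b1 in the regression equals the classical two-sample t-statistic

```python
# A two-group t-test is a linear regression with a dummy-coded
# predictor: b1 is the difference in group means, and testing
# b1 = 0 is the t-test. The ratings below are simulated placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alcoholic = rng.normal(50, 10, 30)       # hypothetical taste ratings
non_alcoholic = rng.normal(45, 10, 30)

y = np.concatenate([alcoholic, non_alcoholic])
x = np.concatenate([np.ones(30), np.zeros(30)])   # dummy: 1 = alcoholic

slope, intercept, r, p_reg, se = stats.linregress(x, y)
print(slope / se)                                 # t-statistic for b1

t, p = stats.ttest_ind(alcoholic, non_alcoholic)  # equal-variance t-test
print(t)                                          # same value
```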
how can we write down the hypotheses using the Cauchy distribution?
H0: b1 = 0
H1: b1 ~ Cauchy(0.707)
- 0.707 is the default scale of the Cauchy distribution
what is the 0.707 in the Cauchy distribution?
- the alternative model bets around 50% on values between -0.707 and 0.707
- this number is conventional, we don’t really change it
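- a quick check of the 50% claim (the scale of a Cauchy distribution is its quartile):

```python
# A Cauchy(0, 0.707) prior puts half of its mass between
# -0.707 and 0.707, since the CDF at +/- one scale unit is 0.75/0.25.
from scipy.stats import cauchy

scale = 0.707
print(cauchy.cdf(scale, scale=scale) - cauchy.cdf(-scale, scale=scale))  # 0.5
```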
how can we interpret the hypotheses under the Cauchy prior?
- H0 goes all-in on 0 being the true value
- H1 spreads its bets across a range of values of b1
> how likely are the data under each hypothesis?
priors in ANOVA
- when expanding to ANOVA (more than one effect, or more than two groups), we add a coefficient b for each parameter of the model
- each parameter will need a prior distribution, to make concrete what the model is predicting
> average quality of a model’s prediction is its marginal likelihood - see images 2 & 3
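> written out, the marginal likelihood averages the likelihood over the prior:

$$p(\mathrm{data}\mid M)=\int p(\mathrm{data}\mid\theta,M)\,p(\theta\mid M)\,d\theta$$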
how can we use JASP for the example above?
- repeated measures ANOVA
- between-subjects factor (correctness) and within-subjects factor (alcoholic & non-alcoholic)
- see image 4
P(M)
- P(M): prior model probability
> how likely is each model, before seeing the data?
> usually we divide the prior probability equally between the models (here 0.25 each)
→ BeerType model: before looking at the data, there is a 25% probability that this model is the true model, out of these four models - see image 5
P(M|data)
- P(M|data): posterior model probability
> how likely is each model, after seeing the data?
> all values sum to 1
→ BeerType model: after looking at the data, there is a 44.6% probability that this model is the true model, out of all these models - see image 5
BFM
- BFM: ratio of posterior to prior model odds
> the updating factor from prior model odds to posterior model odds
→ for the BeerType model, we calculate it from the posterior and prior odds
→ the data are 2.42 times more likely under this model than under all the other models combined
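- a small sketch of the BFM calculation using the table values quoted above:

```python
# BF_M from the values in the notes: prior model probability 0.25
# (four models) and posterior model probability 0.446 for BeerType.
p_prior, p_post = 0.25, 0.446

prior_odds = p_prior / (1 - p_prior)      # this model vs the rest: 1/3
post_odds = p_post / (1 - p_post)         # ~0.805
print(round(post_odds / prior_odds, 2))   # BF_M = 2.42
```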
BF10
- pairwise Bayes factor
> how likely are the data under this model, compared to another model?
> in JASP the comparison is against the best model; with equal prior model probabilities, BF10 is the ratio of a model’s posterior model probability to that of the best model
→ BeerType model: the data are 0.81 times as likely under this model as under the best model
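- a sketch of that ratio; the best model’s posterior probability (0.552) is an assumed read-off from image 5:

```python
# With equal prior model probabilities, BF10 against the best model
# is just the ratio of posterior model probabilities.
p_beertype, p_best = 0.446, 0.552   # 0.552 is an assumed value
print(round(p_beertype / p_best, 2))  # ~0.81
```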
BFM vs BF10
- BFM: comparison between a single model and all other models combined
- BF10: pairwise comparisons between single models
Bayes factor transitivity
- we can use the BF10 column to conduct additional model comparisons
- e.g. we can compare the BeerType model directly to the Null model
→ we divide the two models’ BF10 values by each other (BeerType BF10 / Null BF10)
→ the data are 97296 times more likely under the BeerType model than under the Null model
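- a sketch of the transitivity trick; the Null model’s BF10 below is a placeholder, not the exact JASP value:

```python
# Transitivity: BF(A vs B) = BF(A vs reference) / BF(B vs reference),
# here with the best model as the shared reference.
bf_beertype = 0.81    # BeerType vs best model (from the table)
bf_null = 8.33e-6     # Null vs best model (placeholder)
print(bf_beertype / bf_null)  # ~97239; JASP reports 97296 with exact values
```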
analysis of effects
- we can compare groups of models, instead of comparing single models with each other
- we can look at each predictor variable and ask how well the models that include it predicted the data
→ e.g. (beertype & beertype+correctness) vs (null & correctness) - see image 6
P(incl)
- prior inclusion probability
> we sum all the prior model probabilities of the models that included the predictor
→ correctness is included in two models that each have a prior model probability of 0.25, so its prior inclusion probability is 0.5
P(incl|data)
- posterior inclusion probability
> we sum all the posterior model probabilities of the models that include this predictor
→ correctness is included in two models, one with a posterior model probability of about 0.563 and one with a very small one, so its posterior inclusion probability is approximately 0.563
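- a sketch of both inclusion sums; the posterior model probabilities are placeholders chosen to roughly match the notes:

```python
# Inclusion probabilities: sum the model probabilities of every
# model containing the predictor. Posterior values are placeholders.
models = {
    "Null":                   {"prior": 0.25, "post": 0.000005},
    "Correctness":            {"prior": 0.25, "post": 0.0015},
    "BeerType":               {"prior": 0.25, "post": 0.446},
    "BeerType + Correctness": {"prior": 0.25, "post": 0.5525},
}

def inclusion(key, predictor):
    return sum(m[key] for name, m in models.items() if predictor in name)

print(inclusion("prior", "Correctness"))  # 0.5
print(inclusion("post", "Correctness"))   # ~0.554
```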
BFincl
- inclusion Bayes factor
> quantifies the change from prior inclusion probability to posterior inclusion probability for each component
→ for correctness, it is around 1.24
! this is the most important column
! it quantifies how well the models with a certain predictor do, compared to models without it
→ much evidence for models with BeerType, less for models with correctness
→ this is absence of evidence (for correctness): a BFincl near 1 means the data barely discriminate, not that the effect is absent
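- a sketch of the BFincl calculation, assuming a posterior inclusion probability of 0.554 for correctness (which reproduces the ~1.24):

```python
# BF_incl: change from prior to posterior inclusion odds.
p_incl_prior, p_incl_post = 0.5, 0.554   # 0.554 is an assumed value

prior_odds = p_incl_prior / (1 - p_incl_prior)  # 1.0
post_odds = p_incl_post / (1 - p_incl_post)
print(round(post_odds / prior_odds, 2))         # ~1.24
```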
single model inference
- we can specify a specific model and get estimates of its various parameters (its b’s)
- see image 7
what does the table of Single Model Inference show?
- mean estimates (and credible intervals) for the group differences
- it characterizes what this specific model predicts for certain participants in certain situations
- we sum the intercept and the values of the two conditions we are interested in
Person tasting a non-alcoholic beer, who correctly identified it
48.34 + (-9.64) + (-4.15) = 34.55
person tasting a non-alcoholic beer, who incorrectly identified it
48.34 + (-9.64) + 4.15 = 42.85
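- the sums reproduced in code; the intercept is taken as 48.34 (rounded to 48.4 in the table), which matches the reported totals:

```python
# Prediction = intercept + the relevant condition effects.
intercept = 48.34       # assumed unrounded value consistent with the sums
non_alcoholic = -9.64   # effect of tasting non-alcoholic beer
correct = -4.15         # effect of a correct identification

print(round(intercept + non_alcoholic + correct, 2))  # 34.55
print(round(intercept + non_alcoholic - correct, 2))  # 42.85
```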
Model Averaging
- we can combine all the models into a single prediction by model averaging
- this method weighs each model’s prediction by its posterior model probability
! these estimates do not differ much from the single-model estimates because they are mostly dictated by the models with the two main effects (the best models) - see image 8
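- a sketch of model averaging with placeholder posterior probabilities and effect estimates:

```python
# Model averaging: weight each model's estimate of an effect by the
# model's posterior probability; models without the effect contribute
# an estimate of 0. All numbers below are placeholders.
weighted = [
    (0.000005, 0.0),    # Null
    (0.0015,   0.0),    # Correctness only (no BeerType effect)
    (0.446,   -9.7),    # BeerType
    (0.5525,  -9.64),   # BeerType + Correctness
]
print(round(sum(p * b for p, b in weighted), 2))  # ~-9.65, driven by the two best models
```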
Error Percentage
- running the same Bayesian ANOVA twice will lead to slight fluctuations in the BFs
→ the fluctuation is indicated by the “error %” column in JASP - these percentages indicate how much the BF will fluctuate from run to run
- an error % below 20 is usually acceptable (a BF of 10 would then deviate between roughly 8 and 12)
how does the Bayesian paradigm differ from the frequentist paradigm?
(part 1)
- evidence in favour of a particular model is a continuous measure of support
> no need for all-or-none Bayes factor cut-off points to accept/reject a particular model
- the Bayes factor can discriminate between absence of evidence (data predicted equally well under both models) and evidence of absence (data support the null hypothesis)
how does the Bayesian paradigm differ from the frequentist paradigm?
(part 2)
- in the Bayesian paradigm, knowledge about the models M and the parameters β is updated simultaneously
> we account for model uncertainty by considering all models, but giving more weight to the ones that predict the data well
> this is known as “Bayesian model averaging”
- frequentists first select a ‘best’ model and then estimate its parameters
→ this neglects model uncertainty and produces overconfident conclusions
how does the Bayesian paradigm differ from the frequentist paradigm?
(part 3)
- Bayesian posterior distributions allow for direct probabilistic statements about parameters
- a range of parameter values is called a credible interval (usually 95%)
> we can consider any interval from a to b and quantify our confidence that the parameter falls in that specific range
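- a sketch with simulated posterior samples (stand-ins, not real JASP output):

```python
# A central 95% credible interval from posterior samples is a direct
# probability statement about the parameter.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(-9.64, 1.5, 10_000)   # stand-in posterior for b1

lo, hi = np.percentile(samples, [2.5, 97.5])
print(lo, hi)  # P(lo < b1 < hi | data) = 0.95
```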
how does the Bayesian paradigm differ from the frequentist paradigm?
(part 4)
- Bayesian inference automatically penalizes for complexity and favours parsimony
> e.g. a model with a redundant covariate will make poor predictions
→ BF will favour model without the redundant predictor
Important summary!
the predictive performance is assessed using parameter values that are drawn from the prior distributions
mega summary of frequentist vs bayesian
- In frequentist ANOVA, the comparison of variances between and within groups is key, and the result is evaluated using a p-value based on the F-distribution. If the null hypothesis is true, variances between and within groups should be similar.
- In Bayesian ANOVA, instead of comparing variances, model predictions (derived from the prior distributions) are evaluated
Summary of Bayesian Model Averaging
- Bayesian Model Averaging (BMA) is a statistical approach that deals with uncertainty in model selection
- Instead of picking just one “best” model, BMA looks at all possible models and combines their results.
- It does this by weighing each model’s estimates according to how likely that model is (its “posterior probability”).
- This way, BMA accounts for uncertainty and gives more reliable results compared to traditional methods, which only focus on a single model and ignore the possibility that another model might also be valid.