L.9 - Bayesian ANOVA chapters Flashcards
Predictive quality
- how well did a model, or a parameter value, predict the observed data?
- we use this predictive quality to update our knowledge about the world
> we use updated knowledge to make predictions about tomorrow’s world - see Bayesian learning circle in image 1
what are the parameters of interest in the different models?
- θ → binomial test
- δ → t-test
- ρ → correlation
Cauchy distribution
- when do we use it? in the Bayesian t-test, as the prior on the effect size
- why? a difference in means is on a continuous scale without hard bounds
> as opposed to a correlation or a proportion, which are bounded - we use this distribution to characterize the model’s prediction
how can we write a t-test model as a linear regression?
yi = b0 + b1*xi
- xi codes the beer type (e.g. 0 = non-alcoholic, 1 = alcoholic): do the tasting ratings differ meaningfully between alcoholic and non-alcoholic beer?
- is b1 = 0? - see the sketch below
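A minimal Python sketch (my illustration, not from the course) of the t-test written as the regression yi = b0 + b1*xi; the ratings are made up:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
non_alcoholic = rng.normal(40, 10, size=20)  # hypothetical tasting ratings
alcoholic = rng.normal(50, 10, size=20)

y = np.concatenate([non_alcoholic, alcoholic])
x = np.concatenate([np.zeros(20), np.ones(20)])  # 0 = non-alcoholic, 1 = alcoholic

# least-squares fit: b1 is exactly the difference between the group means
b1, b0, r, p, se = stats.linregress(x, y)
print(b1, alcoholic.mean() - non_alcoholic.mean())  # identical

# the regression test of "is b1 = 0?" is the classical two-sample t-test
print(b1 / se, stats.ttest_ind(alcoholic, non_alcoholic).statistic)  # identical
```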
how can we write down the hypotheses using the Cauchy prior?
H0: b1 = 0
H1: b1 ~ Cauchy(0.707)
- 0.707 (≈ √2/2) is the scale of the Cauchy prior on b1
what is the 0.707 in the Cauchy distribution?
- the alternative model bets around 50% on values between -0.707 and 0.707
- this number is the conventional default; we don’t really change it - see the check below
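A quick numerical check (my sketch): a Cauchy prior with scale 0.707 indeed puts ~50% of its mass between -0.707 and 0.707:

```python
from scipy.stats import cauchy

prior = cauchy(loc=0, scale=0.707)
print(prior.cdf(0.707) - prior.cdf(-0.707))  # 0.5
```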
how can we interpret these hypotheses?
- H0 goes all-in on 0 being the true value
- H1 spreads its bets across a range of values of b1
> how likely are the data under each hypothesis?
priors in ANOVA
- when expanding to ANOVA (more than one effect, or more than two groups), we add a b for each effect of the model
- each parameter will need a prior distribution, to make concrete what the model is predicting
> average quality of a model’s prediction is its marginal likelihood - see images 2 & 3
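A sketch of the marginal-likelihood idea (my own illustration; the data and prior are made up): a model’s marginal likelihood is its likelihood averaged over its prior, so H1 spreads its bets while H0 goes all-in on 0:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.normal(0.4, 1.0, size=30)  # hypothetical standardized scores

# H0 bets everything on effect size 0
marg_lik_h0 = stats.norm.pdf(data, loc=0.0, scale=1.0).prod()

# H1 spreads its bets: average the likelihood over Cauchy(0, 0.707) prior draws
draws = stats.cauchy.rvs(loc=0, scale=0.707, size=50_000, random_state=rng)
liks = stats.norm.pdf(data[None, :], loc=draws[:, None], scale=1.0).prod(axis=1)
marg_lik_h1 = liks.mean()

print(marg_lik_h1 / marg_lik_h0)  # a Monte Carlo estimate of BF10
```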
how can we use JASP for the example above?
- repeated measures ANOVA
- between factor (correctness) and within factor (alcoholic & non-alcoholic)
- see image 4
P(M)
- P(M): prior model probability
> how likely is each model, before seeing the data?
> usually we divide the prior probability equally across the models (here 0.25 each)
→ BeerType model: before looking at the data, there is a 25% probability that this model is the true model, out of these four models - see image 5
P(M|data)
- P(M|data): posterior model probability
> how likely is each model, after seeing the data?
> all values sum to 1
→ BeerType model: after looking at the data, there is a 44.6% probability that this model is the true model, out of all these models - see image 5
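How the two columns relate (a sketch; the marginal likelihoods are made up): posterior model probabilities are the prior probabilities reweighted by each model’s marginal likelihood:

```python
import numpy as np

prior = np.array([0.25, 0.25, 0.25, 0.25])   # P(M): four models, equal shares
marg_lik = np.array([1e-9, 0.9, 0.02, 1.1])  # hypothetical P(data | M)

posterior = prior * marg_lik
posterior /= posterior.sum()  # P(M | data): all values sum to 1
print(posterior)
```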
BFM
- BFM: posterior model odds divided by prior model odds
> the updating factor from prior to posterior model odds
→ for the BeerType model, we calculate it from the prior and posterior model probabilities - see the calculation below
→ the data are 2.42 times more likely under this model, than under all the other models combined
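Reproducing the 2.42 from the two probabilities quoted above (P(M) = 0.25, P(M|data) = 0.446):

```python
prior_prob, post_prob = 0.25, 0.446  # BeerType model, from the tables above

prior_odds = prior_prob / (1 - prior_prob)  # 1/3
post_odds = post_prob / (1 - post_prob)     # ~0.805
print(post_odds / prior_odds)               # BFM ~ 2.42
```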
BF10
- pairwise Bayes factor
> how likely are the data under this model, compared to another model?
> to compute BF10 here, we take the ratio of the posterior model probability of a specific model to that of the best model (this works because all prior model probabilities are equal)
→ BeerType model: the data are 0.81 times as likely under this model as under the best model
BFM vs BF10
- BFM: comparison between a single model and all the other models combined
- BF10: pairwise comparisons between single models
Bayes factor transitivity
- we can use BF10 column to conduct additional model comparisons
- e.g. we can compare BeerType model directly to Null model
→ we take those two models’ BF10 values and divide one by the other (BeerType BF10 / Null BF10) - see the sketch below
→ the data are 97296 times more likely under the BeerType model than under the Null model
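In code (the Null model’s BF10 is not quoted in these cards, so the value below is back-calculated from the 97296 figure and is illustrative only):

```python
beertype_bf10 = 0.81  # BeerType vs. best model, from above
null_bf10 = 8.33e-6   # implied, not read off the JASP table

print(beertype_bf10 / null_bf10)  # ~97000: BeerType vs. Null directly
```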
analysis of effects
- we can compare groups of models, instead of comparing single models with each other
- we can look at each predictor variable, and at how well the models including that predictor predicted the data
→ e.g. (beertype & beertype+correctness) vs (null & correctness) - see image 6
P(incl)
- prior inclusion probability
> we sum the prior model probabilities of all the models that include the predictor
→ correctness is included in two models that each have a prior model probability of 0.25, so its prior inclusion probability is 0.5
P(incl|data)
- posterior inclusion probability
> we sum all the posterior model probabilities of the models that include this predictor
→ correctness is included in 2 models, with posterior model probabilities of 0.563 and something very small, so its posterior inclusion probability is approximately 0.563
BFincl
- inclusion Bayes factor
> quantifies the change from prior inclusion probability to posterior inclusion probability for each component
→ for correctness, it is around 1.24 - see the calculation below
! this is the most important column
! it quantifies how well the models with a certain predictor do, compared to the models without it
→ much evidence for models with BeerType, little for models with correctness
→ for correctness this is absence of evidence, not evidence of absence
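The inclusion quantities for correctness, recomputed from the (rounded) probabilities quoted above; JASP works with unrounded values, which is why it reports ~1.24 rather than the ~1.29 this sketch gives:

```python
p_incl_prior = 0.25 + 0.25  # two of the four models include correctness
p_incl_post = 0.563         # 0.563 + something very small

prior_incl_odds = p_incl_prior / (1 - p_incl_prior)  # 1.0
post_incl_odds = p_incl_post / (1 - p_incl_post)     # ~1.29
print(post_incl_odds / prior_incl_odds)              # BFincl (~1.24 in JASP)
```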
single model inference
- we can specify a single model and get estimates of its various parameters (its b’s)
- see image 7
what does the table of Single Model Inference show?
- mean estimates (and credible intervals) for the group differences
- it characterizes what this specific model predicts for certain participants in certain situations
- we sum the intercept and the estimates of the two conditions we are interested in
Person tasting a non-alcoholic beer, who correctly identified it
48.4 + (-9.64) + (-4.15) = 34.55
person tasting a non-alcoholic beer, who incorrectly identified it
48.4 + (-9.64) + 4.15 = 42.85
> the totals use the unrounded intercept from the JASP table, so they differ slightly from summing the rounded values shown - see the sketch below
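The same sums in code, assuming the usual sum-to-zero coding (so the alcoholic and incorrect levels get the opposite signs); the intercept is rounded in the table, hence small discrepancies:

```python
intercept = 48.4                                     # table value (rounded)
beer = {"non-alcoholic": -9.64, "alcoholic": +9.64}  # assumed symmetric coding
identified = {"correctly": -4.15, "incorrectly": +4.15}

for b, bv in beer.items():
    for c, cv in identified.items():
        print(f"{b}, identified {c}: {intercept + bv + cv:.2f}")
```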
Model Averaging
- we can combine all the models into a single prediction by model averaging
- this method weighs each model’s prediction by its posterior model probability - see the sketch below
! these estimates do not differ much from the previous estimates because they are mostly dictated by the models with two main effects (the best models) - see image 8
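A sketch of model averaging (the weights loosely echo the posterior model probabilities above, but the per-model estimates are hypothetical): each model’s estimate is weighted by that model’s posterior probability, so the best models dominate:

```python
import numpy as np

# order: Null, BeerType, Correctness, BeerType + Correctness (illustrative)
post_prob = np.array([0.0001, 0.446, 0.012, 0.542])
estimate = np.array([0.0, -9.8, 0.0, -9.6])  # hypothetical b for non-alcoholic
# models without BeerType estimate its effect as exactly 0

averaged = np.sum(post_prob * estimate) / post_prob.sum()
print(averaged)  # dominated by the models that include BeerType
```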