L.9 - Bayesian ANOVA chapters Flashcards
Predictive quality
- how well did a model, or a parameter value, predict the observed data?
- we use this predictive quality to update our knowledge about the world
> we use updated knowledge to make predictions about tomorrow’s world - see Bayesian learning circle in image 1
what are the values of parameters across different models?
- θ → binomial test
- δ → t-test
- ρ → correlation
Cauchy distribution
- when do we use it?
- why?
- a difference in means lies on a continuous scale without hard bounds
> as opposed to a correlation or a proportion, which are bounded
- we use this distribution to characterize each model’s prediction
how can we write a t-test model as a linear regression?
yi = b0 + b1*xi
- do the beer-tasting ratings differ depending on whether the beer is alcoholic or not?
- is b1=0?
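- a minimal sketch of this equivalence (with simulated, hypothetical ratings): the t-statistic for b1 in the regression equals the classical two-sample t-statistic

```python
# A two-group t-test is a linear regression with a dummy-coded
# predictor: b1 is the difference in group means, and testing
# b1 = 0 is the t-test. The ratings below are simulated placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alcoholic = rng.normal(50, 10, 30)       # hypothetical taste ratings
non_alcoholic = rng.normal(45, 10, 30)

y = np.concatenate([alcoholic, non_alcoholic])
x = np.concatenate([np.ones(30), np.zeros(30)])   # dummy: 1 = alcoholic

slope, intercept, r, p_reg, se = stats.linregress(x, y)
print(slope / se)                                 # t-statistic for b1

t, p = stats.ttest_ind(alcoholic, non_alcoholic)  # equal-variance t-test
print(t)                                          # same value
```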
how can we write down the hypotheses using the Cauchy distribution?
H0: b1 = 0
H1: b1 ~ Cauchy(0.707)
- 0.707 is the default scale of the Cauchy distribution
what is the 0.707 in the Cauchy distribution?
- the alternative model bets around 50% on values between -0.707 and 0.707
- this number is conventional, we don’t really change it
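- a quick check of the 50% claim (the scale of a Cauchy distribution is its quartile):

```python
# A Cauchy(0, 0.707) prior puts half of its mass between
# -0.707 and 0.707, since the CDF at +/- one scale unit is 0.75/0.25.
from scipy.stats import cauchy

scale = 0.707
print(cauchy.cdf(scale, scale=scale) - cauchy.cdf(-scale, scale=scale))  # 0.5
```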
how can we interpret the hypotheses under the Cauchy prior?
- H0 goes all-in on 0 being the true value
- H1 spreads its bets across a range of values of b1
> how likely are the data under each hypothesis?
priors in ANOVA
- when expanding to ANOVA (more than one effect, or more than two groups), we add a coefficient b for each parameter of the model
- each parameter will need a prior distribution, to make concrete what the model is predicting
> average quality of a model’s prediction is its marginal likelihood - see images 2 & 3
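> written out, the marginal likelihood averages the likelihood over the prior:

$$p(\mathrm{data}\mid M)=\int p(\mathrm{data}\mid\theta,M)\,p(\theta\mid M)\,d\theta$$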
how can we use JASP for the example above?
- repeated measures ANOVA
- between-subjects factor (correctness) and within-subjects factor (alcoholic & non-alcoholic)
- see image 4
P(M)
- P(M): prior model probability
> how likely is each model, before seeing the data?
> usually we divide the prior probability equally between the models (here 0.25 each)
→ BeerType model: before looking at the data, there is a 25% probability that this model is the true model, out of these four models - see image 5
P(M|data)
- P(M|data): posterior model probability
> how likely is each model, after seeing the data?
> all values sum to 1
→ BeerType model: after looking at the data, there is a 44.6% probability that this model is the true model, out of all these models - see image 5
BFM
- BFM: ratio of posterior to prior model odds
> the updating factor from prior model odds to posterior model odds
→ for the BeerType model, we calculate it from the posterior and prior odds
→ the data are 2.42 times more likely under this model than under all the other models combined
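- a small sketch of the BFM calculation using the table values quoted above:

```python
# BF_M from the values in the notes: prior model probability 0.25
# (four models) and posterior model probability 0.446 for BeerType.
p_prior, p_post = 0.25, 0.446

prior_odds = p_prior / (1 - p_prior)      # this model vs the rest: 1/3
post_odds = p_post / (1 - p_post)         # ~0.805
print(round(post_odds / prior_odds, 2))   # BF_M = 2.42
```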
BF10
- pairwise Bayes factor
> how likely are the data under this model, compared to another model?
> in JASP the comparison is against the best model; with equal prior model probabilities, BF10 is the ratio of a model’s posterior model probability to that of the best model
→ BeerType model: the data are 0.81 times as likely under this model as under the best model
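- a sketch of that ratio; the best model’s posterior probability (0.552) is an assumed read-off from image 5:

```python
# With equal prior model probabilities, BF10 against the best model
# is just the ratio of posterior model probabilities.
p_beertype, p_best = 0.446, 0.552   # 0.552 is an assumed value
print(round(p_beertype / p_best, 2))  # ~0.81
```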
BFM vs BF10
- BFM: comparison between a single model and all other models combined
- BF10: pairwise comparisons between single models
Bayes factor transitivity
- we can use the BF10 column to conduct additional model comparisons
- e.g. we can compare the BeerType model directly to the Null model
→ we divide the two models’ BF10 values by each other (BeerType BF10 / Null BF10)
→ the data are 97296 times more likely under the BeerType model than under the Null model
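- a sketch of the transitivity trick; the Null model’s BF10 below is a placeholder, not the exact JASP value:

```python
# Transitivity: BF(A vs B) = BF(A vs reference) / BF(B vs reference),
# here with the best model as the shared reference.
bf_beertype = 0.81    # BeerType vs best model (from the table)
bf_null = 8.33e-6     # Null vs best model (placeholder)
print(bf_beertype / bf_null)  # ~97239; JASP reports 97296 with exact values
```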
analysis of effects
- we can compare groups of models, instead of comparing single models with each other
- we can look at each predictor variable and ask how well the models that include it predicted the data
→ e.g. (beertype & beertype+correctness) vs (null & correctness) - see image 6
P(incl)
- prior inclusion probability
> we sum all the prior model probabilities of the models that included the predictor
→ correctness is included in two models that each have a prior model probability of 0.25, so its prior inclusion probability is 0.5
P(incl|data)
- posterior inclusion probability
> we sum all the posterior model probabilities of the models that include this predictor
→ correctness is included in two models, one with a posterior model probability of about 0.563 and one with a very small one, so its posterior inclusion probability is approximately 0.563
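- a sketch of both inclusion sums; the posterior model probabilities are placeholders chosen to roughly match the notes:

```python
# Inclusion probabilities: sum the model probabilities of every
# model containing the predictor. Posterior values are placeholders.
models = {
    "Null":                   {"prior": 0.25, "post": 0.000005},
    "Correctness":            {"prior": 0.25, "post": 0.0015},
    "BeerType":               {"prior": 0.25, "post": 0.446},
    "BeerType + Correctness": {"prior": 0.25, "post": 0.5525},
}

def inclusion(key, predictor):
    return sum(m[key] for name, m in models.items() if predictor in name)

print(inclusion("prior", "Correctness"))  # 0.5
print(inclusion("post", "Correctness"))   # ~0.554
```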
BFincl
- inclusion Bayes factor
> quantifies the change from prior inclusion probability to posterior inclusion probability for each component
→ for correctness, it is around 1.24
! this is the most important column
! it quantifies how well the models with a certain predictor do, compared to models without it
→ much evidence for models with BeerType, less for models with correctness
→ this is absence of evidence (for correctness): a BFincl near 1 means the data barely discriminate, not that the effect is absent
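- a sketch of the BFincl calculation, assuming a posterior inclusion probability of 0.554 for correctness (which reproduces the ~1.24):

```python
# BF_incl: change from prior to posterior inclusion odds.
p_incl_prior, p_incl_post = 0.5, 0.554   # 0.554 is an assumed value

prior_odds = p_incl_prior / (1 - p_incl_prior)  # 1.0
post_odds = p_incl_post / (1 - p_incl_post)
print(round(post_odds / prior_odds, 2))         # ~1.24
```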
single model inference
- we can specify a specific model and get estimates of its various parameters (its b’s)
- see image 7
what does the table of Single Model Inference show?
- mean estimates (and credible intervals) for the group differences
- it characterizes what this specific model predicts for certain participants in certain situations
- we sum the intercept and the values of the two conditions we are interested in
Person tasting a non-alcoholic beer, who correctly identified it
48.34 + (-9.64) + (-4.15) = 34.55
person tasting a non-alcoholic beer, who incorrectly identified it
48.34 + (-9.64) + 4.15 = 42.85
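- the sums reproduced in code; the intercept is taken as 48.34 (rounded to 48.4 in the table), which matches the reported totals:

```python
# Prediction = intercept + the relevant condition effects.
intercept = 48.34       # assumed unrounded value consistent with the sums
non_alcoholic = -9.64   # effect of tasting non-alcoholic beer
correct = -4.15         # effect of a correct identification

print(round(intercept + non_alcoholic + correct, 2))  # 34.55
print(round(intercept + non_alcoholic - correct, 2))  # 42.85
```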
Model Averaging
- we can combine all the models into a single prediction by model averaging
- this method weighs each model’s prediction by its posterior model probability
! these estimates do not differ much from the single-model estimates because they are mostly dictated by the models with the two main effects (the best models) - see image 8
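- a sketch of model averaging with placeholder posterior probabilities and effect estimates:

```python
# Model averaging: weight each model's estimate of an effect by the
# model's posterior probability; models without the effect contribute
# an estimate of 0. All numbers below are placeholders.
weighted = [
    (0.000005, 0.0),    # Null
    (0.0015,   0.0),    # Correctness only (no BeerType effect)
    (0.446,   -9.7),    # BeerType
    (0.5525,  -9.64),   # BeerType + Correctness
]
print(round(sum(p * b for p, b in weighted), 2))  # ~-9.65, driven by the two best models
```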
Error Percentage
- running the same Bayesian ANOVA twice will lead to slight fluctuations in the BFs
→ the fluctuation is indicated by the “error %” column in JASP - these percentages indicate how much the BF will fluctuate from run to run
- an error % below 20 is usually acceptable (a BF of 10 would then deviate between roughly 8 and 12)
how does the Bayesian paradigm differ from the frequentist paradigm?
(part 1)
- evidence in favour of a particular model is a continuous measure of support
> no need for all-or-none Bayes factor cut-off points to accept/reject a particular model
- the Bayes factor can discriminate between absence of evidence (data predicted equally well under both models) and evidence of absence (data support the null hypothesis)
how does the Bayesian paradigm differ from the frequentist paradigm?
(part 2)
- in the Bayesian paradigm, knowledge about the models M and the parameters β is updated simultaneously
> we account for model uncertainty by considering all models, but giving more weight to the ones that predict the data well
> this is known as “Bayesian model averaging”
- frequentists first select a ‘best’ model and then estimate its parameters
→ this neglects model uncertainty and produces overconfident conclusions
how does the Bayesian paradigm differ from the frequentist paradigm?
(part 3)
- Bayesian posterior distributions allow for direct probabilistic statements about parameters
- a range of parameter values is called a credible interval (usually 95%)
> we can consider any interval from a to b and quantify our confidence that the parameter falls in that specific range
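- a sketch with simulated posterior samples (stand-ins, not real JASP output):

```python
# A central 95% credible interval from posterior samples is a direct
# probability statement about the parameter.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(-9.64, 1.5, 10_000)   # stand-in posterior for b1

lo, hi = np.percentile(samples, [2.5, 97.5])
print(lo, hi)  # P(lo < b1 < hi | data) = 0.95
```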
how does the Bayesian paradigm differ from the frequentist paradigm?
(part 4)
- Bayesian inference automatically penalizes for complexity and favours parsimony
> e.g. a model with a redundant covariate will make poor predictions
→ BF will favour model without the redundant predictor
Important summary!
the predictive performance is assessed using parameter values that are drawn from the prior distributions
mega summary of frequentist vs bayesian
- In frequentist ANOVA, the comparison of variances between and within groups is key, and the result is evaluated using a p-value based on the F-distribution. If the null hypothesis is true, variances between and within groups should be similar.
- In Bayesian ANOVA, instead of comparing variances, model predictions (derived from the prior distributions) are evaluated
Summary of Bayesian Model Averaging
- Bayesian Model Averaging (BMA) is a statistical approach that deals with uncertainty in model selection
- Instead of picking just one “best” model, BMA looks at all possible models and combines their results.
- It does this by weighing each model’s estimates according to how likely that model is (its “posterior probability”).
- This way, BMA accounts for uncertainty and gives more reliable results compared to traditional methods, which only focus on a single model and ignore the possibility that another model might also be valid.