L.9 - Bayesian ANOVA Flashcards
Predictive quality
how well did the model/parameter predict the data?
- use predictive quality to update knowledge about the world, and use that knowledge to make new predictions
general knowledge about Bayesian statistics
- prior beliefs about parameters
- prior beliefs about hypotheses
- fully embrace evidence for hypotheses, instead of only rejecting or failing to reject them
Bayesian Parameter Estimation
what is a statistical model?
- makes statements about which values of a parameter are likely (predicts one or more values)
- through probability distribution
what does *image 1* show?
- statistical models of probability distributions
- each puts all of its probability mass on a single point
- points: point models, point hypotheses
- make statement about what data will be equal to (sarah: 0.5, paul: 0.8)
what is theta in statistical models?
- the parameter of the models (here: the probability of heads)
- the value(s) a model puts forward as the hypothesized result of the experiment
what does *image 2* show?
- likely outcomes of data under the initial models
- sampling distribution (sarah’s model here = null hypothesis)
- how likely each outcome is under each model
> e.g. in Sarah’s model P(5)=0.25, in Paul’s model P(5)=0.04
try to understand the models by looking at the image
how can we compute a one-sided or two-sided test by looking at the models?
(image 2)
- one-sided test: sum the probabilities of getting 8, 9 and 10 heads
- two-sided test: sum the probabilities of getting 0, 1, 2, 8, 9 and 10 heads
- we are looking at Sarah’s model (it corresponds to the Null model)
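The sums above can be checked with a quick sketch in Python (assuming scipy is available):

```python
from scipy.stats import binom

# Sarah's point model: theta = 0.5, n = 10 coin flips
n, theta = 10, 0.5

# one-sided test: probability of 8, 9 or 10 heads under the null model
one_sided = sum(binom.pmf(k, n, theta) for k in (8, 9, 10))

# two-sided test: also include the mirror outcomes 0, 1, 2
two_sided = one_sided + sum(binom.pmf(k, n, theta) for k in (0, 1, 2))

print(round(one_sided, 4))  # 0.0547 (= 56/1024)
print(round(two_sided, 4))  # 0.1094 (= 112/1024)
```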
Uniform models
- models where bets are more spread out
- Alex’s model: all values of theta are equally likely (probability distribution)
- this is called uniform distribution
- see image 3
One-sided models
- only posits values on one side of the distribution
- still relatively specific prediction
- in Betty’s model, coin is biased towards heads
- more specific than Alex’s model, but less specific than Sarah’s model
- see image 4
what are these models called in the Bayesian framework?
- prior distributions
- every model has assigned prior probabilities to all values of theta
- in bayesian statistics, we update these prior distributions
- some models “learn” more or less than others, depending on what they predict
> e.g. Paul’s model will learn less than Betty’s model (better to start with uniform distribution)
which distribution is best for a binomial test?
- the Beta distribution
- it has a domain of [0,1] (same domain as a proportion)
what is the likelihood function?
- tells us how likely we are to get 8 heads out of 10, for different values of theta
- likelihoods are based on the binomial formula
> calculate the probability of observing 8/10 heads for any given value of theta (e.g. 0.5), with k = 8, n = 10
- see image 5, and try to understand the formula
! important things to remember about the likelihood function
! NOT a probability distribution! (the area under the curve does not sum to 1)
> we cannot make probabilistic statements (we do that only with prior and posterior distributions)
! same function regardless of the model
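A minimal sketch of the likelihood function in Python (scipy assumed), including a check that its area over theta is not 1:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import binom

k, n = 8, 10  # observed: 8 heads in 10 flips

# likelihood function: P(8 heads out of 10 | theta), for theta in [0, 1]
thetas = np.linspace(0, 1, 101)
likelihood = binom.pmf(k, n, thetas)

print(round(binom.pmf(k, n, 0.5), 3))  # 0.044 (Sarah's theta)
print(round(binom.pmf(k, n, 0.8), 3))  # 0.302 (Paul's theta)

# NOT a probability distribution over theta: the area under the curve
# is 1/(n+1) = 1/11 here, not 1
area, _ = quad(lambda t: binom.pmf(k, n, t), 0, 1)
print(round(area, 3))  # 0.091
```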
how can we use the likelihood function?
- we can use this function to see which values of theta are a good match for our observed data
- if the likelihood is high, those values of theta match the data well (predicted the data well)
- we want to reward values of theta that predicted the data well
> we give them a boost in plausibility
how can we determine which values should receive an increase or decrease in plausibility after observing the data?
through marginal likelihood
Marginal Likelihood → P(data)
- dependent on the prior model
- average likelihood across all the values predicted by the model
- single value per model
- takes all values in the model and computes their average likelihood
- see image 6
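A sketch of the four marginal likelihoods in Python (scipy assumed; Betty's prior is assumed here to be uniform on [0.5, 1], matching "biased towards heads" — the notes don't pin down her exact distribution):

```python
from scipy.integrate import quad
from scipy.stats import binom

k, n = 8, 10

def likelihood(theta):
    return binom.pmf(k, n, theta)

# Alex: uniform prior on [0, 1] (density 1)
ml_alex, _ = quad(lambda t: likelihood(t) * 1.0, 0, 1)

# Betty (assumed): uniform prior on [0.5, 1] (density 2, so it integrates to 1)
ml_betty, _ = quad(lambda t: likelihood(t) * 2.0, 0.5, 1)

# Sarah and Paul: point priors, so the marginal likelihood is just
# the likelihood at their single value of theta
ml_sarah = likelihood(0.5)
ml_paul = likelihood(0.8)

print(round(ml_alex, 3), round(ml_betty, 3))   # 0.091 0.176
print(round(ml_sarah, 3), round(ml_paul, 3))   # 0.044 0.302
```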
what does the marginal likelihood tell us?
- on average, how did the prior model predict the data?
- what values were predicted better/worse than average?
> *image 6* shows how well Alex’s model predicted the data
how does the marginal likelihood differ across the different models?
- Alex: all the values between 0 and 1
- Betty: all the values between 0.5 and 1
- Sarah: likelihood at 0.5
- Paul: likelihood at 0.8
what are the marginal likelihoods in the models?
(look at the images)
- the yellow bars represent the marginal likelihood
- gives us the probability of observing 8/10 heads if this model is the right model
! how likely are data under this model, on average, across all values in that model
how do we use the marginal likelihood (m.l.) to determine what values get a boost in plausibility?
- when looking at the graph of marginal likelihood, the values that are over the average m.l. will get a boost in plausibility
- values worse than average will receive penalty in plausibility
for which values of theta is likelihood better/higher than marginal likelihood?
- see picture 7
posterior distribution - what can we see from the graph?
- see image 8
- posterior distribution is a lot higher than prior distribution for values that predicted better than average
posterior distribution - things to remember!
- probability distribution
- we can use this to make probabilistic statements about a parameter
Central credible interval
- take middlemost 95% in posterior distribution
- credible interval tells us how likely the interval is to contain true value
what is the credible interval in Alex’s model? what does it mean?
- from 0.48 to 0.94
- if Alex’s model is true model, there is 95% probability of theta being between 0.48 and 0.94
- this is probabilistic statement about true value of the parameter
! conditional to model we are working with !
(different credible interval for different prior distribution models)
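With Alex's uniform prior the posterior is a Beta distribution, so the central credible interval can be sketched directly (scipy assumed):

```python
from scipy.stats import beta

# Alex's uniform prior is Beta(1, 1); with 8 heads and 2 tails the
# conjugate posterior is Beta(1 + 8, 1 + 2) = Beta(9, 3)
a_post, b_post = 1 + 8, 1 + 2

# central 95% credible interval: the middlemost 95% of the posterior
lo = beta.ppf(0.025, a_post, b_post)
hi = beta.ppf(0.975, a_post, b_post)
print(round(lo, 2), round(hi, 2))  # roughly 0.48 and 0.94, matching the notes
```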
sensitivity analysis - robustness check
- does the credible interval change widely if I tweak my prior distribution?
- how robust is my conclusion across different prior distributions?
Take home messages (p.1)
- Bayesians quantify uncertainty through distributions
- the more peaked the distribution, the lower the uncertainty
> but those very certain models don’t learn very well
- incoming information continually updates our knowledge
> today’s posterior is tomorrow’s prior
Bayesian Hypothesis Testing
H0 and Ha in Bayesian statistics
- H0: null hypothesis; no effect (theta = 0.5)
- Ha: considers multiple values of theta
> just two different models with marginal likelihoods
What is the formula for Bayes’ theorem in H0 and Ha terms?
- see image 9
- through this formula, we update our beliefs about the hypotheses (which one is more likely?)
What is the Bayes Factor? (BF)
- the updating factor to go from prior beliefs about the hypotheses to posterior beliefs
- a ratio of marginal likelihoods (compare the m.l. of Ha to the m.l. of H0)
- single number that quantifies evidence in favour of one hypothesis over another
how can we calculate the Bayes factor?
- ratio between the probability of that data under all the values in our Alternative hypothesis and probability of data under all values of Null hypothesis
- B.F. = P(data|Ha) / P(data|H0)
what does a BF10 of 3 mean?
BF10 = 3
- subscript 1: Ha
- subscript 0: H0
= the data are 3 times more likely under the alternative model than under the null model
! the greater the BF, the more evidence we have in favour of one hypothesis over the other
- how can we compute the BF in Sarah’s vs Alex’s model?
- how do we interpret the BF that we obtain?
1a) how likely is getting 8 heads under all the values in Sarah’s model, on average?
> 0.04
1b) how likely is getting 8 heads under all the values in Alex’s model, on average?
> 0.09
2) we calculate ratio of these marginal likelihoods
> BFsa = 0.04/0.09 ≈ 0.44 (using the rounded marginal likelihoods)
→ the data are 0.44 times as likely under Sarah’s model as under Alex’s model
> BFas = 0.09/0.04 = 2.25 (the reciprocal of BFsa)
→ the data are 2.25 times more likely under Alex’s model than under Sarah’s model
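A sketch of this calculation in Python (scipy assumed). The notes round the marginal likelihoods to 0.04 and 0.09 before dividing, which gives 0.44 and 2.25; with unrounded values the BFs come out near 0.48 and 2.07:

```python
from scipy.integrate import quad
from scipy.stats import binom

k, n = 8, 10

# marginal likelihoods: Sarah's point model vs Alex's uniform model
ml_sarah = binom.pmf(k, n, 0.5)                        # ~0.044
ml_alex, _ = quad(lambda t: binom.pmf(k, n, t), 0, 1)  # ~0.091 (= 1/11)

bf_sa = ml_sarah / ml_alex  # evidence for Sarah over Alex
bf_as = 1 / bf_sa           # evidence for Alex over Sarah

print(round(bf_sa, 2))  # 0.48
print(round(bf_as, 2))  # 2.07
```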
What are the rules to interpret Bayes factors?
BF → evidence
1-3 → anecdotal
3 - 10 → moderate
10 - 30 → strong
30 - 100 → very strong
>100 → extreme
! these are just guidelines, small differences between BFs are not important
! usually, we look at BFs over 10 (Johnny’s opinion)
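The guideline table can be sketched as a small helper function (hypothetical code for these notes, not part of JASP):

```python
def evidence_label(bf):
    """Map a Bayes factor onto the usual verbal guidelines."""
    if bf < 1:
        bf = 1 / bf  # flip so we always describe the favoured hypothesis
    if bf < 3:
        return "anecdotal"
    if bf < 10:
        return "moderate"
    if bf < 30:
        return "strong"
    if bf < 100:
        return "very strong"
    return "extreme"

print(evidence_label(2.25))  # anecdotal
print(evidence_label(0.04))  # strong (1/0.04 = 25, favouring the other hypothesis)
```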
how can BFs be represented?
- see image 10
- try to understand the graphs!
what are the advantages of the Bayes factor?
- it is a continuous assessment of evidence in favour of one or the other hypothesis
> no black-and-white reasoning about statistical significance (as in frequentist stats)
- it allows us to monitor the BF as we gather data
> not possible with frequentist stats
- it differentiates between evidence of absence and absence of evidence
Bayesian ANOVA
Evidence of absence vs Absence of evidence
- evidence of absence: data supports H0
- absence of evidence: data are not informative (BF close to 1)
ANOVA in regression formula
- what does each value represent?
- yi = b0 + b1*xi
- yi: observed variable
- b0: intercept
- b1: group difference (regression weight)
- xi: tells us whether we are predicting for control or experimental condition
what does b1 represent?
- group difference
- it is the parameter of interest
- if b1=0 → no group difference
what is the Null and Alternative hypotheses in terms of the ANOVA regression formula?
- H0: b1 = 0
> there is no difference between conditions
- Ha: b1 ≠ 0
> there is a difference between conditions
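A sketch with made-up ratings showing that b1 in this regression is exactly the group difference (numpy assumed; the data are hypothetical):

```python
import numpy as np

# hypothetical tastiness ratings: control (x = 0) and experimental (x = 1)
y = np.array([1.0, 2.0, 3.0, 3.0, 4.0, 5.0])
x = np.array([0, 0, 0, 1, 1, 1])

# design matrix for y_i = b0 + b1 * x_i
X = np.column_stack([np.ones_like(x), x])
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)

# b0 ~ 2.0 (control mean), b1 ~ 2.0 (difference between group means)
print(round(b0, 2), round(b1, 2))
```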
what are the models to test one independent variable with two conditions?
- see image 11
- different from frequentist statistics; here we specify distribution for alternative model
→ we specify probability distribution for b1
what are the domains of Beta and b1?
- Beta: [0, 1]
- b1: (-∞, ∞)
→ we have to use a distribution that matches the domain of b1
what distribution can we use?
- what is it? when is it used?
- Cauchy distribution
> prior distribution
> a t-distribution with a single degree of freedom
> conventionally used when talking about a difference in means
- see image 12
what does the cauchy distribution allow?
- it allows b1 to take any value
- important compared to null model
- see image 12
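A sketch of this prior in Python (scipy assumed; the width 0.707 is an assumed default, cf. the .7 mentioned later in the robustness check):

```python
from scipy.stats import cauchy, norm

# Cauchy prior on b1: a t-distribution with 1 degree of freedom;
# scale 0.707 is an assumed default width for the prior
prior = cauchy(loc=0, scale=0.707)

print(round(prior.pdf(0.0), 2))  # 0.45: density is highest at b1 = 0
print(prior.pdf(3.0) > 0)        # but any value of b1 keeps some density

# heavier tails than a normal distribution with the same scale
print(prior.sf(3.0) > norm(0, 0.707).sf(3.0))  # True
```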
What do both models show?
(image 12)
- both the null and alternative model show predictions of what the data would look like if that model were right
- each model has its own marginal likelihood
> what is the average likelihood under this model for the observed data?
what is the marginal likelihood for the alternative model?
- we can calculate m.l. in same way as with binomial test, but now we condition on different values of our regression weight
- see image 13
what does the likelihood graph show?
(see image 13)
- data are more likely for b1 values close to 0.8-0.9
- data are very unlikely for b1 values close to 0
what does the marginal likelihood for our alternative model (M1) and null model (M0) show?
(see image 13&14)
! average likelihood of data for each of the values in a specific model
> dependent on model we have
- in M0, the marginal likelihood equals the likelihood at b1 = 0, because that is the only value the null model predicts
how can we interpret marginal likelihoods?
- likelihood around zero is incredibly low
- marginal likelihood of M0 is also very low (only b1 predicted is zero)
! we must compare marginal likelihood between two models to interpret it
how do we calculate the BF based on marginal likelihood?
- P(data|M1) / P(data|M0)
- see image 15
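A simplified sketch of this ratio in Python (scipy assumed). To keep it short, the likelihood is taken as Normal with known sd = 1 and the data are made up; a real Bayesian t-test also integrates out the variance:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, cauchy

# hypothetical group-difference data (simplifying assumption: sd known = 1)
y = np.array([0.9, 1.1, 1.0, 0.8, 1.2])

def likelihood(b1):
    return np.prod(norm.pdf(y, loc=b1, scale=1.0))

# M0: b1 is exactly 0, so the marginal likelihood is the likelihood at 0
ml_m0 = likelihood(0.0)

# M1: Cauchy(0, 0.707) prior on b1; average the likelihood over the prior
ml_m1, _ = quad(lambda b: likelihood(b) * cauchy.pdf(b, 0, 0.707),
                -np.inf, np.inf)

bf10 = ml_m1 / ml_m0
print(bf10 > 1)  # True here: the data favour the alternative model
```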
how can we calculate Bayesian ANOVA with 2 independent variables and 2 groups each?
- how are the models?
- see picture 16
- tastiness = b0 + b1*alcoholic + b2*correct
- M0: no effect of alcohol on tastiness ratings and no effect of correctness on tastiness ratings
- Ma: model with main effect of alcohol (b1)
- Mc: model with main effect of correctness (b2)
- Ma+c: model with intercept b0 and two main effects
! we compute factorial ANOVA
how does Bayesian factorial ANOVA differ from frequentist factorial ANOVA?
- bayesian factorial ANOVA constructs 4 models, and calculates how well each model predicts the data, across all values in that specific model
how does each model differ in factorial vs one-way Bayesian ANOVA?
- now each model has two prior distributions (one per predictor)
> in this case, each model has a prior distribution for alcohol, and one for correctness
- see images 17, 18 & 19
JASP- Bayesian paired-sample t-test
- Bayesian paired-sample t-test
> to see whether there is a difference between the alcoholic vs non-alcoholic ratings
- see image 20
how would the distribution change if we have a one-sided alternative hypothesis?
- for Ha: alcoholic beer is tastier
> *see image 21*
- for Ha: non-alcoholic beer is tastier
> *see image 22*
> now evidence in favour of null hypothesis (alternative model does worse than null)
!! side of alternative hypothesis matters !!
Robustness check
- we can assess robustness of test with “Robustness check” under “Plots”
- e.g. would I have a completely different result if I had used a width of 1 instead of .7?
- *see image 23*
Bayesian Repeated Measures ANOVA
- basically a paired-samples t-test
- comparing two within-subject groups
> whether the beer is alcoholic is the repeated measure
> we add correctness as a between-subject variable
= we get different types of Bayes factors
- see image 24
How can we interpret the BF10 in the table?
- BF10: compares one model to another model
- it compares each model specified in the row, to the model that predicted the best (highest marginal likelihood)
- JASP puts the model with highest marginal likelihood in the first row
- BF10 in first row is always 1 (compare model with itself)
- see image 25
In image 25, how can we interpret the results?
- “Alcoholic + correctness” model predicts data the best
> therefore its BF is 1 (the model is compared to itself)
- the data are 0.8 times as likely under the “Alcoholic” model as under the model with two main effects
- in the last two rows, we can see that those models predicted the data much worse than the first model
“compare to null” option
- under “order” option in JASP
- through this option, the table is re-ordered
> null model is in first row
> each BF compares the model in that row to the null model
model average
- instead of quantifying evidence in favour of individual models, we can look at quality of prediction of all models containing one effect, and compare those to models that don’t have that effect
- click under “effects” option in JASP and get “analysis of effects” table
> we get every effect in our design (e.g. alcoholic and correctness), each with a BFincl
- see picture 26
what are BFincl?
- Inclusion Bayes factors
- they quantify evidence in favour of including specific effect
- last column in table
- they compare groups of models with groups of other models
how do you interpret BFincl in our example of beer tasting?
- e.g. compare all models that include alcohol and all models that don’t include alcohol
- data are 100,000 times more likely under model with alcohol included
- models with correctness → absence of evidence (BFincl close to 1)
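A sketch of how an inclusion BF combines model probabilities (all numbers made up for illustration, not JASP output):

```python
# BFincl = posterior inclusion odds / prior inclusion odds for an effect
# hypothetical prior and posterior model probabilities for four models
prior = {"null": 0.25, "alcohol": 0.25, "correct": 0.25, "both": 0.25}
post  = {"null": 0.01, "alcohol": 0.55, "correct": 0.04, "both": 0.40}

with_alcohol    = ["alcohol", "both"]
without_alcohol = ["null", "correct"]

post_odds  = sum(post[m] for m in with_alcohol) / sum(post[m] for m in without_alcohol)
prior_odds = sum(prior[m] for m in with_alcohol) / sum(prior[m] for m in without_alcohol)

bf_incl = post_odds / prior_odds
print(round(bf_incl, 1))  # 19.0: evidence for including the alcohol effect
```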
information button
- blue “i” button
- describes all settings and output
- you can use it in the exam, but should still practice so that you don’t waste time
what is the main pitfall of all the analyses we have done so far?
! we must pay attention to interaction effects
- under “models”, hold CTRL on the keyboard to select the two components, then drag the interaction into “model terms”
- *see image 27*
how do we interpret the interaction effect in our example?
- no big difference for the alcoholic beers whether people are correct or not
- big difference for non-alcoholic beers whether people are correct or not
> people rate beers differently based on whether they are correctly identifying the beers
= when incorrect, no difference in rating between alcoholic and nonalcoholic beers
= when correct, there is huge effect between non-alcoholic and alcoholic beer
! we can also flip graph to make it clear
how can we plot interaction effects in JASP?
- in “descriptives”:
> we put “alcoholic” under horizontal axis
> + credible interval
> put “correctness” under separate lines
- see image 28
! always visualize your data !
Interaction effects in our example
- now it is in the first row, which means that it is the model that predicts data the best
Posterior, graphs
- see image 29 & 30