L.9 - Bayesian ANOVA Flashcards

1
Q

Predictive quality

A

how well did the model/parameter predict the data?
- use predictive quality to update knowledge about the world, and use that knowledge to make new predictions

2
Q

general knowledge about Bayesian statistics

A
  • prior beliefs about parameters
  • prior beliefs about hypotheses
  • fully embrace each hypothesis, rather than only “reject” or “fail to reject”
3
Q

Bayesian Parameter Estimation

A
4
Q

what is a statistical model?

A
  • makes statements about which values of a parameter are likely (predicts one or more values)
  • through probability distribution
5
Q

what does image 1 show?

A
  • statistical models of probability distributions
  • they put all probability mass on one point each
  • points: point models, point hypotheses
  • make a statement about what the data will be equal to (Sarah: 0.5, Paul: 0.8)
6
Q

what is theta in statistical models?

A
  • the parameter of the statistical model
  • the value that the model indicates as the hypothesized result of the experiment
7
Q

what does image 2 show?

A
  • likely outcomes of data under the initial models
  • sampling distribution (Sarah’s model here = null hypothesis)
  • how likely each outcome is under each model
    > e.g. in Sarah’s model P(5) = 0.25, in Paul’s model P(5) = 0.04

try to understand the models by looking at the image

8
Q

how can we compute a one-sided or two-sided (frequentist) test by looking at the models?

A

(image 2)
- one-sided test: sum the probabilities of getting 8, 9 or 10 heads
- two-sided test: sum the probabilities of getting 0, 1, 2, 8, 9 or 10 heads
- we look at Sarah’s model (it corresponds to the null model)
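The two sums above can be checked with a minimal Python sketch (not from the lecture), assuming Sarah’s fair-coin model with theta = 0.5 and n = 10 flips:

```python
from math import comb

n, theta0 = 10, 0.5   # Sarah's (null) model: a fair coin

def pmf(k):
    """Probability of exactly k heads in n flips under theta0."""
    return comb(n, k) * theta0**k * (1 - theta0)**(n - k)

# one-sided: P(X >= 8) = P(8) + P(9) + P(10)
p_one = sum(pmf(k) for k in (8, 9, 10))
# two-sided: also add the mirror outcomes 0, 1, 2
p_two = p_one + sum(pmf(k) for k in (0, 1, 2))

print(round(p_one, 4))  # 0.0547 (= 56/1024)
print(round(p_two, 4))  # 0.1094 (= 112/1024)
```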

9
Q

Uniform models

A
  • models where the bets are more spread out
  • Alex’s model: all values of theta are equally likely (a probability distribution)
  • this is called a uniform distribution
  • see image 3
10
Q

One-sided models

A
  • only posits values on one side of the distribution
  • still relatively specific prediction
  • in Betty’s model, coin is biased towards heads
  • more specific than Alex’s model, but less specific than Sarah’s model
  • see image 4
11
Q

what are these models called in the Bayesian framework?

A
  • prior distributions
  • every model has assigned prior probabilities to all values of theta
  • in bayesian statistics, we update these prior distributions
  • some models “learn” more or less than others, depending on what they predict
    > e.g. Paul’s model will learn less than Betty’s model (better to start with a uniform distribution)
12
Q

which distribution is best for a binomial test?

A
  • the Beta distribution
  • it has a domain of [0,1] (same domain as a proportion)
13
Q

what is the likelihood function?

A
  • tells us how likely we are to get 8 heads out of 10, for different values of theta
  • likelihoods are based on the binomial formula
    > calculate the probability of observing 8/10 heads if theta equals any given value (e.g. 0.5)
  • k = 8, n = 10
  • see image 5, and try to understand the formula
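A minimal sketch of this likelihood function (k = 8, n = 10), evaluated at Sarah’s and Paul’s point values:

```python
from math import comb

n, k = 10, 8  # observed: 8 heads in 10 flips

def likelihood(theta):
    """Binomial likelihood of the observed 8/10 heads for a given theta."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

print(round(likelihood(0.5), 3))  # 0.044 - Sarah's point value
print(round(likelihood(0.8), 3))  # 0.302 - Paul's point value
```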
14
Q

! important things to remember about the likelihood function

A

! NOT a probability distribution! (surface area does not sum to 1)
> we cannot make probabilistic statements (we do that only with prior and posterior distributions)
! same function regardless of the model

15
Q

how can we use the likelihood function?

A
  • we can use this function to see which values of theta are a good match for our observed data
  • if the likelihood is high, that value of theta matches the data well (it predicted the data well)
  • we want to reward values of theta that predicted the data well
    > we give them a boost in plausibility
16
Q

how can we determine which values should receive an increase or decrease in plausibility after observing the data?

A

through marginal likelihood

17
Q

Marginal Likelihood → P(data)

A
  • dependent on the prior model
  • the average likelihood across all the values predicted by the model
  • a single value per model
  • it takes all values in the model and computes their average likelihood
  • see image 6
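For Alex’s uniform model, this average can be approximated numerically; a sketch (analytically the value is exactly 1/11):

```python
from math import comb

n, k = 10, 8  # observed: 8 heads in 10 flips

def likelihood(theta):
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# Marginal likelihood under Alex's uniform prior: the average
# likelihood over all theta values in [0, 1], equally weighted.
grid = [i / 10000 for i in range(10001)]
marginal = sum(likelihood(t) for t in grid) / len(grid)

print(round(marginal, 3))  # 0.091 (analytically 1/11)
```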
18
Q

what does the marginal likelihood tell us?

A
  • on average, how well did the prior model predict the data?
  • which values were predicted better/worse than average?
    > in image 6, it shows how well Alex’s model predicted the data
19
Q

how does the marginal likelihood differ across the different models?

A
  • Alex: average over all the values between 0 and 1
  • Betty: average over all the values between 0.5 and 1
  • Sarah: likelihood at 0.5 only
  • Paul: likelihood at 0.8 only
20
Q

what are the marginal likelihoods in the models?
(look at the images)

A
  • the yellow bars represent the marginal likelihood
  • give us the probability of observing 8/10 heads if this model is the right model
    ! how likely are data under this model, on average, across all values in that model
21
Q

how do we use the marginal likelihood (m.l.) to determine what values get a boost in plausibility?

A
  • in the graph of the marginal likelihood, the values of theta whose likelihood lies above the average m.l. get a boost in plausibility
  • values worse than average receive a penalty in plausibility
22
Q

for which values of theta is the likelihood higher than the marginal likelihood?

A
  • see picture 7
23
Q

posterior distribution - what can we see from the graph?

A
  • see image 8
  • the posterior distribution is a lot higher than the prior distribution for values that predicted the data better than average
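Under Alex’s uniform prior (which is a Beta(1, 1) distribution), the posterior follows from conjugate updating; a minimal sketch:

```python
# Beta-binomial conjugacy: a Beta(a, b) prior plus k heads and
# n - k tails gives a Beta(a + k, b + n - k) posterior.
a, b = 1, 1    # Alex's uniform prior is Beta(1, 1)
k, n = 8, 10   # observed data: 8 heads in 10 flips

a_post, b_post = a + k, b + (n - k)
post_mean = a_post / (a_post + b_post)

print(a_post, b_post)       # 9 3 -> posterior is Beta(9, 3)
print(round(post_mean, 2))  # 0.75
```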
24
Q

posterior distribution - things to remember!

A
  • probability distribution
  • we can use this to make probabilistic statements about a parameter
25
Q

Central credible interval

A
  • take the middlemost 95% of the posterior distribution
  • the credible interval tells us how likely it is to contain the true value
26
Q

what is the credible interval in Alex’s model? what does it mean?

A
  • from 0.48 to 0.94
  • if Alex’s model is the true model, there is a 95% probability that theta is between 0.48 and 0.94
  • this is a probabilistic statement about the true value of the parameter

! conditional on the model we are working with !
(a different credible interval for each different prior distribution)
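The 0.48–0.94 interval can be reproduced from the Beta(9, 3) posterior; a sketch assuming Alex’s uniform prior and the 8/10 heads data:

```python
from scipy.stats import beta

# Posterior under Alex's uniform prior after 8 heads in 10 flips
lo, hi = beta.interval(0.95, 9, 3)  # central (middlemost) 95%

print(round(lo, 2), round(hi, 2))  # 0.48 0.94
```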

27
Q

sensitivity analysis - robustness check

A
  • does the credible interval change much if I tweak my prior distribution?
  • how robust is my conclusion across different prior distributions?
28
Q

Take home messages (p.1)

A
  • Bayesians quantify uncertainty through distributions
  • the more peaked the distribution, the lower the uncertainty
    > but those very certain models don’t learn very well
  • incoming information continually updates our knowledge
    > today’s posterior is tomorrow’s prior
29
Q

Bayesian Hypothesis Testing

A
30
Q

H0 and Ha in Bayesian statistics

A
  • H0: null hypothesis; no difference (theta = 0.5)
  • Ha: considers multiple values of theta

> just two different models with marginal likelihoods

31
Q

What is the formula for Bayes’ theorem in H0 and Ha terms?

A
  • see image 9
  • P(Ha|data) / P(H0|data) = [P(data|Ha) / P(data|H0)] × [P(Ha) / P(H0)]
    > posterior odds = Bayes factor × prior odds
  • through this formula, we update our beliefs about the hypotheses (which one is more likely?)
32
Q

What is the Bayes Factor? (BF)

A
  • the updating factor to go from prior beliefs about the hypotheses to posterior beliefs
  • a ratio of marginal likelihoods (compare the m.l. of Ha to the m.l. of H0)
  • a single number that quantifies the evidence in favour of one hypothesis over the other
33
Q

how can we calculate the Bayes factor?

A
  • the ratio between the probability of the data under all the values of the alternative hypothesis and the probability of the data under all the values of the null hypothesis
  • BF = P(data|Ha) / P(data|H0)
34
Q

what does a BF10 of 3 mean?

A

BF10 = 3
- 1: Ha
- 0: H0
= the data are 3 times more likely under the alternative model than under the null model

! the greater the BF, the more evidence we have in favour of one hypothesis over the other

35
Q
  • how can we compute the BF in Sarah’s vs Alex’s model?
  • how do we interpret the BF that we obtain?
A

1a) how likely is getting 8 heads, on average, under all the values in Sarah’s model?
> 0.04
1b) how likely is getting 8 heads, on average, under all the values in Alex’s model?
> 0.09
2) we calculate the ratio of these marginal likelihoods
> BFsa: 0.04/0.09 = 0.44
→ the data are 0.44 times as likely under Sarah’s model as under Alex’s model
> BFas: 1/0.44 = 2.25 (or 0.09/0.04)
→ the data are 2.25 times more likely under Alex’s model than under Sarah’s model
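These marginal likelihoods can be checked numerically with a sketch; note the card’s 0.44 and 2.25 come from the rounded values 0.04 and 0.09 — the unrounded ratios are about 0.48 and 2.07:

```python
from math import comb

n, k = 10, 8  # observed: 8 heads in 10 flips

def likelihood(theta):
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# Sarah's point model: marginal likelihood = likelihood at 0.5
ml_sarah = likelihood(0.5)

# Alex's uniform model: average likelihood over a theta grid
grid = [i / 10000 for i in range(10001)]
ml_alex = sum(likelihood(t) for t in grid) / len(grid)

print(round(ml_sarah, 2), round(ml_alex, 2))  # 0.04 0.09
print(round(ml_sarah / ml_alex, 2))           # BFsa ~0.48
print(round(ml_alex / ml_sarah, 2))           # BFas ~2.07
```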

36
Q

What are the rules to interpret Bayes factors?

A

BF → evidence

1 - 3 → anecdotal
3 - 10 → moderate
10 - 30 → strong
30 - 100 → very strong
>100 → extreme
! these are just guidelines; small differences between BFs are not important
! usually, we look at BFs over 10 (Johnny’s opinion)
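The guideline categories above are easy to encode; a sketch, assuming the BF is expressed so that it is ≥ 1 (i.e. in favour of the better-predicting hypothesis):

```python
def evidence_label(bf):
    """Map a Bayes factor (>= 1) onto the guideline evidence categories."""
    if bf > 100:
        return "extreme"
    if bf > 30:
        return "very strong"
    if bf > 10:
        return "strong"
    if bf > 3:
        return "moderate"
    return "anecdotal"

print(evidence_label(2.25))  # anecdotal
print(evidence_label(8))     # moderate
print(evidence_label(150))   # extreme
```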

37
Q

how can BFs be represented?

A
  • see image 10
  • try to understand the graphs!
38
Q

what are the advantages of the Bayes factor?

A
  • it is a continuous assessment of evidence in favour of one or the other hypothesis
    > no black and white reasoning about statistical significance (such as in frequentist stats)
  • it allows us to monitor the BF as we gather data
    > not possible with frequentist stats
  • differentiates between the evidence of absence and the absence of evidence
39
Q

Bayesian ANOVA

A
39
Q

Evidence of absence vs Absence of evidence

A
  • evidence of absence: data supports H0
  • absence of evidence: data are not informative (BF close to 1)
40
Q

ANOVA in regression formula
- what does each value represent?

A
  • yi = b0 + b1·xi
  • yi: observed variable
  • b0: intercept
  • b1: group difference (regression weight)
  • xi: indicates whether we are predicting for the control (xi = 0) or the experimental (xi = 1) condition
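A sketch of this dummy coding (the numbers are hypothetical, just to show why b1 is exactly the group difference):

```python
# x = 0 codes the control condition, x = 1 the experimental condition,
# so b0 is the control-group mean and b1 is the group difference.
b0, b1 = 5.0, 1.5  # hypothetical intercept and group difference

def predict(x):
    return b0 + b1 * x

print(predict(0))  # 5.0 -> control-group prediction
print(predict(1))  # 6.5 -> experimental-group prediction
```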
41
Q

what does b1 represent?

A
  • group difference
  • it is the parameter of interest
  • if b1=0 → no group difference
42
Q

what are the null and alternative hypotheses in terms of the ANOVA regression formula?

A
  • H0: b1=0
    > H0: there is no difference between conditions
  • Ha: b1≠0
    > there is a difference between conditions
43
Q

what are the models to test one independent variable with two conditions?

A
  • see image 11
  • different from frequentist statistics: here we specify a distribution for the alternative model
    → we specify a probability distribution for b1
44
Q

what are the domains of Beta and b1?

A
  • Beta: [0, 1]
  • b1: (-∞, ∞)
    → we have to use a distribution that matches the domain of b1
45
Q

what distribution can we use?
- what is it? when is it used?

A
  • the Cauchy distribution
    > used as the prior distribution
    > a t-distribution with a single degree of freedom
    > conventionally used when talking about a difference in means
  • see image 12
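The identity “Cauchy = t-distribution with one degree of freedom” can be verified directly with scipy:

```python
from scipy.stats import cauchy, t

# The standard Cauchy density is identical to a t density with df = 1
for x in (-2.0, 0.0, 0.7, 3.0):
    assert abs(cauchy.pdf(x) - t.pdf(x, df=1)) < 1e-12

print("Cauchy(0, 1) == t(df = 1)")
```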
46
Q

what does the Cauchy distribution allow?

A
  • it allows b1 to take any value
  • this matters when comparing it to the null model (which fixes b1 at 0)
  • see image 12
47
Q

What do both models show?
(image 12)

A
  • both the null and the alternative model make predictions of how the data would look if that model were right
  • each model has its own marginal likelihood
    > what is the average likelihood of the observed data under this model?
48
Q

what is the marginal likelihood for the alternative model?

A
  • we can calculate the m.l. in the same way as with the binomial test, but now we condition on different values of our regression weight b1
  • see image 13
49
Q

what does the likelihood in the graph show?
(see image 13)

A
  • the data are most likely for b1 values close to 0.8-0.9
  • the data are very unlikely for b1 values close to 0
50
Q

what does the marginal likelihood for our alternative model (M1) and null model (M0) show?
(see image 13&14)

A

! the average likelihood of the data across all the values in a specific model
> dependent on the model we have
- in M0, the marginal likelihood equals the likelihood at b1 = 0, because that is the only value the null model predicts
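A simplified sketch of this comparison — NOT JASP’s exact computation: it assumes a single observed standardized effect with a known standard error, and both numbers are hypothetical, not the beer data:

```python
from scipy.stats import norm, cauchy
from scipy.integrate import quad

b1_hat, se = 0.85, 0.3  # hypothetical observed effect and standard error

def likelihood(b1):
    """How likely is the observed effect if the true effect is b1?"""
    return norm.pdf(b1_hat, loc=b1, scale=se)

# M0 fixes b1 at 0, so its marginal likelihood is just likelihood(0)
ml_null = likelihood(0.0)

# M1 places a Cauchy(0, 0.707) prior on b1: average the likelihood
# over all b1 values, weighted by that prior
ml_alt, _ = quad(lambda b1: likelihood(b1) * cauchy.pdf(b1, scale=0.707),
                 -30, 30)

print(round(ml_alt / ml_null, 2))  # BF10 for this toy example
```

Here the data sit far from b1 = 0, so the averaged likelihood under M1 beats the null’s likelihood at zero and BF10 comes out above 1.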

51
Q

how can we interpret marginal likelihoods?

A
  • the likelihood around b1 = 0 is incredibly low
  • the marginal likelihood of M0 is therefore also very low (the only b1 value it predicts is zero)
    ! we must compare the marginal likelihoods of two models to interpret them
52
Q

how do we calculate the BF based on marginal likelihood?

A
  • P(data|M1) / P(data|M0)
  • see image 15
53
Q

how can we calculate a Bayesian ANOVA with 2 independent variables and 2 groups each?
- what are the models?

A
  • see picture 16
  • tastiness = b0 + b1·alcoholic + b2·correct
  • M0: no effect of alcohol on tastiness ratings and no effect of being correct on tastiness ratings
  • Ma: model with a main effect of alcohol (b1)
  • Mc: model with a main effect of correctness (b2)
  • Ma+c: model with intercept b0 and both main effects
    ! this is how we compute a factorial ANOVA
54
Q

how does Bayesian factorial ANOVA differ from frequentist factorial ANOVA?

A
  • Bayesian factorial ANOVA constructs 4 models and calculates how well each model predicted the data, across all values in that specific model
55
Q

how does each model differ in factorial vs one-way Bayesian ANOVA?

A
  • now each model has two prior distributions (one per predictor)
    > in this case, each model has a prior distribution for alcohol, and one for correctness
  • see image 17, 18 & 19
56
Q

JASP- Bayesian paired-sample t-test

A
  1. Bayesian paired-sample t-test
    > to see whether there is a difference between the alcoholic vs non-alcoholic ratings
    - see image 20
57
Q

how would the distribution change if we have a one-sided alternative hypothesis?

A
  • for Ha: alcoholic beer is tastier
    > see image 21
  • for Ha: non-alcoholic beer is tastier
    > see image 22
    > now there is evidence in favour of the null hypothesis (the alternative model does worse than the null)
    !! the side of the alternative hypothesis matters !!
58
Q

Robustness check

A
  • we can assess the robustness of the test with “Robustness check” under “Plots”
  • e.g. would I get a completely different result if I had used a prior width of 1 instead of 0.7?
    - see image 23
59
Q

Bayesian Repeated Measures ANOVA

A
  • basically a paired-samples t-test
  • comparing two within-subject groups
    > the alcohol content of the beer is the repeated measure
    > we add correctness as a between-subject variable
    = we get different types of Bayes factors
  • see image 24
60
Q

How can we interpret the BF10 in the table?

A
  • BF10: compares one model to another model
  • it compares each model specified in a row to the model that predicted the data best (highest marginal likelihood)
  • JASP puts the model with the highest marginal likelihood in the first row
  • the BF10 in the first row is always 1 (the model is compared with itself)
  • see image 25
61
Q

In image 25, how can we interpret the results?

A
  • the “Alcoholic + correctness” model predicts the data best
    > therefore its BF is 1 (that model is compared to itself)
  • the data are 0.8 times as likely under the “Alcoholic” model as under the model with two main effects
  • in the last rows, we can see that the last two models predicted the data much worse than the first model
62
Q

“compare to null” option

A
  • under the “order” option in JASP
  • through this option, the table is re-ordered
    > the null model is in the first row
    > each BF compares the model in that row to the null model
63
Q

model averaging

A
  • instead of quantifying evidence in favour of individual models, we can look at the quality of prediction of all models containing an effect, and compare those to the models without that effect
  • click the “effects” option in JASP to get the “analysis of effects” table
    > we get every effect in our design (e.g. alcoholic and correctness) with its BFincl
  • see picture 26
64
Q

what are BFincl?

A
  • inclusion Bayes factors
  • they quantify the evidence in favour of including a specific effect
  • last column in the table
  • they compare groups of models with the effect to groups of models without it
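A sketch of this comparison with hypothetical marginal likelihoods for the four models (assuming equal prior probabilities for every model, so the inclusion BF reduces to a ratio of average marginal likelihoods):

```python
# Hypothetical marginal likelihoods for the four beer-tasting models
ml = {
    "null": 0.002,
    "alcohol": 0.20,
    "correct": 0.003,
    "alcohol+correct": 0.25,
}

def bf_incl(effect):
    """Inclusion BF: average m.l. of models with vs without the effect."""
    with_e = [v for name, v in ml.items() if effect in name]
    without = [v for name, v in ml.items() if effect not in name]
    return (sum(with_e) / len(with_e)) / (sum(without) / len(without))

print(round(bf_incl("alcohol"), 1))  # 90.0 -> strong evidence for alcohol
print(round(bf_incl("correct"), 2))  # 1.25 -> close to 1: not informative
```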
65
Q

how do you interpret BFincl in our example of beer tasting?

A
  • e.g. compare all models that include alcohol with all models that don’t include alcohol
  • the data are 100,000 times more likely under the models with alcohol included
  • models with correctness → absence of evidence (BF close to 1)
65
Q

information button

A
  • blue “i” button
  • describes all settings and output
  • you can use it in the exam, but should still practice so that you don’t waste time
66
Q

what is the main pitfall of all the analyses we have done so far?

A

!! we must pay attention to interaction effects !!
- under “models”, hold CTRL on the keyboard, select the two components, then drag the interaction into “model terms”
- see image 27

67
Q

how do we interpret the interaction effect in our example?

A
  • no big difference for the alcoholic beers whether people are correct or not
  • big difference for the non-alcoholic beers whether people are correct or not
    > people rate beers differently based on whether they correctly identify them
    = when incorrect, no difference in rating between alcoholic and non-alcoholic beers
    = when correct, there is a huge difference between non-alcoholic and alcoholic beers
    ! we can also flip the graph to make this clearer
67
Q

how can we plot interaction effects in JASP?

A
  • in “descriptives”:
    > put “alcoholic” on the horizontal axis
    > add the credible interval
    > put “correctness” under separate lines
  • see image 28
    ! always visualize your data !
67
Q

Interaction effects in our example

A
  • the interaction model is now in the first row, which means that it predicts the data best
68
Q

Posterior, graphs

A
  • see image 29 & 30