L.9 - Bayesian ANOVA Flashcards
Predictive quality
how well did the model/parameter predict the data?
- use predictive quality to update knowledge about the world, and use that knowledge to make new predictions
general knowledge about Bayesian statistics
- prior beliefs about parameters
- prior beliefs about hypotheses
- fully embrace hypotheses, not just "reject" or "fail to reject"
Bayesian Parameter Estimation
what is a statistical model?
- makes statements about which values of a parameter are likely (predicts one or more values)
- through probability distribution
what does *image 1* show?
- statistical models of probability distributions
- each puts all of its probability mass on a single point
- points: point models, point hypotheses
- make statement about what data will be equal to (sarah: 0.5, paul: 0.8)
what is theta in statistical models?
- the parameter of interest (here, the proportion of heads)
- the value that a model indicates as the hypothesized result of the experiment
what does *image 2* show?
- likely outcomes of data under the initial models
- sampling distribution (sarah’s model here = null hypothesis)
- how likely each outcome is under each model
> e.g. in sarah’s model P(5)≈0.25, in paul’s model P(5)≈0.03
try to understand the models by looking at the image
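As a quick sketch of the point models above (function and variable names are my own), the probabilities of each outcome can be checked with Python's standard library:

```python
from math import comb

def binom_pmf(k, n, theta):
    """Probability of k heads in n flips, given theta."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# Sarah's point model (theta = 0.5) vs Paul's point model (theta = 0.8)
print(binom_pmf(5, 10, 0.5))  # ≈ 0.246
print(binom_pmf(5, 10, 0.8))  # ≈ 0.026
```

This is what the bars in the sampling-distribution image represent: one probability per outcome, per model.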
how can we compute a one-sided or two-sided test (here a binomial test) by looking at the models?
(image 2)
- one-sided test: sum the probabilities of getting an 8, 9 and 10
- two-sided test: also sum the probabilities of the equally extreme outcomes 0, 1 and 2
- we are looking at Sarah’s model (it corresponds to the null model)
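The tail sums above can be sketched directly (a minimal example, assuming the usual binomial formula; names are my own):

```python
from math import comb

def binom_pmf(k, n, theta):
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# One-sided: P(8, 9 or 10 heads) under Sarah's null model (theta = 0.5)
one_sided = sum(binom_pmf(k, 10, 0.5) for k in (8, 9, 10))
# Two-sided: also include the equally extreme outcomes 0, 1, 2
two_sided = one_sided + sum(binom_pmf(k, 10, 0.5) for k in (0, 1, 2))
print(one_sided, two_sided)  # ≈ 0.0547 and ≈ 0.1094
```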
Uniform models
- models where the bets are more spread out
- Alex’s model: all values of theta are equally likely (probability distribution)
- this is called uniform distribution
- see image 3
One-sided models
- posits values on only one side of the distribution
- still relatively specific prediction
- in Betty’s model, coin is biased towards heads
- more specific than Alex’s model, but less specific than Sarah’s model
- see image 4
what are these models called in the Bayesian framework?
- prior distributions
- every model has assigned prior probabilities to all values of theta
- in bayesian statistics, we update these prior distributions
- some models “learn” more or less than others, depending on what they predict
> e.g. Paul’s point model will learn less than Betty’s model (a reason it can be better to start with a uniform distribution)
which distribution is best for a binomial test?
- the Beta distribution
- it has a domain of [0,1] (same domain as a proportion)
what is the likelihood function?
- tells us how likely we are to get 8 heads out of 10, for different values of theta
- likelihoods are based on binomial formulas
> calculate probability of observing 8/10 heads if theta is equal to any value (e.g. 0.5) - k= 8, n=10
- see image 5, and try to understand the formula
! important things to remember about the likelihood function
! NOT a probability distribution! (the area under the curve does not sum to 1)
> we cannot make probabilistic statements from it (do that only with prior and posterior distributions)
! same function regardless of the model
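A minimal sketch of the likelihood function for 8/10 heads (my own names; the numeric sum over a grid is just a crude check of the area claim):

```python
from math import comb

def likelihood(theta, k=8, n=10):
    # Binomial likelihood of observing k heads in n flips
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# The function peaks at theta = k/n = 0.8
print(likelihood(0.8))  # ≈ 0.302
print(likelihood(0.5))  # ≈ 0.044

# Not a probability distribution: the area under the curve is 1/11, not 1
grid = [i / 10000 for i in range(10001)]
area = sum(likelihood(t) for t in grid) / 10000
print(area)  # ≈ 0.0909
```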
how can we use the likelihood function?
- we can use this function to see which values of theta are a good match for our observed data
- if the likelihood is high, those values of theta match the data well (predicted the data well)
- we want to reward values of theta that predicted the data well
> we give them a boost in plausibility
how can we determine which values should receive an increase or decrease in plausibility after observing the data?
through marginal likelihood
Marginal Likelihood → P(data)
- dependent on the prior model
- average likelihood across all the values of theta, weighted by the prior
- a single value per model
- takes all values in the model and computes their prior-weighted average likelihood
- see image 6
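The marginal likelihoods of the four models can be sketched as follows (my own names; the uniform priors are averaged numerically over a grid, which is an approximation):

```python
from math import comb

def likelihood(theta, k=8, n=10):
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# Point models: the marginal likelihood is just the likelihood at that point
ml_sarah = likelihood(0.5)  # ≈ 0.044
ml_paul = likelihood(0.8)   # ≈ 0.302

# Uniform priors: average the likelihood over the values the prior allows
grid = [i / 10000 for i in range(1, 10000)]
ml_alex = sum(likelihood(t) for t in grid) / len(grid)  # ≈ 1/11 ≈ 0.091
betty_grid = [t for t in grid if t >= 0.5]
ml_betty = sum(likelihood(t) for t in betty_grid) / len(betty_grid)  # ≈ 0.176

print(ml_sarah, ml_paul, ml_alex, ml_betty)
```

Note that Paul’s point model happens to predict these particular data best, even though it would learn the least from them.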
what does the marginal likelihood tell us?
- on average, how did the prior model predict the data?
- what values were predicted better/worse than average?
> in *image 6*, it shows how well Alex’s model predicted the data
how does the marginal likelihood differ across the different models?
- Alex: all the values between 0 and 1
- Betty: all the values between 0.5 and 1
- Sarah: likelihood at 0.5
- Paul: likelihood at 0.8
what are the marginal likelihoods in the models?
(look at the images)
- the yellow bars represent the marginal likelihood
- gives us the probability of observing 8/10 heads if this model is the right model
! how likely are data under this model, on average, across all values in that model
how do we use the marginal likelihood (m.l.) to determine what values get a boost in plausibility?
- values of theta whose likelihood is above the average (the marginal likelihood) get a boost in plausibility
- values worse than average receive a penalty in plausibility
for which values of theta is the likelihood higher than the marginal likelihood?
- see picture 7
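Under Alex’s uniform prior this boosted region can be found numerically (a rough sketch with my own names; the exact crossover points depend on the grid resolution):

```python
from math import comb

def likelihood(theta, k=8, n=10):
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

ml_alex = 1 / 11  # marginal likelihood under Alex's uniform prior (exact)

# Values of theta whose likelihood exceeds the marginal likelihood get a boost
grid = [i / 1000 for i in range(1001)]
boosted = [t for t in grid if likelihood(t) > ml_alex]
print(min(boosted), max(boosted))  # roughly 0.57 to 0.94
```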
posterior distribution - what can we see from the graph?
- see image 8
- the posterior distribution is a lot higher than the prior distribution for values that predicted better than average
posterior distribution - things to remember!
- probability distribution
- we can use this to make probabilistic statements about a parameter
Central credible interval
- take the middlemost 95% of the posterior distribution
- the credible interval tells us how likely the interval is to contain the true value
what is the credible interval in Alex’s model? what does it mean?
- from 0.48 to 0.94
- if Alex’s model is the true model, there is a 95% probability that theta lies between 0.48 and 0.94
- this is probabilistic statement about true value of the parameter
! conditional to model we are working with !
(different credible interval for different prior distribution models)
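Alex’s interval can be verified numerically: under a uniform prior, 8/10 heads gives a Beta(9, 3) posterior, and the central 95% interval can be read off a numeric CDF (a stdlib-only sketch; names are my own):

```python
def posterior_density(theta):
    return theta**8 * (1 - theta)**2  # unnormalized Beta(9, 3)

steps = 20000
grid = [i / steps for i in range(steps + 1)]
dens = [posterior_density(t) for t in grid]
total = sum(dens)

# Build a numeric CDF, then find the 2.5% and 97.5% quantiles
cdf, acc = [], 0.0
for d in dens:
    acc += d
    cdf.append(acc / total)

lower = next(t for t, c in zip(grid, cdf) if c >= 0.025)
upper = next(t for t, c in zip(grid, cdf) if c >= 0.975)
print(lower, upper)  # ≈ 0.48 and ≈ 0.94
```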
sensitivity analysis - robustness check
- does the credible interval change widely if I tweak my prior distribution?
- how robust is my conclusion across different prior distributions?
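An illustrative sensitivity check (my own function; the Beta(2, 2) prior is an arbitrary mildly informative alternative, not one from the notes) compares credible intervals under two Beta priors, using the fact that a Beta(a, b) prior plus k heads in n flips gives a Beta(a + k, b + n − k) posterior:

```python
def credible_interval(a, b, k=8, n=10, steps=20000):
    # Posterior is Beta(a + k, b + n - k); quantiles from a numeric CDF
    a_post, b_post = a + k, b + n - k
    grid = [i / steps for i in range(1, steps)]
    dens = [t**(a_post - 1) * (1 - t)**(b_post - 1) for t in grid]
    total = sum(dens)
    acc, lower, upper = 0.0, None, None
    for t, d in zip(grid, dens):
        acc += d / total
        if lower is None and acc >= 0.025:
            lower = t
        if upper is None and acc >= 0.975:
            upper = t
    return lower, upper

print(credible_interval(1, 1))  # uniform prior:   ≈ (0.48, 0.94)
print(credible_interval(2, 2))  # Beta(2, 2) prior: ≈ (0.46, 0.91)
```

The two intervals differ only slightly here, which is the kind of robustness the check is after.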
Take home messages (p.1)
- Bayesians quantify uncertainty through distributions
- the more peaked the distribution, the lower the uncertainty
> but very certain models don’t learn very well - incoming information continually updates our knowledge
> today’s posterior is tomorrow’s prior