Lecture 5 Flashcards
bayesian inference =
the outcome of a learning process that is governed by relative predictive success
bayesian learning cycle
prior knowledge - prediction - data - prediction error - knowledge update - prior knowledge…
what does theta always stand for
a hypothesis, something that is unknown and that we wish to learn about
what is Bayes' rule
p(θ | data) = p(θ) × p(data | θ) / p(data)
posterior beliefs about parameters = prior beliefs about parameters x predictive updating factor
which 2 components does Bayes' rule consist of
support and predictive success
formula for support =
p(θ | data) / p(θ)
formula for predictive success =
p(data | θ) / p(data)
what do you compare with support
what you believed about theta before seeing the data versus what you know about theta after seeing the data.
if the data increased the plausibility of a value of theta, the data provided support for that value; if not, there is less support.
what do you look at with predictive success
what is the probability of the observed data? how surprising are the observed data vs how surprising are the observed data given a value of theta?
what does Bayes' rule say
that support is the same as predictive success
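That equality of support and predictive success can be checked numerically. A minimal sketch, assuming a hypothetical binomial example (a uniform Beta(1, 1) prior on theta and 8 successes in 10 trials; numbers are illustrative, not from the lecture):

```python
from math import comb, gamma

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * x ** (a - 1) * (1 - x) ** (b - 1)

n, k = 10, 8     # hypothetical data: 8 successes in 10 trials
theta = 0.8      # the value of theta under consideration

# support: posterior density / prior density at theta
# (uniform Beta(1, 1) prior -> Beta(k + 1, n - k + 1) posterior)
support = beta_pdf(theta, k + 1, n - k + 1) / beta_pdf(theta, 1, 1)

# predictive success: p(data | theta) / p(data)
p_data_given_theta = comb(n, k) * theta ** k * (1 - theta) ** (n - k)
p_data = 1 / (n + 1)   # under a uniform prior, every k in 0..n is equally likely
predictive_success = p_data_given_theta / p_data

print(support, predictive_success)  # the two ratios are identical (about 3.32)
```

Both ratios come out the same, which is exactly what rearranging Bayes' rule says they must.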
wat is surprise in statistics
a bad thing, you want the data to be predictable
so less surprising data =
predictive success, which means there is support
what is the difference between support and predictive success
support = what happens to our beliefs
predictive success = how surprising is our data, how well was our prediction?
what is a mnemonic
surprise lost is credibility gained.
what is the dotted line
prior distribution on theta
often a flat line because it is convenient, so all values are equally likely
what is the solid (non-dotted) line
posterior distribution
what do you see when you compare the prior and the posterior, for example
that the values below …. (where the dotted and solid lines cross) have become less likely, and the values between those crossings have become more likely
the area under the curve…
needs to be one: so if one area becomes less likely, another area must become more likely
in a continuous distribution, theta can take on any value from …
0 to 1
what gives the probability in a continuous distribution
not the height, but the area under the curve
what does the height tell you
it has no real interpretation on its own. but you can look at how plausible a value was a priori and how plausible it is a posteriori. the ratio between these two -> how much more likely is the value under the posterior? that is the increase in our belief
how do you compute that ratio
the top divided by the bottom
posterior / prior
meaning of p(data)
probability of the data
if you assume a uniform prior distribution (every proportion is equally likely a priori), then you predict that the number of successes is also uniform: all outcomes are equally likely a priori. your predictions are very broad
if you assume a specific theta (such as 0.50)…
you get much more specific predictions, peaked around e.g. 0.5
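The contrast between broad and peaked predictions can be sketched with a hypothetical 10-trial binomial experiment:

```python
from math import comb

n = 10  # hypothetical experiment with 10 trials

# point hypothesis theta = 0.5: binomial predictions, peaked around k = 5
p_point = [comb(n, k) * 0.5 ** k * 0.5 ** (n - k) for k in range(n + 1)]

# uniform prior on theta:
# p(k) = integral of comb(n, k) * theta^k * (1 - theta)^(n - k) dtheta = 1 / (n + 1)
p_uniform = [1 / (n + 1)] * (n + 1)

print(max(p_point))   # about 0.246, concentrated at k = 5: a daring prediction
print(p_uniform[0])   # about 0.091 for every k: a broad, hedged prediction
```

The point hypothesis bets heavily on outcomes near 5; the uniform prior spreads its bets evenly over all eleven possible outcomes.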
how would you explain p(θ | data) / p(θ)
you compare the prior distribution to the posterior distribution, and then we can see how our beliefs have changed.
principle of parsimony
when you prefer simple over complex explanations, unless the data forces you to abandon the simple explanation
posterior belief about theta formula =
p(θ | data)
prior belief about theta formula =
p(θ)
change in belief formula =
p(θ | data) / p(θ)
predictive adequacy of theta formula =
p(data | θ)
average predictive adequacy formula =
p(data)
relative predictive adequacy of theta formula =
p(data | θ) / p(data)
with what kind of distribution do you know more
with a peaked, more narrow distribution
if the posterior distribution is narrower than the prior distribution, this indicates that the data have … the uncertainty
reduced
ratio a=
the probability of x being lower than [hypothesis], before you have seen the data
ratio b=
the probability of x being lower than [hypothesis], after you have seen the data
c =
the single most believable number, the highest point, the most likely value
d=
how much more plausible is value c compared to d (= the hypothesis)?
ratio d indicates that the value of …. is … times more probable than the value of [hypothesis]
interval e=
a central 95% credible interval which indicates that one can be 95% confident (the posterior probability is 95%) that Bob's true IQ falls within the interval ranging from … to …
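A sketch of how such a central 95% credible interval could be computed, assuming made-up numbers (a normal N(100, 15) prior for Bob's IQ, five test scores with a known observation sd, and a conjugate normal-normal update; all values are illustrative, not from the lecture):

```python
from statistics import NormalDist

# illustrative numbers (not from the lecture): prior belief about Bob's IQ
prior_mean, prior_sd = 100, 15           # population prior: N(100, 15)
obs_mean, obs_sd, n_obs = 115, 10, 5     # five test scores averaging 115

# conjugate normal-normal update with a known observation sd
prior_prec = 1 / prior_sd ** 2
data_prec = n_obs / obs_sd ** 2
post_prec = prior_prec + data_prec
post_mean = (prior_prec * prior_mean + data_prec * obs_mean) / post_prec
post_sd = post_prec ** -0.5

posterior = NormalDist(post_mean, post_sd)
lower, upper = posterior.inv_cdf(0.025), posterior.inv_cdf(0.975)
print(f"central 95% credible interval: {lower:.1f} to {upper:.1f}")
```

Note that the posterior mean sits between the prior mean and the observed average, and the interval is much narrower than the prior: the data reduced the uncertainty.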
what does it mean when a model underfits the data
it is too simple. fails to capture the patterns
what is the consensus about variation
variation is random until the contrary is shown
simple models tend to make..
precise predictions
complex models need to spread out their predictions
which model is better supported by the data
the model that predicted the data best
posterior beliefs about hypotheses formula =
p(H1|data)/p(H0|data)
why is the predictive updating factor so important
because it is the only thing people can agree on: everyone's prior and posterior beliefs may differ, but we can agree on which model predicted the data better
so what is the predictive updating factor
which model predicted the data the best
BF 1-3
anecdotal
BF 3-10
moderate
BF 10-30
strong
BF 30-100
very strong
BF >100
extreme
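The classification above fits in a small helper. How exact boundary values (e.g. a BF of exactly 3) should be classified is an assumption here, not something the lecture specifies:

```python
def bf_evidence_label(bf):
    """Map a Bayes factor (bf >= 1) onto the evidence categories above.
    Boundary handling (e.g. bf exactly 3) is an assumption."""
    if bf > 100:
        return "extreme"
    if bf > 30:
        return "very strong"
    if bf > 10:
        return "strong"
    if bf > 3:
        return "moderate"
    return "anecdotal"

print(bf_evidence_label(3.5))  # moderate
```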
4 advantages of the bayes factor
- quantifies evidence instead of forcing an all-or-nothing decision (you can simply report a Bayes factor of 3 and let people interpret for themselves how strong that is)
- discriminates ‘evidence of absence’ from ‘absence of evidence’
- Allows evidence to be monitored as data accumulate.
- Applies to data from the real world, for which no sampling plan can be articulated.
evidence of absence =
there is evidence for the null hypothesis: the data are much more likely under the null hypothesis.
absence of evidence =
Bayes factor near one, there is no evidence. no significant result, so to speak
Allows evidence to be monitored as data accumulate. what is different here compared to frequentism
frequentism: with more data you also have to adjust alpha and so on. with Bayes you are simply learning from the data
what is the interpretation of BF01 = 3
the observed data are 3 times more likely under the null hypothesis than under H1
so what is NOT a good interpretation of BF01 = 3
after seeing the data, H0 is now 3 times more likely than H1. this is only correct when H0 and H1 were equally likely a priori
which error belongs to that wrong interpretation of BF01 = 3
the fallacy of transposing the conditional: Pr(A|B) ≠ Pr(B|A)
so what is the overarching idea of the Bayes factor
the bayes factor quantifies the relative predictive performance. it shows the degree to which the data should shift your opinion. it does not quantify that opinion itself!
Every confirmatory instance should increase your confidence in the general law
ok
Simple models make daring predictions; if these come true, the simple model is rewarded.
ok
if we have n subjects that all show successes, what is the Bayes factor in favor of the general law
n + 1
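The n + 1 result follows from comparing the general law (theta = 1, which predicts every one of the n successes with certainty) against a uniform chance prior, under which p(data) = 1/(n + 1). A minimal sketch:

```python
def bf_general_law(n):
    """Bayes factor for the general law (theta = 1) over a uniform chance prior,
    after observing successes for all n subjects."""
    p_data_given_law = 1.0          # the law predicts every success with certainty
    p_data_uniform = 1 / (n + 1)    # integral of theta^n dtheta over [0, 1]
    return p_data_given_law / p_data_uniform

print([bf_general_law(n) for n in (1, 2, 10)])  # grows as n + 1
```

Each extra confirmatory instance nudges the Bayes factor up by one, matching the card that every confirmatory instance should increase your confidence in the general law.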