Lecture 14: Critical thinking about statistical inference Flashcards
Mindless statistics is about…
People are convinced that p-values have the final say: if a result is not p < 0.05, it is treated as worthless.
Statistics = probability -> nothing is certain!
OK
(1) You have absolutely disproved the null hypothesis (i.e., there is no difference between the population means).
(2) You have found the probability of the null hypothesis being true.
(3) You have absolutely proved your experimental hypothesis (that there is a difference between the population means).
(4) You can deduce the probability of the experimental hypothesis being true.
(5) You know, if you decide to reject the null hypothesis, the probability that you are making the wrong decision.
(6) You have a reliable experimental finding in the sense that if, hypothetically, the experiment were repeated a great number of times, you would obtain a significant result on 99% of occasions.
What is wrong with (2), (4), (5), and (6)?
Options (2) and (4) may have been initially tempting, but they refer to the probability of hypotheses, so they cannot be correct (an objective probability refers to a collective of events, not to the truth of a hypothesis).
Option (5) is a sneaky one and often catches people out, but notice that it refers to the probability of a single decision being correct; thus option (5) cannot be correct (an objective probability does not refer to a single event).
Option (6) is a description of power, not significance.
What were the Mahoney results?
People gave praise when p < 0.05 and were more critical when p > 0.05.
What might explain the Mahoney results?
Misinterpretation of the p-value.
why do we use NHST at all?
We need to understand whether the data-gathering process reflects:
- just randomness? (=sampling error and measurement error)
- just systematic variation?
- or both?
Which role do you take on in NHST?
That of the devil's advocate: if you think there is an effect, assume that there is no difference. Is the observed variation what you would expect from randomness alone if nothing is going on?
What was the process of significance testing according to Fisher?
- Formulate H0: the hypothesis to be nullified (so not necessarily zero, although that is what we have made of it!).
- Report the exact level of significance -> the p-value, without further discussion about accepting or rejecting hypotheses.
- Only use this procedure if you know almost nothing about the subject. It should also have been a series of experiments: an effect counts as established only if an experiment rarely fails to give this level of significance.
Hypothesis testing according to Neyman and Pearson
- Formulate two statistical hypotheses (H1 and H2), and determine alpha, beta, and N for the experiment.
- If the data fall in the rejection region of H1, accept H2. This does not mean that you believe H2, only that you behave as if H2 is true.
- Only use this procedure if there is a clear disjunction of hypotheses and if a cost-benefit assessment is possible.
So what is the difference between Fisher and Neyman & Pearson?
Fisher said nothing about deciding between hypotheses, only about reporting the p-value.
lower alpha means lower power
OK
What is the relationship between alpha and beta?
An inverse relationship, all else being equal: lowering one increases the other.
What happens if you lower alpha?
If you lower your alpha from 5% to 1%, you simultaneously increase the probability of making a Type II error, assuming all else is equal. Increasing the probability of a Type II error reduces the power of your test.
So α and β trade off against each other. Power = 1 − β, so power also goes down when alpha goes down.
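A minimal sketch of this trade-off, assuming a one-sided one-sample z-test with known standard deviation 1. The effect size (0.5) and sample size (20) are made-up illustration values, not numbers from the lecture.

```python
from statistics import NormalDist


def power_one_sided_z(effect_size, n, alpha):
    """Power of a one-sided one-sample z-test (sigma assumed known and equal to 1)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)       # critical value under H0
    shift = effect_size * n ** 0.5      # mean of the test statistic under H1
    return 1 - z.cdf(z_crit - shift)    # P(reject H0 | H1 is true)


# Hypothetical values: same effect size and N, only alpha changes.
for alpha in (0.05, 0.01):
    power = power_one_sided_z(effect_size=0.5, n=20, alpha=alpha)
    print(f"alpha={alpha:.2f}  power={power:.3f}  beta={1 - power:.3f}")
```

With everything else fixed, the stricter alpha gives the larger beta and therefore the lower power.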
The null ritual =
(not what Fisher or Neyman and Pearson intended!)
- Set up an H0 of no mean difference / no correlation. Do not specify the predictions of your own research hypothesis.
- Use 5% as a convention for rejecting H0.
- Always perform this procedure.
What are 3 fallacies in the null ritual?
- Straw target: arguing against something that nobody actually claimed (= the zero-correlation null; compare Fisher's H0, which need not be zero).
- Common practice: using 5% and always performing this procedure because everyone does.
- False dilemma: if the result is significant, always accepting your research hypothesis.
We have reduced our scientific thinking to just checking significance.
OK
subjective probability=
probability is the degree of belief that something is the case in the world. (=uncertainty!)
objective probability=
probability is the extent to which something is the case in the world (= relative frequency in the long term)
What are two characteristics of the p-value that relate to probabilities?
- It concerns objective (frequentist) probabilities.
- It is a conditional probability.
Why can we not talk about "the probability of the null hypothesis"?
Because the p-value is a conditional probability!
p = P(a test statistic at least this extreme | H0)
So if you see …. stated about the p-value, it is wrong by definition.
Any statement about the probability of the null hypothesis! We only know something about the test statistic GIVEN H0, and you cannot simply invert that conditional.
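To make the conditioning explicit, here is a small sketch of a one-sided exact binomial p-value. The data (9 successes in 10 trials) and H0: θ = 0.5 are hypothetical values chosen for illustration.

```python
from math import comb


def binomial_p_value(k, n, theta0=0.5):
    """One-sided p-value: P(X >= k | H0: theta = theta0), with X ~ Binomial(n, theta0)."""
    return sum(
        comb(n, x) * theta0 ** x * (1 - theta0) ** (n - x)
        for x in range(k, n + 1)
    )


# Hypothetical data: 9 successes out of 10 trials.
p = binomial_p_value(k=9, n=10)
print(f"P(X >= 9 | H0) = {p:.4f}")
```

Every term in the sum is computed under theta0, so the result is a statement about the data given H0 and says nothing about P(H0 | data).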
So what happens, in terms of argument form, when you claim something about H0?
P -> B
B
Therefore P
This is inverting the conditional probability (affirming the consequent) plus the base rate fallacy (we ignore the base rate).
How could we invert this probability?
We would need the base rate for that.
We would need a base rate for the null hypothesis. In the frequentist framework, however, a hypothesis is simply true or false and cannot be assigned a probability, so there is no base rate to use!
What is wrong with this: "alpha equals the probability of making an error"?
There are two kinds of error, Type I and Type II, and alpha only concerns Type I.
Alpha is conditioned on…
The null being true.
α = P(making an error, i.e. rejecting H0 | H0 is true)
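A simulation sketch of what "conditioned on the null being true" means: every simulated experiment below is generated with H0 true, and the long-run rejection rate comes out close to alpha. The two-sided z-test, the sample size, the number of experiments, and the seed are all assumptions made for this illustration.

```python
import random
from statistics import NormalDist


def type_i_error_rate(alpha=0.05, n=30, n_experiments=20_000, seed=1):
    """Fraction of experiments that reject H0 when H0 (mean = 0, sd = 1) is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(0, 1) for _ in range(n)]  # data generated under H0
        z = (sum(sample) / n) * n ** 0.5              # z statistic with known sd = 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_experiments


rate = type_i_error_rate()
print(f"Rejection rate when H0 is true: {rate:.3f}")
```

The rate describes the long run of experiments in which H0 holds. It says nothing about how often you err when H0 is false (that is beta), nor about a single decision.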
What goes wrong here:
"Failing to reject H0 is evidence for H0."
You need very high power to claim this. It is like looking for your keys: did you search thoroughly enough?
"Failing to reject H0 is evidence for H0": 3 things that go wrong in this reasoning
- You cannot speak of evidence here; for that we need 2 hypotheses to compare.
- You have to do a power calculation to check whether you can make this claim at all.
- Power is a function of the population effect size, the alpha level, and the sample size, so it depends on many things.
Power is the .. side of the distribution under the alternative hypothesis
The right side (the area beyond the critical value, for a positive effect).
When you replicate a study, what should you really pay attention to?
Do not simply reuse the same number of participants, because your study may then be underpowered. If the true effect is as large as the original, just-significant estimate, the critical value falls in the middle of the H1 distribution, so power is only the right half of that distribution (= 0.50 -> not high power).
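The 0.50 figure can be checked with a short sketch. It assumes a two-sided z-test, a replication with the same N as the original, and a true effect exactly equal to the original estimate; the function name is made up for this example.

```python
from statistics import NormalDist


def replication_power(z_observed, alpha=0.05):
    """Power of a same-size replication if the true effect equals the original
    estimate, expressed as the original z statistic (two-sided z-test)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    upper_tail = 1 - z.cdf(z_crit - z_observed)   # right side of the H1 distribution
    lower_tail = z.cdf(-z_crit - z_observed)      # negligible for positive effects
    return upper_tail + lower_tail


z_crit = NormalDist().inv_cdf(0.975)
print(f"Original result just significant: power = {replication_power(z_crit):.3f}")
print(f"Original z = 2.8:                 power = {replication_power(2.8):.3f}")
```

A just-significant original result therefore gives a same-size replication roughly a coin-flip chance of reaching significance, which is why the sample size should come from a power calculation.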
The winner's curse =
If you win an auction, you tend to pay too much (compared to the true value and to what other people are willing to pay).
How does the winner's curse show up in publication bias?
Many researchers may estimate the same effect size, but many of the results are non-significant. The studies that do find a significant result get published -> overestimation of the effect.
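A simulation sketch of this selection effect. The true effect (0.2), sample size (25), number of studies, and seed are hypothetical, and "published" here simply means "significant in a two-sided z-test".

```python
import random
from statistics import NormalDist


def simulate_published_effects(true_effect=0.2, n=25, n_studies=5_000,
                               alpha=0.05, seed=7):
    """Run many small studies of the same effect; 'publish' only the significant ones."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    all_estimates, published = [], []
    for _ in range(n_studies):
        sample = [rng.gauss(true_effect, 1) for _ in range(n)]
        estimate = sum(sample) / n                 # effect estimate (sd = 1)
        all_estimates.append(estimate)
        if abs(estimate * n ** 0.5) > z_crit:      # significant -> gets published
            published.append(estimate)
    return all_estimates, published


all_estimates, published = simulate_published_effects()
print(f"Mean estimate, all studies:       {sum(all_estimates) / len(all_estimates):.3f}")
print(f"Mean estimate, published studies: {sum(published) / len(published):.3f}")
```

Across all studies the estimates average out near the true effect, but the significant subset averages clearly higher, because only overestimates clear the significance threshold at this sample size.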
"Power is irrelevant when the results are significant": what is wrong with this?
- P1: p < .05
- C1: I have found an effect
- (P2: When I have found an effect, it is no longer relevant what the probability is that I find an effect if H0 is not true)
- C2: Power is not relevant
Here you claim something about the alternative hypothesis: you imply that it is definitely true. You cannot say that; we only know something about H0, namely that it can be rejected.
You did not find an effect, you just rejected the null.
QRPs =
Questionable research practices
The garden of forking paths =
Analyses involve many data-dependent choices, so it is difficult to get a sense of how things would have turned out if something else had changed, because of randomness.
The Neyman and Pearson approach depends on…
The stopping rule you apply. This is difficult, because it is subjective.
Bayesian statistics talks about…
Subjective probabilities, the degree of belief! You start with a prior belief and, after seeing data, reallocate the credibility.
Why can you not say P(A|B) = P(B|A) with ordinary conditional probabilities?
Because we do not have a base rate, so we cannot make that step.
Why can you compute P(A|B) => P(B|A) with Bayes?
Because here we do have a base rate: we supply it ourselves (the prior).
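A sketch of the inversion once a base rate is supplied. All numbers are assumptions for illustration: the base rate of H0 is something you would have to choose yourself, and alpha = 0.05 and power = 0.8 are conventional values, not results from the lecture.

```python
def prob_h0_given_significant(base_rate_h0, alpha, power):
    """P(H0 | significant result), obtained from P(significant | H0) via Bayes' rule."""
    p_sig_and_h0 = alpha * base_rate_h0          # P(sig | H0) * P(H0)
    p_sig_and_h1 = power * (1 - base_rate_h0)    # P(sig | H1) * P(H1)
    return p_sig_and_h0 / (p_sig_and_h0 + p_sig_and_h1)


# Same alpha and power, two different (hypothetical) base rates for H0.
for base_rate in (0.5, 0.9):
    posterior = prob_h0_given_significant(base_rate, alpha=0.05, power=0.8)
    print(f"P(H0) = {base_rate:.1f} -> P(H0 | significant) = {posterior:.3f}")
```

The same alpha leads to very different values of P(H0 | significant) depending on the base rate, which is why P(data | H0) cannot stand in for P(H0 | data).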
Bayes' formula
P(θ|D) = P(D|θ) * P(θ) / P(D)
P(θ)=
prior beliefs
P(D|θ)=
likelihood, the probability of the data given that a specific hypothesis is true
So what are you actually asking with the likelihood?
How predictive are my different hypotheses?
P(D)=
How likely are the observed data, averaged over all possibilities (all values of θ, weighted by the prior)?
bayesian statistics is reallocation of … across …
credibility across possibilities
credibility=
subjective probability or degree of belief
possibilities=
The candidate values of the parameter θ: how likely are the different values of θ, given the data and our prior beliefs?
How do you compute P(D|θ)?
θ^D when all D observations are successes; more generally, θ^k * (1 − θ)^(n − k) for k successes in n trials.
So for example:
θ = 0.1
D = 5 out of 5
P(D|θ) = 0.1^5
So what is the likelihood?
P(D|θ)
= the probability of the data given the hypothesis
What does the likelihood give you?
The likelihood as a function over the full range of possible parameter values.
bayes factor formule =
P(D|θ1) / P(D|θ2)
What is the Bayes factor also called?
The likelihood ratio: how much more probable are the data under one hypothesis than under the other? Which hypothesis is better supported by the data?
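A small sketch of the likelihood ratio for two point hypotheses, using the 5-out-of-5 data from the earlier card. The Bernoulli likelihood and the two θ values being compared (0.9 and 0.5) are assumptions picked for illustration.

```python
def likelihood(theta, successes, trials):
    """P(D | theta) for a specific sequence of Bernoulli outcomes."""
    return theta ** successes * (1 - theta) ** (trials - successes)


def bayes_factor(theta_1, theta_2, successes, trials):
    """Likelihood ratio P(D | theta_1) / P(D | theta_2) for two point hypotheses."""
    return likelihood(theta_1, successes, trials) / likelihood(theta_2, successes, trials)


# Hypothetical comparison: theta = 0.9 versus theta = 0.5, data = 5 out of 5.
bf = bayes_factor(0.9, 0.5, successes=5, trials=5)
print(f"Bayes factor (0.9 vs 0.5): {bf:.2f}")
```

A value above 1 means the data are more probable under the first hypothesis, a value below 1 favours the second.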
posterior interpretation
The degree to which you believe the hypothesis to be true, given the data.
What do you use as a prior if you have no idea before the experiment?
A flat prior (in exploratory research).
What is a flat prior?
Each possible value of θ is equally likely a priori.
The narrower the peak of the prior…
The more certain you are about θ.
What is the relationship between P(θ|D) and P(D|θ)*P(θ)?
They are proportional!
If P(D|θ)*P(θ) doubles, P(θ|D) doubles as well (for fixed data D).
So the posterior is proportional to likelihood * prior.
P(D)=
The marginal likelihood: how likely are these data, taken over all the possibilities (all values of θ)?
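The pieces (prior, likelihood, marginal likelihood, posterior) can be put together in a grid approximation. This is a sketch under assumptions: a flat prior over 101 equally spaced θ values, a Bernoulli likelihood, and the 5-out-of-5 data from the earlier card.

```python
def grid_posterior(successes, trials, grid_size=101):
    """Posterior over a grid of theta values, starting from a flat prior."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1 / grid_size] * grid_size                  # flat: every theta equally credible
    likelihood = [t ** successes * (1 - t) ** (trials - successes) for t in thetas]
    unnormalised = [l * p for l, p in zip(likelihood, prior)]   # likelihood * prior
    marginal = sum(unnormalised)                         # P(D): summed over all thetas
    posterior = [u / marginal for u in unnormalised]     # now sums to 1
    return thetas, posterior


thetas, posterior = grid_posterior(successes=5, trials=5)
best = max(range(len(thetas)), key=lambda i: posterior[i])
print(f"Most credible theta: {thetas[best]:.2f}")
print(f"Posterior sums to:   {sum(posterior):.3f}")
```

Dividing by P(D) only rescales the curve so that it sums to 1; the shape of the posterior is fully determined by likelihood * prior.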
What is difficult about Bayes?
- Subjective
- No error rate control