L5 - Critical thinking about statistical inference Flashcards
A list of things from the last block/lectures that we need to know; if you don’t remember them, revise them
For some I included an answer in brackets as well, where the book described it nicely in a few words
- Null hypothesis and alternative hypothesis, the difference between the two (the null is the one most costly to reject falsely); formulating these refers to population properties, not sample properties
- T-statistic, probabilities (to calculate any probability we need a collective, which can be constructed by assuming H0, imagining an infinite number of experiments, and calculating t each time; each t is a single event of the collective; see the simulation sketch after this list)
↪ T-distribution (the distribution of the infinite number of ts in the collective)
- p-value and α (they are objective probabilities, i.e. relative long-run frequencies)
↪ Neither α nor p tells us how probable the null hypothesis is (they are not P(H|D))
- β, power (1-β; P(reject H0|H0 false))
- sensitivity, specificity
- Type I error (we will make this error in α proportion of our decisions; P(rejecting H0|H0 true)), Type II error (P(accepting H0|H0 false))
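To make the "collective" idea concrete, here is a minimal simulation sketch (my own illustration, not from the lecture): assume H0, imagine many repeated experiments, and collect the t-statistic from each run. The sample sizes and the observed_t value are made-up assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_experiments = 50_000  # stand-in for the 'infinite' collective
n = 20                  # assumed participants per group

# Under H0, both groups are drawn from the same population.
a = rng.normal(0.0, 1.0, size=(n_experiments, n))
b = rng.normal(0.0, 1.0, size=(n_experiments, n))
t_values = stats.ttest_ind(a, b, axis=1).statistic  # one t per 'experiment'

# The simulated ts approximate the t-distribution with 2n - 2 degrees of
# freedom, and a p-value is then a long-run relative frequency.
observed_t = 2.1  # hypothetical observed statistic
p_empirical = np.mean(np.abs(t_values) >= abs(observed_t))
p_theoretical = 2 * stats.t.sf(abs(observed_t), df=2 * n - 2)
print(f"empirical p ~ {p_empirical:.4f}, theoretical p ~ {p_theoretical:.4f}")
```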
How do we control β? We determine it before collecting the data by:
- estimating the effect size we’re interested in
- estimating the data variance
↪ Do this based on knowledge from past studies about the same concept, or do a pilot study
Once these two are determined, a table can tell us how many participants we need to keep β at our predetermined level (a computational version of this lookup is sketched below)
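A hedged sketch of that lookup step using statsmodels’ power routines (the lecture used a table; this computes the same quantity). The effect size, α and power values here are assumptions for illustration.

```python
from statsmodels.stats.power import TTestIndPower

effect_size = 0.5  # assumed standardized effect size (Cohen's d),
                   # e.g. from past studies or a pilot study
alpha = 0.05       # predetermined Type I error rate
power = 0.80       # 1 - beta, i.e. beta kept at 0.20

# Solve for the sample size that keeps beta at the predetermined level.
n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size, alpha=alpha, power=power,
    alternative='two-sided')
print(f"participants needed per group: {n_per_group:.1f}")  # ~ 63.8
```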
Look at picture 6
Consider the statements about the study and say whether they are true or not
All of them are incorrect. Throughout the flashcards it should become clear why that is the case. We’ll also revisit them at the end and explain why they are incorrect
What is important to remember about the alpha level and the p-value when interpreting statistical results?
The p-value and alpha level do not say anything about the probability of the null hypothesis, and they don’t refer to the alternative hypothesis at all.
Believing that they do is a common misunderstanding of the definition of the p-value in statistical inference that leads people to make fallacious conclusions about their results
What does the misinterpretation of the p-value help explain?
It helps explain the motivation of people to obtain significant results
- Explains why the peer-review process is often focused on checking whether the results were significant instead of focusing on the content of the paper
- Explains why issues with reproducibility occur (e.g. rounding a p-value of 0.051 to 0.05)
What are different statements that people use to report non-significant results (p>0.05) to make them look almost significant?
Not important to remember them; they serve as examples of the lengths people can go to in order to mislead the reader into thinking the results are significant
- a certain trend toward significance (p=0.08)
- approached the borderline of significance (p=0.07)
- just very slightly missed the significance level (p=0.086)
- near-marginal significance (p=0.18)
- only slightly non-significant (p=0.0738)
- provisionally significant (p=0.073)
- quasi-significant (p=0.09)
What is the analogy of the conflict that goes on in researchers’ heads when they find non-significant results?
It’s a silly example, no need to remember, he included it in the lecture more for fun than for actual learning
Picture 1
What is the point of using p-value when it forces people to seek significant results at all costs?
Playing the devil’s advocate: how likely is (at least) this statistic if there were no difference in the population?
- What if I’m not measuring a systematic difference in the population, but just random variation? → Is the difference to be expected if there is nothing else going on but, for example, random sampling?
If there was actually nothing going on, the probability of me finding this result or a more extreme one is not that high
How did p-value come about? What did Fisher propose?
Significance testing!
- Formulate H0: the hypothesis to be ‘nullified’
- Report the exact level of significance (p-value), without further discussion about accepting or rejecting hypotheses (for the reader to decide how they want to interpret this value)
- Only do this if you know almost nothing about the subject
↪ “A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance”
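A minimal sketch of Fisher’s recipe in code (my illustration; the data values are invented): state H0, run the test, and report the exact level of significance with no accept/reject verdict.

```python
from scipy import stats

# Hypothetical measurements; the values are made up for illustration.
control = [5.1, 4.9, 6.0, 5.5, 5.2, 4.8]
treated = [5.9, 6.2, 5.8, 6.5, 6.1, 5.7]

# H0 (the hypothesis to be 'nullified'): no difference in population means.
t_stat, p_value = stats.ttest_ind(control, treated)

# Per Fisher: report the exact p-value and stop; the reader decides how
# to interpret it.
print(f"t = {t_stat:.2f}, exact level of significance p = {p_value:.4f}")
```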
What did Neyman & Pearson suggest as an alternative to Fisher’s approach?
They thought that Fisher’s steps were less useful, as no clear alternative hypothesis is specified
Hypothesis testing!
- Formulate two statistical hypotheses, determine alpha, beta & sample size for the experiment, in a deliberate way (expected x value) before collecting the data
- If the data falls in the rejection region of H1, assume H2. This does not mean that you believe H2, only that you behave as if H2 is true
- Only use this procedure if there is a clear disjunction & if a cost-benefit assessment is possible
So basically, we’re setting behavioural rules: even though we don’t know whether the H0 is true or not, we won’t be wrong very often if it is true and we won’t be wrong very often if it is false.
We can also put this in a frequency tree (picture 2); a worked version with assumed numbers follows below
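A worked version of the frequency tree, with numbers I am assuming for illustration (the base rate of true null hypotheses is not given in the lecture): out of 1000 experiments, suppose H0 is true in 500.

```python
# Neyman-Pearson error rates, fixed before the data are collected.
alpha, beta = 0.05, 0.20

n_studies = 1000
h0_true = 500            # assumed base rate of true nulls (illustrative)
h0_false = n_studies - h0_true

false_positives = h0_true * alpha         # Type I errors: 25
true_negatives = h0_true * (1 - alpha)    # correct non-rejections: 475
false_negatives = h0_false * beta         # Type II errors: 100
true_positives = h0_false * (1 - beta)    # detected effects (power): 400

print(f"FP={false_positives:.0f}, TN={true_negatives:.0f}, "
      f"FN={false_negatives:.0f}, TP={true_positives:.0f}")
# Behavioural rule in numbers: whether H0 is true or false, we are not
# wrong very often (25/500 when true, 100/500 when false, under these
# assumptions).
```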
Since the two approaches didn’t agree with each other, what did we end up with? What is the issue with this?
We ended up with the null ritual
- Set up a statistical null hypothesis of “no mean difference” or “zero correlation.” Don’t specify the predictions of your research hypothesis or of any alternative substantive hypotheses
- Use 5% as a convention for rejecting the null. If significant, accept your research hypothesis
- Always perform this procedure
This approach also introduces many fallacies; we will discuss these in the next block
What are 4 fallacies in statistical inference?
- P-values equal the probability that the (null) hypothesis is true
- Alpha equals the probability of making an error
- Failing to reject H0 is evidence for H0
- Power is irrelevant when results are significant
P-values equal the probability that the (null) hypothesis is true
Which kind of probability do statements involving alpha, power and p-values relate to?
Statements in which alpha, power and p-values occur relate to:
1. Frequentist or objective probabilities
2. Conditional probabilities
1. Frequentist probability
What is subjective probability?
Probability is the degree of belief that something is the case in the world
- This expresses a degree of uncertainty: e.g. how sure are you that you have chosen the right answer to an MC question?
1. Frequentist probability
What is objective probability?
Probability is the extent to which something IS the case in the world
- These probabilities exist independently of our states of knowledge
- This, for example, expresses the relative frequency in the long term: e.g. an infinite number of coin tosses (reference class or collective)
- Probabilities need to be discovered by examining the world, not by reflecting on what we know or how much we believe
What is a reference class or collective?
The hypothetical infinite set of events; the long-run relative frequency is a property of all the events in the collective, not of any single event
- Might be the set of all potential tosses of a coin using a certain tossing mechanism → a single toss of a coin (singular event) doesn’t have a probability; only the collective of tosses has a probability (see the simulation sketch below)
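A minimal simulation sketch of such a collective (my own illustration): the relative frequency of heads is a property of the long run of tosses, not of any single toss.

```python
import numpy as np

rng = np.random.default_rng(0)
tosses = rng.integers(0, 2, size=100_000)  # 1 = heads, 0 = tails
running_freq = np.cumsum(tosses) / np.arange(1, tosses.size + 1)

# The relative frequency only stabilises across the whole collective.
for n in (10, 100, 10_000, 100_000):
    print(f"after {n:>6} tosses: relative frequency of heads = "
          f"{running_freq[n - 1]:.4f}")
```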
1. Frequentist probability
According to frequentist probability, why can we not infer the probability of the null hypothesis from the p-value?
The null hypothesis is either true or it’s not, just as a single event either occurs or doesn’t
- A hypothesis is not a collective, hence it does not have an objective probability
- With p-values (Fisher) and the Neyman-Pearson paradigm we talk about objective probability