Week 3: Hypothesis Testing and Significance Flashcards

1
Q

What is the main goal of hypothesis testing?

A

To assess how compatible sample data are with a certain hypothesis about the population parameter.

2
Q

What are the two types of hypothesis in hypothesis testing?

A

Null hypothesis (H0): Assumes no effect or no difference (e.g., μ = 0).
Alternative hypothesis (H1): Assumes an effect or difference exists (e.g., μ ≠ 0).

3
Q

What are the steps in hypothesis testing?

A
  1. Define the null hypothesis
  2. Define the alternative hypothesis
  3. Choose a significance level (α)
  4. Select the correct statistical test and calculate the test statistic
  5. Find the critical value or p-value
  6. Interpret the results
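The steps above can be sketched as a one-sample z-test in Python using only the standard library. The data values (sample mean, SD, n) are hypothetical, chosen purely for illustration:

```python
from statistics import NormalDist

# Hypothetical example: does the population mean differ from 0?
x_bar = 2.4    # sample mean
mu_0 = 0.0     # H0: mu = mu_0
sigma = 8.0    # assumed known population SD
n = 100        # sample size
alpha = 0.05   # step 3: significance level

se = sigma / n ** 0.5                   # standard error of the mean
z = (x_bar - mu_0) / se                 # step 4: test statistic
p = 2 * (1 - NormalDist().cdf(abs(z)))  # step 5: two-sided p-value

print(f"z = {z:.2f}, p = {p:.3f}")      # z = 3.00, p = 0.003
```

Step 6 is the interpretation: here p < α, so there is evidence against H0 at the 5% level.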
4
Q

What is the significance level (α), and what does it represent?

A

The probability of a Type I error (rejecting H0 when it is true), commonly set at 0.05.

5
Q

What is the difference between a one-sided and two-sided test?

A

One-sided: Tests whether the parameter is greater than (or less than) a specified value, considering one direction only
Two-sided: Tests whether the parameter differs from a specified value, without specifying direction

6
Q

What are Type I and Type II errors?

A

Type I error (α): Rejecting H0 when it is true
Type II error (β): Failing to reject H0 when H1 is true
Note: Setting a lower significance level decreases a Type I error risk, but increases a Type II error risk. A Type I error is often considered to be more important to avoid. The hypothesis test procedure is therefore adjusted so that:
α = P(type I error) = significance level = 0.05
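That α really is the Type I error rate can be checked by simulation: a minimal sketch (hypothetical settings) that repeatedly samples from a population where H0 is true and counts how often a two-sided z-test rejects.

```python
import random
from statistics import NormalDist

random.seed(1)
n, trials, alpha = 30, 20000, 0.05
crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value, ~1.96

rejections = 0
for _ in range(trials):
    # Sample from a population where H0 (mu = 0, sigma = 1) is TRUE
    sample = [random.gauss(0, 1) for _ in range(n)]
    x_bar = sum(sample) / n
    z = (x_bar - 0) / (1 / n ** 0.5)         # known sigma = 1
    if abs(z) > crit:
        rejections += 1

print(rejections / trials)   # close to 0.05, as expected under H0
```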

7
Q

How is a test statistic calculated for a hypothesis test?

A

By comparing the sample estimate to the null hypothesis value, scaled by its standard error. In other words, the number of SEs a sample estimate is away from H0. It shows the size of the estimate relative to its precision.
Example:
z = (x̄ - μ0) / SE
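As a minimal sketch, the formula translates directly into code (the numbers below are hypothetical):

```python
def z_statistic(x_bar, mu_0, se):
    """Number of standard errors the sample estimate lies from the null value."""
    return (x_bar - mu_0) / se

print(z_statistic(5.5, 5.0, 0.25))  # 2.0: the estimate is 2 SEs above H0
```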

8
Q

What is a p-value?

A

The probability of obtaining a result as extreme or more extreme than the observed result, assuming H0 is true.

9
Q

How do you interpret a p-value?

A

p < 0.05: Evidence against H0 at the 5% level
p ≥ 0.05: Insufficient evidence against H0
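As a check on these thresholds, a two-sided p-value can be computed from a z-score with the standard library; this sketch is not tied to any particular dataset:

```python
from statistics import NormalDist

def two_sided_p(z):
    # P(|Z| >= |z|) under the standard normal null distribution
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(round(two_sided_p(1.96), 3))  # 0.05  (the boundary case)
print(round(two_sided_p(1.00), 3))  # 0.317 (insufficient evidence against H0)
```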

10
Q

What is the relationship between p-values and CIs?

A

Both describe the role of chance in the data. If a CI excludes the null value, the corresponding p-value is less than the significance level.
The p-value measures how consistent the sample is with H0; unlike a CI, it gives little indication of the likely size of the population parameter.
The CI is a range of values within which the true population parameter is likely to lie.

11
Q

What is the critical value in hypothesis testing?

A

A threshold used to compare the test statistic to determine whether to reject H0. It depends on α and the type of test (one-sided or two-sided)

12
Q

Why is statistical significance not always practical significance?

A

Large samples can flag trivial effects as statistically significant, while small samples may miss meaningful effects.

13
Q

What is “p-hacking” and why is it problematic?

A

Manipulating data or analyses to obtain significant p-values, leading to misleading or non-reproducible results.

14
Q

How should p-values be used in reporting?

A

Treat p-values as a continuum and present them alongside CIs and study context (one piece of evidence alongside other factors, e.g., prior evidence, study design, data quality, and real-world costs and benefits).
P-values represent the probability of the data given H0, not the probability of H0 given the data. Avoid binary language when interpreting p-values (reject/don't reject) in favour of no/weak/some/strong evidence against H0.

15
Q

How do you decide based on a 95% CI?

A

If the CI contains the null value, there is no evidence against H0.
If the CI excludes the null value, there is evidence against H0.
Note: We can use a 95% CI in this way for one-sample tests. For two-sample tests, however, overlapping CIs do not necessarily mean there is no evidence against H0; the CI for the difference should be examined instead.
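The one-sample decision rule can be sketched with a normal-approximation 95% CI; the estimate and SE below are hypothetical:

```python
from statistics import NormalDist

def ci_95(estimate, se):
    """95% CI for an estimate with standard error se (normal approximation)."""
    z = NormalDist().inv_cdf(0.975)   # ~1.96
    return estimate - z * se, estimate + z * se

lo, hi = ci_95(estimate=1.2, se=0.5)
excludes_null = not (lo <= 0.0 <= hi)
print(f"CI = ({lo:.2f}, {hi:.2f})")   # CI = (0.22, 2.18)
print(excludes_null)                  # True: evidence against H0
```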

16
Q

What are common misinterpretations of p-values?

A
  • No evidence of difference does not mean there is evidence of no difference.
  • No evidence against H0 is not sufficient to state H0 is true.
  • Thinking significance tests are about individuals, when they’re really about central tendency of the population (mean, proportions).
  • Thinking tests cannot be statistically significant by chance. By probability, 5% of the time you will get a value in the critical region even if H0 is true. It is misleading to report results only if "statistically significant"; true effects may not be as large as initial estimates, e.g., due to publication bias.
  • Believing that a lack of "statistical significance" equates to a lack of evidence for an effect (false).
  • Believing the p-value is the probability that H0 is true. The p-value assumes H0 is true; it indicates the degree to which the data conform to the pattern predicted by H0.
  • Misinterpreting p < 0.05 as proving H0 is false/to be rejected. Rather, it means a discrepancy from H0 as large as or larger than that observed would occur no more than 5% of the time if only chance were creating the discrepancy.
  • Misinterpreting p > 0.05 as proving H0 is true/to be accepted. See the point above.
  • Believing a statistically significant result is scientifically or substantively important, indicating a large effect size. Rather, when a study is large, very minor effects can produce significant results; CIs should be used to determine which effect sizes of scientific importance are compatible with the data.
17
Q

Why are one-sided tests controversial?

A

They yield smaller p-values and require clear justification before testing, as they only consider one direction of effect.

18
Q

What are key limitations of hypothesis testing?

A
  • Results depend on sample size. When n is small, large deviations from H0 may go undetected. Where n is very large, tiny deviations from H0 may be found to be statistically significant. CIs are more informative.
  • Tests (especially p values) can indicate statistical significance without practical relevance.
  • Selective reporting can lead to publication bias.
19
Q

What is the “power” of a test?

A

The probability of correctly rejecting H0 when H1 is true (1 - β). Some tests may not be powerful enough to detect an effect.
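For a two-sided z-test, power can be sketched with the standard normal distribution; the effect size and SE below are hypothetical:

```python
from statistics import NormalDist

def power_z_test(delta, se, alpha=0.05):
    """Power of a two-sided z-test when the true effect equals delta."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)   # ~1.96
    shift = delta / se                   # true effect in SE units
    # P(reject H0) = P(Z > z_crit - shift) + P(Z < -z_crit - shift)
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

print(round(power_z_test(delta=0.5, se=0.2), 2))  # 0.71
```

So with this (assumed) effect and precision, about 29% of such studies would fail to detect the real effect (a Type II error).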

20
Q

What steps should be taken to avoid “p-hacking”?

A
  • Pre-register hypotheses and analysis plans
  • Avoid selective reporting
  • Consider the study context, prior evidence, and data quality
21
Q

How should p-values and CIs complement each other?

A

Use p-values to assess evidence against H0, and CIs to gauge the magnitude and direction of the effect.

22
Q

What is a test statistic?

A

A quantity calculated from the sample data and H0, used to decide between H0 and H1.

23
Q

What is meant by the rejection region?

A

The set of test-statistic values for which H0 is rejected; it is used to assess how 'strong' the evidence is against H0.

24
Q

What are examples of hypothesis tests?

A

Hypothesis test for a mean, i.e., how does the sample mean compare to some pre-specified value?
Hypothesis test for a difference in means, i.e., is the mean in group 1 different from that in group 2?

25
Q

Outline the logic underlying hypothesis testing:

A
  • Look for evidence against H0 by evaluating its consistency with the data
  • If the sample estimate is consistent with H0, then we have evidence in favour of H0
  • If the sample estimate is not consistent with H0, then we have evidence against H0
    Note: We cannot prove or disprove a hypothesis
26
Q

Step 1 - Null hypothesis:

A

Assumption of no difference in the population from which the sample is drawn:
Mean: μ = μ0 (usually μ0 = 0)
Difference in means: δ = μ1 - μ2 = 0

27
Q

Step 2 - Alternative Hypothesis:

A

H1 is a statement of what a statistical hypothesis test is set up to establish.
Assumption that a difference exists in the population:
Mean: μ ≠ μ0 (e.g., μ ≠ 0 when μ0 = 0)
Difference in means: δ = μ1 - μ2 ≠ 0 (or H1 can be directional - e.g. μ > 0)

28
Q

What will the choice of the test statistic depend on?

A

The assumed probability model and the hypotheses under study.
Commonly used: z-score (mean, proportion), t-statistic, chi-square statistic, F-statistic

29
Q

What is the logic underpinning a 5% significance level?

A

If the test is two-sided, 95% of the z-scores fall within 1.96 SE away from the mean.
If the test is one-sided, the critical value is ±1.645 (95% of the distribution lies below/above z = 1.645).
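These critical values can be verified directly from the inverse of the standard normal CDF:

```python
from statistics import NormalDist

nd = NormalDist()
print(round(nd.inv_cdf(0.975), 3))  # 1.96   two-sided test, alpha = 0.05
print(round(nd.inv_cdf(0.95), 3))   # 1.645  one-sided test, alpha = 0.05
```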

30
Q

What does a large test statistic indicate?

A

Evidence against H0. For a mean:
z > 1.96 or z < -1.96 means the data do not support H0 at the 5% significance level.

31
Q

Do not fall into this trap when n is small…

A

You shouldn't assume that because n is small, you cannot believe any results of a significance test, or conversely that because n is large, the results must be true.
Statistical tests account for sample size when calculating probabilities and CIs. Study design (including sampling and method of randomisation) is critical: statistical analysis cannot remedy basic flaws in data production, such as voluntary response samples, uncontrolled experiments, and measurement bias.