Lecture 11 - Significance and Power Flashcards

1
Q

Basic principles of Null Hypothesis Significance Testing

A

Assume H0 is true -> fit a model to the data and obtain a test statistic -> calculate the probability (p) of getting a test statistic at least that extreme, assuming H0 is true

The test statistic is obtained by comparing the amount of ‘signal’ to ‘noise’, i.e. ‘systematic variation’ to ‘unsystematic variation’, or ‘effect’ to ‘error’
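A minimal sketch in Python (an assumption; the lecture itself refers to SPSS and G*Power) of how a test statistic and p-value fall out of this process, using made-up scores and SciPy's independent-samples t-test:

    # Hypothetical example: comparing two groups of made-up scores.
    from scipy import stats

    group_a = [5.1, 6.3, 5.8, 7.0, 6.1, 5.5]
    group_b = [4.2, 5.0, 4.8, 5.6, 4.9, 4.4]

    # The t statistic is essentially 'signal' (difference between means)
    # divided by 'noise' (variability / standard error).
    t_stat, p_value = stats.ttest_ind(group_a, group_b)

    # p is the probability of a t statistic at least this extreme
    # if the null hypothesis (no difference) were true.
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")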

2
Q

The misuse of NHST

A

Sometimes misused intentionally, sometimes unintentionally

The American Statistical Association (2016) outlined principles on the misuse of p values in significance testing, including:

(1) A p-value does not measure the probability that the results occurred by chance, or that a specific hypothesis is true; it is the likelihood of obtaining the observed test statistic (or one more extreme) if we assume the null hypothesis is true in the first place

(2) Statistical significance is not the same as practical importance

(3) The p-value alone is not a good measure of evidence regarding a model or hypothesis

3
Q

Type I and Type II errors

A

Null hypothesis true in reality (population) and retained in the experimental result (sample) = correct decision (though an unexciting outcome, as no effect is found)

Null hypothesis true in reality (population) but alternative hypothesis accepted in the experimental result = Type I error, α (false positive)

Alternative hypothesis true in the population but null hypothesis retained in the experimental result = Type II error, β (false negative)

Alternative hypothesis true in the population and accepted in the sample = correct decision

4
Q

Power

A

The p-value doesn’t tell you how likely it is that you have found a genuine effect (it is just the probability of obtaining a particular test statistic, assuming H0 is true)

The probability of finding an effect assuming one exists in the population

Calculated as 1-β

β is the probability of failing to find the effect, and is conventionally set at 0.2 (Cohen, 1992)

With β = 0.2 we therefore have an 80% chance of detecting an effect, assuming it genuinely exists
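A rough simulation sketch of what power means in practice, assuming NumPy/SciPy and an assumed true effect of d = 0.5 with 64 participants per group (roughly the settings Cohen's tables give for 80% power):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_per_group, true_d, alpha, n_sims = 64, 0.5, 0.05, 5000

    hits = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treatment = rng.normal(true_d, 1.0, n_per_group)  # a genuine effect exists
        _, p = stats.ttest_ind(treatment, control)
        hits += p < alpha

    # Proportion of simulated studies that detect the effect ~ power (1 - beta).
    print(f"Estimated power: {hits / n_sims:.2f}")  # roughly 0.80 for these settings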

5
Q

Factors affecting power

A

Power, the alpha level, the effect size and the sample size are interrelated: knowing any three means you can work out the fourth

E.g. if you know the level of power you want, the alpha level and the expected effect size, you can work out how many participants are needed
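A sketch of this "solve for the fourth factor" idea using the statsmodels library (an assumption; G*Power does the same job): specify the desired power, alpha and expected effect size, and leave sample size as the unknown.

    from statsmodels.stats.power import TTestIndPower

    # Leave nobs1 unspecified; solve_power returns the missing quantity.
    n_needed = TTestIndPower().solve_power(effect_size=0.5,  # expected Cohen's d
                                           alpha=0.05,       # type I error rate
                                           power=0.80)       # desired power (1 - beta)
    print(round(n_needed))  # participants needed per group (about 64)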

6
Q

Effect size

A

An objective and standardised measure of the magnitude of an effect

A larger value = a bigger effect

The measure depends on the test conducted (Cohen’s d for t-tests, Pearson’s r for correlations, partial eta squared for ANOVA)

The American Statistical Association (2016) recommends reporting this in results sections of reports

Look at effect sizes reported in previous research to estimate how many participants are needed, etc.
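A minimal sketch (with made-up data) of computing Cohen's d by hand: the difference between group means divided by the pooled standard deviation.

    import numpy as np

    group_a = np.array([5.1, 6.3, 5.8, 7.0, 6.1, 5.5])  # made-up scores
    group_b = np.array([4.2, 5.0, 4.8, 5.6, 4.9, 4.4])

    # Pooled standard deviation standardises the raw mean difference.
    n_a, n_b = len(group_a), len(group_b)
    pooled_sd = np.sqrt(((n_a - 1) * group_a.var(ddof=1) +
                         (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2))

    cohens_d = (group_a.mean() - group_b.mean()) / pooled_sd
    print(f"Cohen's d = {cohens_d:.2f}")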

7
Q

Number of participants

A

Rule of thumb: more participants = more ‘signal’ and less ‘noise’, because there is less room for sampling error

Should choose sample size depending on expected effect size

Larger effect size = fewer participants needed to detect a ‘real’ effect

Smaller effect size = more participants needed to detect a ‘real’ effect
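A short sketch, again assuming the statsmodels library, showing how the required sample size shrinks as the expected effect size grows (for 80% power at α = .05):

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    for d in (0.2, 0.5, 0.8):  # small, medium, large effects (Cohen, 1992)
        n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80)
        print(f"d = {d}: about {round(n)} participants per group")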

8
Q

Alpha level

A

Size of alpha = probability of obtaining a type I error

Compare the p-value to this alpha criterion when testing significance

If the p-value is less than the alpha criterion, the result is significant

E.g. if we set an alpha of .05 and ran the study 100 times with the null hypothesis actually true, we would expect to make a type I error about 5 times

Trade-off: if you decrease the type I error rate, you naturally increase the type II error rate, and vice versa

Choice of alpha depends on specific research area/previous research

Many studies/research areas use .05
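A small simulation sketch (assuming NumPy/SciPy) of the "run the study 100 times" idea: with no real effect and α = .05, roughly 5 runs in 100 come out "significant" purely by chance.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    alpha, n_runs, false_positives = 0.05, 100, 0

    for _ in range(n_runs):
        # Both groups drawn from the SAME population, so H0 is actually true.
        a = rng.normal(0, 1, 30)
        b = rng.normal(0, 1, 30)
        _, p = stats.ttest_ind(a, b)
        false_positives += p < alpha

    print(f"Type I errors in {n_runs} runs: {false_positives}")  # around 5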

9
Q

Other factors

A

Variability, design, test choice

10
Q

Problems with alpha testing

A

If we run multiple tests, the overall probability of making at least one type I error increases; this is known as the familywise (or experimentwise) error rate

We can account for this by limiting the number of tests, or by using corrections such as the Bonferroni correction

The Bonferroni correction divides the original alpha criterion by the number of comparisons we want to make

But this reduces statistical power
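A minimal sketch of the Bonferroni correction with three hypothetical p-values: either compare each p against α divided by the number of tests, or use statsmodels' multipletests helper for the same decision.

    from statsmodels.stats.multitest import multipletests

    p_values = [0.010, 0.030, 0.046]  # hypothetical results from 3 comparisons
    alpha = 0.05

    # Manual version: compare each p against alpha divided by the number of tests.
    corrected_alpha = alpha / len(p_values)          # .05 / 3 ~ .0167
    print([p < corrected_alpha for p in p_values])   # [True, False, False]

    # Equivalent check with statsmodels.
    reject, adjusted_p, _, _ = multipletests(p_values, alpha=alpha, method='bonferroni')
    print(reject)  # same significance decisions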

11
Q

One-tailed test

A

We hypothesise there will be a difference in scores, and we’re specific about which score will be higher (α=.05 at one end). Directional hypothesis.

12
Q

Two-tailed test

A

We hypothesise there will be a difference in scores, but this could be in either direction (α = .025 at each end). Non-directional hypothesis.

This affects interpretation because, for the same data, the one-tailed p-value is half of the two-tailed p-value

It is argued that a one-tailed test is more powerful, as it is more likely to detect a significant effect in the predicted direction
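A sketch of how the same (made-up) data give a halved p-value under a directional test, using SciPy's alternative argument:

    from scipy import stats

    group_a = [5.1, 6.3, 5.8, 7.0, 6.1, 5.5]  # made-up scores
    group_b = [4.2, 5.0, 4.8, 5.6, 4.9, 4.4]

    _, p_two = stats.ttest_ind(group_a, group_b, alternative='two-sided')
    _, p_one = stats.ttest_ind(group_a, group_b, alternative='greater')  # directional

    # The one-tailed p is half the two-tailed p when the effect is in the
    # predicted direction.
    print(f"two-tailed p = {p_two:.3f}, one-tailed p = {p_one:.3f}")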

13
Q

Why does our p-value change?

A

Problem with one-tailed tests: if you obtain a significant test statistic but the effect is in the other direction, you must not reject the null hypothesis (this can encourage cheating)

A two-tailed hypothesis/test assesses the likelihood of an effect in both directions

14
Q

Which type of test do I run?

A

One-tailed tests are more powerful because the whole of α is placed in the predicted tail

One-tailed = effectively a larger alpha in the predicted direction = less likelihood of making a type II error

However, there are several caveats and considerations to this

They can also allow people to cheat in research and analysis (p-hacking)

In most cases, it is recommended that you run a two-tailed test so that results in either direction can be explained

15
Q

Power and study design

A

Within-subjects (repeated-measures) designs are generally more powerful than between-subjects designs

But this depends on the type of study being conducted

16
Q

Why is statistical power an important concept in inferential statistics?

A

We might want to do two things:

Calculate the power we have obtained in a study post hoc (using the number of participants, the effect size and the alpha level)

Calculate how many participants we need to collect for a study a priori (this can be done using statistics programs such as G*Power)

Cannot calculate power using SPSS
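A post-hoc sketch of the first calculation, assuming the statsmodels library (G*Power gives the same kind of answer): plug in the achieved sample size, the observed effect size and alpha, and leave power as the unknown.

    from statsmodels.stats.power import TTestIndPower

    achieved_power = TTestIndPower().solve_power(effect_size=0.4,  # observed Cohen's d
                                                 nobs1=40,         # participants per group
                                                 alpha=0.05)       # power left unspecified
    print(f"Post-hoc power: {achieved_power:.2f}")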