Lecture 1 - Effect size and power: Flashcards

1
Q

What is null hypothesis significance testing (NHST)?

A

A method of statistical inference by which an experimental factor is tested against a hypothesis of no effect or no relationship, based on a given observation.

NHST is a statistical method for testing whether there is enough evidence in a data sample to infer that a particular condition or effect exists in the larger population. It is a way to decide between two competing hypotheses: the null and the alternative.

2
Q

What is the
-null hypothesis?
-alternative hypothesis?

A

Null hypothesis (H₀): This is the default assumption or claim that there is no effect, difference, or relationship in the population. For example:
"There is no difference in mean scores between the two groups."

Alternative hypothesis (H₁): This is the competing claim that there is an effect, difference, or relationship. For example:
"The mean score of group A is greater than that of group B."

3
Q

What is the rationale for null hypothesis significance testing?

A

- The researcher has a research question.

- Formulates a null hypothesis (there is no effect) and an alternative hypothesis (there is an effect).

- Collects data (a sample from the population).

4
Q

What is a type 2 error?

A

-There is a real difference, but you fail to detect it (a false null hypothesis is not rejected).
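
A type 2 error becomes more likely when the true effect is small relative to the sample size. The sketch below approximates the type 2 error rate (often written β) for a two-sided comparison of two equal-sized groups, assuming a normal (z) approximation and a true standardized difference d. The function name and the example values are hypothetical.

```python
from statistics import NormalDist

def approx_type2_error(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate probability of missing a true standardized difference d
    (a type 2 error) in a two-sided, two-group z-test.

    Normal approximation; ignores the negligible chance of rejecting H0
    in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    shift = d * (n_per_group / 2) ** 0.5          # expected z if the effect is real
    return NormalDist().cdf(z_crit - shift)

# Same true effect (d = 0.5), different sample sizes per group
for n in (20, 64, 200):
    beta = approx_type2_error(0.5, n)
    print(f"n = {n:>3} per group: type 2 error ~ {beta:.2f}, power ~ {1 - beta:.2f}")
```

With a fixed true effect, the chance of missing it shrinks as the groups get larger.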

5
Q

What does the researcher conclude if the data
-provide
-do not provide
sufficient evidence against the null hypothesis?

A

If the data provide sufficient evidence against the null hypothesis:
◼ Rejects the null hypothesis
◼ Adopts the alternative hypothesis instead

If the data do not provide sufficient evidence against the null hypothesis:
◼ Does not reject the null hypothesis (the alternative hypothesis is not adopted)
◼ But this does not necessarily mean that the null hypothesis is true.
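
The decision rule on this card can be sketched as a small function, assuming the usual convention that the result counts as significant when p < α. The function name is hypothetical; note that the second outcome is worded as "fail to reject", not "H0 is true".

```python
def nhst_decision(p_value: float, alpha: float = 0.05) -> str:
    """Return the NHST decision for a given p-value and significance level."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < alpha:
        # Sufficient evidence against H0: reject it in favour of H1
        return "reject H0"
    # Insufficient evidence: H0 is retained, which does not prove it is true
    return "fail to reject H0"

print(nhst_decision(0.01))
print(nhst_decision(0.20))
```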

6
Q

Problems with the null hypothesis (NH)
-Why is the NH unrealistic in the real world? It is practically impossible for two groups to have exactly the same score.
-A null hypothesis of H0: μa − μb = 0 is a hypothetical construct.

A

In the real world, it's almost impossible for two groups to have exactly the same score. There will always be some tiny differences because of random chance or natural variation.

A non-significant result should never be interpreted as 'no difference between means' or 'no relationship between variables'.
If the test result isn't significant, it doesn't mean there is absolutely no difference between the groups. It just means that, with the data we collected, we couldn't be sure the difference wasn't just random noise.

A non-significant result only tells us that the effect is not large enough to be detected with the given sample size. If we had a bigger sample (more data), we might be able to detect even small differences. A small sample might miss these subtle effects.

7
Q

Problems with NHST
-Not possible to demonstrate the null hypothesis

A

Not possible to demonstrate the null hypothesis

A non-significant result could be due to the null hypothesis being true OR a failure to gather sufficient evidence.
→ Researchers must therefore set up their research so that the 'desired' outcome is to reject the null hypothesis.

8
Q

Problems with NHST
-Statistical significance is not practical significance

A

Statistical significance is not practical significance

- With a sufficiently large sample, very small effects can become statistically significant, although they may be unimportant for any practical purpose.
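
To see how sample size alone can make a tiny effect "significant", the sketch below computes the two-sided p-value of a two-group z-test when the observed standardized difference is fixed at a very small d = 0.05. It assumes a normal approximation and equal group sizes; the function name and numbers are hypothetical.

```python
from statistics import NormalDist

def two_sided_p(d: float, n_per_group: int) -> float:
    """Two-sided p-value of a two-group z-test when the observed
    standardized mean difference is exactly d (normal approximation)."""
    z = d * (n_per_group / 2) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The same tiny effect at increasing sample sizes
for n in (100, 1_000, 10_000):
    print(f"n = {n:>6} per group: p = {two_sided_p(0.05, n):.4f}")
```

The effect size never changes; only the p-value does, because it depends on n.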

9
Q

Practical significance: a fictitious example

A

- IQ is measured in more than 1000 participants.
- Statistical tests indicate that one gender has a higher IQ than the other (p < 0.05).
- The actual difference in group means is 0.8 IQ points.
- Although the difference is statistically significant, it is practically irrelevant: it is not informative about the IQ of any individual person, because the variance within groups is much larger than the difference between groups.
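
Expressing the fictitious 0.8-point difference as a standardized effect makes the point concrete. This assumes the IQ scale's conventional standard deviation of 15 points within each group; the example does not state the SD.

```python
# Fictitious example from the card: 0.8 IQ-point difference between group means
mean_difference = 0.8
# Assumption: IQ scores are normed to a standard deviation of 15 points
sd_within_group = 15.0

d = mean_difference / sd_within_group
print(f"Standardized difference d = {d:.3f}")
```

A d of about 0.05 is far below even the conventional threshold for a "small" effect (0.2).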

10
Q

Problems with NHST

A

All-or-nothing thinking
- If p < .05, an effect is significant, but if p > .05, it is not.
- One would reach completely opposite conclusions depending on whether p = .0499 or p = .0501.
- However, these p-values differ by only 0.0002.
- They would reflect basically the same-sized effect.
→ The alpha level is arbitrary (result: many published papers with p-values just below 0.05)
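
One way to check that p = .0499 and p = .0501 reflect essentially the same evidence is to convert each back into the z statistic that would produce it. This is a sketch assuming a two-sided z-test; the dictionary name is hypothetical.

```python
from statistics import NormalDist

# z statistic that would produce each two-sided p-value
z_for = {p: NormalDist().inv_cdf(1 - p / 2) for p in (0.0499, 0.0501)}

for p, z in z_for.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"p = {p}: z = {z:.4f} ({verdict})")
```

The two z values differ only in the third decimal place, yet the verdicts are opposite.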

11
Q

What does 'significant' mean?

A

In statistics, 'significance' implies that something is unlikely to have occurred by chance (and may therefore have a systematic cause).

- What is considered 'unlikely' depends on an arbitrarily defined significance threshold.

- Psychology: α = 0.05 (a 1 in 20 chance)
- Physics: 5σ criterion (α = 0.000000286, a 1 in 3.5 million chance)
- A critical perspective: significance at a 5% threshold provides only limited evidence that the data are not entirely random.
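
The physics threshold can be derived from the standard normal distribution. This sketch assumes the one-sided convention used for the 5σ criterion in particle physics; the function name is hypothetical.

```python
from statistics import NormalDist

def one_sided_alpha(n_sigma: float) -> float:
    """Probability of a standard normal value at least n_sigma above the mean."""
    return NormalDist().cdf(-n_sigma)

alpha_physics = one_sided_alpha(5)
print(f"5 sigma: alpha = {alpha_physics:.3g}, about 1 in {1 / alpha_physics:,.0f}")
print(f"psychology: alpha = 0.05, 1 in {1 / 0.05:.0f}")
```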

12
Q

What are the alternatives to NHST?

A

-No clear replacement is currently available.
-Proposed: effect size

13
Q

Effect size

A

Provides an estimate of the size of group differences or of the effect of a treatment.

- Ideally independent of the size of the sample

Effect size is a measure of the magnitude or strength of a difference or relationship in a study, beyond just whether it is statistically significant. While statistical significance tells us whether an effect is unlikely to be due to chance alone, effect size tells us how big or meaningful that effect is.

14
Q

What are the uses of effect size?

A
-Measures how large an effect is (a p-, t- or F-value will not tell you this)

-Used to estimate the sample size needed for sufficient statistical power

-Used when combining data across studies (meta-analysis)
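
As an illustration of the second use, the sketch below estimates the sample size per group needed to detect a given Cohen's d. It assumes a two-sided, two-group comparison and uses the normal-approximation formula n = 2((z₁₋α/₂ + z_power)/d)², which comes out slightly lower than exact t-test calculations (for d = 0.5 it gives 63 rather than 64 per group). The function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group to detect a standardized difference d
    with a two-sided, two-group comparison (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Smaller expected effects need much larger samples for the same power.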

15
Q

Types of effect size

A

- Group difference indices (e.g., Cohen's d)
- Strength of association ('variance explained', e.g., eta squared, R squared)
- Risk estimates (e.g., relative risk)
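
The third type can be sketched with relative risk: the risk of an outcome in one group divided by the risk in a comparison group. The function name and the counts below are made up for illustration.

```python
def relative_risk(events_a: int, total_a: int, events_b: int, total_b: int) -> float:
    """Risk of the outcome in group A divided by the risk in group B."""
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    return risk_a / risk_b

# Hypothetical counts: 30 of 200 in group A vs 15 of 200 in group B have the outcome
rr = relative_risk(30, 200, 15, 200)
print(f"Relative risk = {rr:.2f}")
```

A relative risk of 2 means the outcome is twice as likely in group A; 1 would mean no difference.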

16
Q

Effect size
-Group differences

A

Examples:
- Males versus females
- Treatment versus control group
- Young versus older participants

17
Q

Difference between the population mean and the sample mean

A

The population mean is normally unknown, so the sample mean can be used as a good approximation.

18
Q

How can sample means be used to get an effect size?

A

Difference in sample means: m1 − m2
e.g., effect size = 180 − 165 = 15

19
Q

What is a disadvantage of using the difference in means as an effect size?

A

Disadvantage: the measure depends on the measurement scale.
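
Scale dependence is easy to demonstrate: the same data recorded in centimetres and in inches give different raw mean differences, while a standardized difference stays the same. This sketch uses made-up heights (chosen so the group means are 180 and 165) and, for simplicity, a pooled SD that assumes equal group sizes; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def raw_difference(a, b):
    """Difference in sample means, in the original units."""
    return mean(a) - mean(b)

def standardized_difference(a, b):
    """Mean difference divided by a pooled SD (equal group sizes assumed)."""
    pooled = sqrt((stdev(a) ** 2 + stdev(b) ** 2) / 2)
    return raw_difference(a, b) / pooled

# Hypothetical heights in centimetres
group_1_cm = [178, 182, 175, 185, 180]
group_2_cm = [160, 168, 165, 170, 162]

# The same measurements expressed in inches
group_1_in = [x / 2.54 for x in group_1_cm]
group_2_in = [x / 2.54 for x in group_2_cm]

print(f"raw difference: {raw_difference(group_1_cm, group_2_cm):.2f} cm "
      f"vs {raw_difference(group_1_in, group_2_in):.2f} in")
print(f"standardized:   {standardized_difference(group_1_cm, group_2_cm):.3f} "
      f"vs {standardized_difference(group_1_in, group_2_in):.3f}")
```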

20
Q

Standardised mean difference

A

$\delta = \frac{\mu_1 - \mu_2}{\sigma}$

-We don't know the population means, but we can use the sample means.
-What about σ? There are various methods to estimate σ, leading to different effect size measures.

21
Q

Group difference indices

A

-Cohen's d
-Glass' delta
-Hedges' g

The measures differ in how the population variance is estimated from the data.

22
Q

Cohen's d

A

-Most commonly reported

$d = \frac{m_1 - m_2}{SD_{pooled}}$

$SD_{pooled} = \sqrt{\ldots}$
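
A sketch of Cohen's d with a pooled standard deviation. The card elides the pooled-SD formula; the version used here, SD_pooled = √[((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2)], is the common textbook definition and is an assumption about what the lecture uses. The scores and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def pooled_sd(a, b):
    """Pooled SD, weighting each group's variance by its degrees of freedom."""
    n1, n2 = len(a), len(b)
    s1, s2 = stdev(a), stdev(b)  # sample SDs (n - 1 denominator)
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def cohens_d(a, b):
    """Difference in sample means divided by the pooled SD."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)

# Hypothetical scores
treatment = [24, 27, 21, 30, 26, 25]
control = [20, 22, 19, 24, 21, 20]
print(f"Cohen's d = {cohens_d(treatment, control):.2f}")
```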

23
Q

Hedges' g

A

-Very similar to Cohen's d
-The two measures differ in how the population variance is estimated from the data.
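
The card gives no formula for Hedges' g. One common definition (an assumption here, since textbooks differ) multiplies Cohen's d by a small-sample bias correction, approximately J = 1 − 3/(4(n₁ + n₂) − 9), which pulls the estimate slightly towards zero. The function names and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(a, b):
    """Cohen's d with the usual pooled SD."""
    n1, n2 = len(a), len(b)
    pooled = sqrt(((n1 - 1) * stdev(a) ** 2 + (n2 - 1) * stdev(b) ** 2) / (n1 + n2 - 2))
    return (mean(a) - mean(b)) / pooled

def hedges_g(a, b):
    """Cohen's d times an approximate small-sample bias correction."""
    n = len(a) + len(b)
    correction = 1 - 3 / (4 * n - 9)
    return cohens_d(a, b) * correction

treatment = [24, 27, 21, 30, 26, 25]
control = [20, 22, 19, 24, 21, 20]
print(f"d = {cohens_d(treatment, control):.3f}, g = {hedges_g(treatment, control):.3f}")
```

The correction matters mostly for small samples; with large groups, d and g are almost identical.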

24
Q

Glass’ delta

A

Glass’ delta uses the standard deviation from the control group rather than the pooled standard deviation from both groups.

- Glass' delta is often used when several treatments are compared with the control group.
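
A sketch of Glass' delta for the several-treatments case: each treatment mean is compared with the control mean and standardized by the control group's SD alone, so every treatment is measured against the same yardstick. The group names and scores are hypothetical.

```python
from statistics import mean, stdev

def glass_delta(treatment, control):
    """Mean difference standardized by the control group's SD only."""
    return (mean(treatment) - mean(control)) / stdev(control)

control = [20, 22, 19, 24, 21, 20]
treatments = {
    "A": [24, 27, 21, 30, 26, 25],
    "B": [22, 23, 20, 25, 23, 22],
}

for name, scores in treatments.items():
    print(f"Treatment {name}: Glass' delta = {glass_delta(scores, control):.2f}")
```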

25
Q

Paired samples t-test

A

A paired samples t-test (also called a dependent samples t-test) is a statistical test used to compare the means of two related groups to see whether there is a significant difference between them. The groups are "paired" because the same individuals or entities are measured twice, under different conditions or at different times.
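
Because each person is measured twice, the test works on the within-person differences: t = (mean of differences) / (SD of differences / √n), with n − 1 degrees of freedom. A minimal sketch with hypothetical before/after scores; turning t into a p-value needs a t distribution, which Python's standard library does not provide, so that step is left out.

```python
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, df) for a paired samples t-test on after - before."""
    if len(before) != len(after):
        raise ValueError("each participant needs both measurements")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / n ** 0.5)
    return t, n - 1

# Hypothetical scores for six participants under two conditions
before = [12, 15, 11, 14, 13, 16]
after = [14, 17, 12, 15, 16, 17]
t, df = paired_t(before, after)
print(f"t({df}) = {t:.2f}")
```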

26
Q

Classification of effect size: Cohen's d

A

Classification of effect size:

- d between 0.2 and 0.49 = small
- d between 0.5 and 0.79 = medium
- d of 0.8 and higher = large
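
The cut-offs above can be sketched as a function. Two assumptions not on the card: the sign of d is ignored (only its magnitude is classified), and values below 0.2, which the card does not label, are returned as "below small". Values between 0.49 and 0.5 fall into "small" with these thresholds. The function name is hypothetical.

```python
def classify_cohens_d(d: float) -> str:
    """Label a Cohen's d using the conventional 0.2 / 0.5 / 0.8 cut-offs."""
    size = abs(d)  # assumption: direction is ignored
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"  # the card gives no label for d < 0.2

for d in (0.1, 0.3, 0.65, 1.2):
    print(d, classify_cohens_d(d))
```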