NHST Flashcards

1
Q

sampling variation

A

every sample drawn at random from a population will be composed of different individuals and therefore have different means
- difference between means is termed sampling variation or sampling error

two samples drawn from a single population may occasionally have quite different means
- therefore possible to obtain a statistically significant t result, even though the two samples are from the same population
- false positive result, type 1 error

two samples drawn from quite different populations may occasionally have quite similar means
- possible to obtain a non-sig result, even though there’s a real difference between the two pops
- false negative result, or type 2 error
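A minimal R sketch of the first case (group size and population values are arbitrary): draw two samples from the same population many times and count how often the t-test is nonetheless significant.

set.seed(1)
p_values <- replicate(10000, {
  a <- rnorm(12, mean = 100, sd = 15)   # both samples come from the same population
  b <- rnorm(12, mean = 100, sd = 15)
  t.test(a, b)$p.value
})
mean(p_values < 0.05)                   # proportion of false positives, close to 0.05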

2
Q

type 1 error

A

false positive results

3
Q

type 2 error

A

false negative results

4
Q

error is unavoidable

A

NHST generates a p value, which is the probability that we would have obtained our observed data (or data more extreme) if H0 is true

if p = 0.02, there's only a 2% probability that we would have obtained our data if H0 is true, so we reject H0

if p = 0.1, there's a 10% probability that we would have obtained our data if H0 is true, so we fail to reject H0

either way the decision rests on a probability, so some decisions will inevitably be wrong: error is unavoidable
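A short R sketch of that decision rule, using R's built-in sleep data and alpha = 0.05 purely for illustration:

alpha <- 0.05
result <- t.test(extra ~ group, data = sleep)   # any two-group t-test
result$p.value                                  # probability of the observed data if H0 is true
result$p.value < alpha                          # TRUE = reject H0, FALSE = fail to reject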

5
Q

alpha and beta

A

alpha (false positive): acceptable probability of a type 1 error
- usually alpha = 0.05
- we accept that we will make a type 1 error up to 5% of the time

beta (false negative): acceptable probability of a type 2 error
- usually beta = 0.20
- we accept that we will make a type 2 error up to 20% of the time

6
Q

significant result

A

a significant result is likely to lead to follow-up studies and the investment of considerable time and money, which is wasted if the result was a type 1 error

7
Q

false negative result

A
  • interesting results get missed
  • generally less serious than a false positive

8
Q

visualizing alpha

A

NHST: directly tests the null hypothesis, not the alternative
H0: a single testable numerical prediction
H1: does not make a single testable numerical prediction
- there are infinitely many values for which "difference ≠ 0" is true

t statistics in the tails of the H0 distribution are unlikely to be obtained if H0 is true
- if H0 is true and the observed t falls in a tail, we reject H0 and make a type 1 error
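A small R sketch of the H0 distribution and its rejection region (df = 22 is an arbitrary example, e.g. two groups of 12):

df <- 22
t_crit <- qt(1 - 0.05/2, df)                # two-tailed critical value for alpha = 0.05
curve(dt(x, df), from = -4, to = 4,
      xlab = "t statistic", ylab = "density")
abline(v = c(-t_crit, t_crit), lty = 2)     # observed t beyond these lines -> reject H0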

9
Q

visualizing beta

A

generate a distribution of t statistics that would be obtained if H1 is true

have to pick a specific size of the effect that we are expecting

10
Q

effect size

A

Cohen’s d: continuous data consisting of two groups (t-tests)

eta squared: continuous data consisting of >2 groups (ANOVA)

partial eta squared: continuous data with >1 predictor variable (factorial ANOVA or multiple regression)

Pearson’s r: relationship between two continuous variables (correlation or regression)

R^2: continuous data with a continuous or categorical predictor (correlation, regression or ANOVA)

odds ratio (OR): categorical data (uses X^2 or logistic regression)

11
Q

general properties of effect size

A

quantifies the size of the effect of the predictor variable on the outcome variable (effect of x on y)

effect size is generally not affected by sample size
- large sample sizes do increase the probability of a statistically significant result
- large sample sizes do not systematically affect the effect size

the larger the effect size associated with the predictor variable, the easier it will be to obtain a statistically significant result
- probability of a false negative error will be reduced

12
Q

Cohen’s d

A

a simple measure of effect size, used for continuous data consisting of two groups
- appropriate for data analyzed using paired and independent t-tests

expresses the difference between group means as the number of standard deviations between the means

13
Q

Cohen’s d: repeated measures

A

expresses the difference between condition means (D-bar) as the number of standard deviations (Sd) between the means

d= D-bar/ Sd

Sd: standard deviation of the differences, i.e. the typical size of Di - D-bar
- average residual or error from GLM
- if d=2, the difference between the conditions is twice the average error or residual
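A hand-rolled R sketch of this calculation (the pre/post scores are made up):

pre  <- c(10, 12, 9, 14, 11, 13)
post <- c(12, 15, 10, 17, 12, 16)
D     <- post - pre          # difference score for each participant
d_bar <- mean(D)             # mean difference between conditions
s_d   <- sd(D)               # standard deviation of the differences
d_bar / s_d                  # Cohen's d for repeated measures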

14
Q

Cohen’s d: independent groups

A

difference between group means (y-bar1 - y-bar0) as the number of standard deviations (Sp) between the means

d= (y-bar1 - y-bar0)/ Sp

Sp: pooled standard deviation, the average difference between each score (y1 or y0) and its group mean
- average residual or error from GLM
- if d=2, the difference between the group means is twice as large as the average error/residual
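The corresponding R sketch for two independent groups (scores are made up):

y0 <- c(10, 12, 9, 14, 11, 13)     # group 0
y1 <- c(13, 16, 12, 18, 14, 17)    # group 1
n0 <- length(y0); n1 <- length(y1)
s_p <- sqrt(((n0 - 1) * var(y0) + (n1 - 1) * var(y1)) / (n0 + n1 - 2))  # pooled SD
(mean(y1) - mean(y0)) / s_p        # Cohen's d for independent groups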

15
Q

impacts on Cohen’s d

A

the greater the difference between the means, the larger Cohen’s d

the smaller Sp, the larger Cohen’s d

16
Q

interpreting Cohen’s d

A

small effect: d<0.2
medium effect: 0.2 <d<0.8
large effect: d>0.8

17
Q

Cohen’s d vs t

A

d: divides by the average difference between each score and the mean (unaffected by n)

t: divides by the standard error (the average difference between each sample mean and the mean of the sampling distribution), which shrinks as n grows, so t is affected by n
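A quick R illustration of the difference, using a made-up paired design: d stays the same whatever the sample size, while t = D-bar / (Sd / sqrt(n)) grows with n.

d_bar <- 1; s_d <- 2            # fixed mean difference and SD of the differences
d <- d_bar / s_d                # Cohen's d, independent of n
for (n in c(10, 40, 160)) {
  se <- s_d / sqrt(n)           # standard error shrinks as n grows
  cat("n =", n, " d =", d, " t =", d_bar / se, "\n")
}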

18
Q

visualizing beta

A

calculate the t distribution based on H0 (shows alpha)
- gives range of t stats that we would expect if H0 was true

randomly sample scores for each group from two normally distributed populations whose means differ by the expected effect size
then calculate the t statistic
repeat a million times to generate the distribution of t statistics expected if H1 is true with d = +0.8 (see the R sketch below)
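A minimal R sketch of that simulation, scaled down to 10,000 repeats and 12 scores per group for speed:

set.seed(1)
n <- 12; d <- 0.8
t_h1 <- replicate(10000, {
  y0 <- rnorm(n, mean = 0, sd = 1)          # population assumed under H0
  y1 <- rnorm(n, mean = d, sd = 1)          # population shifted by d = 0.8
  t.test(y1, y0, var.equal = TRUE)$statistic
})
t_crit <- qt(0.975, df = 2 * n - 2)          # two-tailed cutoff under H0 (alpha = 0.05)
mean(abs(t_h1) < t_crit)                     # proportion of misses = estimated beta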

19
Q

power

A

beta: probability of obtaining a negative result when H1 is true

power: probability of a true positive result, power = 1 - beta
- statistical analyses focus on power rather than beta

aim for beta < 0.2, i.e. power > 80%

20
Q

3 ways to increase power to be 0.8

A
  1. make it easier to obtain a sig result by increasing alpha (e.g. from 0.05 to 0.10)
    - reduces the probability of a false negative, but will increase the probability of a false positive result
  2. increase sample size
    - reduces the standard error, and therefore the standard deviation of our probability distributions
  3. change H1 by increasing the expected effect size (see the pwr sketch below)
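A rough sketch of the three options using the pwr package (introduced later in the deck; assumes pwr is installed, and the baseline n, d and alpha values are arbitrary):

library(pwr)
pwr.t.test(n = 20, d = 0.5, sig.level = 0.05)$power   # baseline
pwr.t.test(n = 20, d = 0.5, sig.level = 0.10)$power   # 1. relax alpha
pwr.t.test(n = 40, d = 0.5, sig.level = 0.05)$power   # 2. increase n
pwr.t.test(n = 20, d = 0.8, sig.level = 0.05)$power   # 3. expect a larger effect
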
21
Q

changing n impact

A

increasing n reduces the standard error
- results in a larger value of t (provided the true mean difference isn't 0)
- the mean of the beta (H1) distribution is shifted away from the alpha (H0) distribution

changing n also changes the df and therefore alters tcrit

increasing n mainly shifts the mean of the beta distribution away from the alpha distribution
- this is the main contribution to the increase in power

22
Q

2 explanations for a non-significant result

A
  • no effect of the manipulation
  • there is an effect of the manipulation, but it was not detected due to a weak effect size, low power or bad luck

23
Q

power calculations in R

A

library(pwr)
pwr.t.test(n = 12, d = 0.8, sig.level = 0.05, power = NULL, type = "two.sample")
n: number of observations (per group)
d: effect size
sig.level: significance level (type 1 error probability)
type: type of t-test (one.sample, two.sample or paired)
power: power of the test (the NULL argument is the one solved for)
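Setting power = NULL (as above) solves for power; leaving n as the unknown instead solves for the sample size needed to reach 80% power, e.g.:

library(pwr)
pwr.t.test(n = NULL, d = 0.8, sig.level = 0.05, power = 0.80, type = "two.sample")  # returns n per group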

24
Q

thresholds of statistical significance (alpha)

A

type 1 and type 2 errors only exist because alpha converts a continuous p value into a binary decision; without a threshold, there is no type 1 or 2 error

25
Q

p hacking due to thresholds

A

any form of data manipulation in order to get results where p<0.05
- conduct multiple statistical analyses and only admit to performing the analyses that produced sig results
- deciding to remove an outlier to generate sig result
- remove an entire group to generate sig result
- select a different statistical test to generate sig result

26
Q

publication bias due to thresholds

A

studies with entirely negative results are less likely to be accepted for publication

27
Q

how did we adopt a threshold of significance?

A

using alpha to convert a continuous probability value into a binary decision results in type 1 and 2 error, leading to p-hacking and publication bias

a threshold of significance does make it easier to communicate scientific findings, especially to a lay audience

28
Q

Karl Pearson

A

founder of mathematical statistics

Pearson’s r
Pearson’s X^2 test
p value

29
Q

William Sealy Gosset

A

developed t distributions (Student’s t distribution)

statistical work was developed to improve methodologies for brewing Guinness

Company policy prevented him from publishing under his own name, so he adopted the pen name "Student"

30
Q

Ronald Fisher

A

- developed ANOVA (the F test is named after Fisher)
- formalized the concept of the null hypothesis, and statistical tests of H0
- formalized the use of p values to evaluate H0

end point of NHST was p value
- argued against using a threshold for stat significance

31
Q

Jerzy Neyman and Egon Pearson

A

developed concept of alternative hypothesis
calculated prob of H0 being true, and prob of H1 being true
by comparing the two probabilities, they selected whichever hypothesis was more likely

argued their approach was better as it evaluated two competing probabilities to identify which was most probable

Fisher argued that applying a binary decision would lead to confusion

32
Q

best practices in NHST

A

know how to interpret results
- sig results may be false positives, especially if p is close to 0.05
- understand that non-sig results may be false negatives, especially if the sample size or effect size is small

always report effect sizes to contextualize both sig and non-sig results

plan analysis in advance to avoid p-hacking

replicate findings, especially if they are critically important to future research but only marginally sig

apply meta analyses to research questions
- reanalyze data from multiple related publications in an attempt to resolve apparent contradictions

consider alternatives to NHST
- Bayesian stats