Chapter 3: The Phoenix of Stats Flashcards
1
Q
NHST general problems
inherent, fundamental limitations that are part of the system
A
- gives us the probability of the data given the null is true instead of the probability of the hypothesis given the data
- mismatch between inferences we want to make and what NHST gives us
- there is no way to conclude that the null is true
2
Q
NHST general misconceptions
A
- most scientists do not understand p values
- misconception 1: a significant result means that the effect is important (NO, even the most trivial effects will be statistically significant with a high enough sample size; see the simulation sketch below)
- misconception 2: a non-significant result means that the null hypothesis is true (NO, absence of evidence is not evidence of absence; the effect may be very small and undetectable, but still there)
- misconception 3: a significant result means that the null hypothesis is false (NO, a significant test statistic is based on probabilistic reasoning; Type I errors still occur)
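A minimal simulation sketch of misconception 1 (not from the source; numpy/scipy, the effect size, and the sample sizes are all my assumptions): a trivial true effect that is nowhere near significance at small n becomes significant at very large n.

```python
# Minimal sketch (assumed numpy/scipy; numbers made up): a trivially small
# true effect (d = 0.02) reaches p < .05 once the sample is large enough.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

for n in (100, 10_000, 1_000_000):
    control = rng.normal(loc=0.00, scale=1.0, size=n)
    treatment = rng.normal(loc=0.02, scale=1.0, size=n)  # tiny true effect
    _, p = stats.ttest_ind(treatment, control)
    print(f"n per group = {n:>9,}  p = {p:.4f}")
```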
3
Q
NHST all or nothing thinking
A
- absence of evidence is not evidence of absence
- real effects may be missed if you rely on strict p value cutoffs
4
Q
NHST as part of wider problems in science
A
- incentive structures and publication bias: journals favor significant findings
- researcher degrees of freedom: decisions that researchers have to make that might impact their publication chances (e.g., multiple experiments, control variables, multiple dependent variables, outliers, missing data, different models, scale items)
- p-hacking and HARKing: p-hacking is running lots of different analyses and then only reporting the results that are significant. HARKing (Hypothesizing After the Results are Known) is running analyses, noticing that the pattern of results isn't consistent with what you hypothesized a priori, and then finding another theory and/or hypothesis that fits the data and presenting it as if it had been made a priori (may be OK if you explain both)
5
Q
is most published research wrong?
A
- about 1/3 of published results will be wrong
- relationships can cross the significance threshold by adding more data points, even though a much larger sample would show that there is no relationship
- journals want novel hypotheses/studies, not replication studies
6
Q
ways to avoid NHST problems (solutions)
EMBERS
A
- effect sizes: statistical significance is not practical significance
- meta-analyses: avoid all or nothing thinking
- Bayesian estimation: finding probability of hypotheses/parameter ranges
- registration: avoid p-hacking/HARKing
- sense: understanding NHST
7
Q
principles for using p values
sense
A
- p values can be useful: they help rule out sampling error and establish that an effect exists. combined with effect size measures, they are informative
- we do not have to ignore decades of research that relied on p values
- we must understand what NHST is and is not
8
Q
pre-registering and open science
registration
A
- open science: umbrella term for practices that make research more transparent, accessible, and collaborative
- preregistering is the practice of making a study protocol (including data analysis strategies) public before data collection begins
9
Q
effect sizes
A
- objective and usually standardized measure of an observed effect (how big the effect is)
- magnitude of an effect
- unstandardized: mean difference, reaction time (raw units, easier to interpret)
- standardized: Cohen’s d, Pearson’s r, odds ratio (can be compared across different measures because they’re converted to standard units; use some measure of variability within a sample to assess the size of the effect)
10
Q
Cohen’s d
effect sizes
A
- difference between 2 means in SD units
- d = (mean 1 - mean 2) / SD (see the sketch below)
- guidelines: d ~ .2 (small), d ~ .5 (medium), d ~ .8 (large)
- use the control group SD because different interventions might affect both the mean and the SD; the control SD stays consistent across comparisons, whereas using the experimental group SD changes the metric with every comparison
- if the two means come from populations with similar SDs, pool their SDs; this bases the estimate on the full sample and gives a better estimate of the effect size
- helpful for practical significance
- not inflated by sample size; a larger sample only makes the estimate more precise
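A minimal sketch of the pooled-SD calculation, assuming numpy; the function name cohens_d and the example scores are mine, not from the source.

```python
# Minimal sketch (assumed numpy; function name is mine): Cohen's d with a
# pooled SD, weighting each group's variance by its degrees of freedom.
import numpy as np

def cohens_d(group1, group2):
    g1, g2 = np.asarray(group1, float), np.asarray(group2, float)
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
    return (g1.mean() - g2.mean()) / np.sqrt(pooled_var)

print(cohens_d([5.1, 5.9, 6.0, 5.5], [4.8, 5.0, 4.6, 5.2]))  # made-up scores
```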
11
Q
Pearson’s r
effect sizes
A
- measure of linear association between 2 variables
- ranges from -1.00 to +1.00
- guidelines: r ~ .1 (small), r ~ .3 (medium), r ~ .5 (large); see the sketch below
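A minimal sketch, assuming numpy; the x/y data are made up.

```python
# Minimal sketch (assumed numpy; data made up): Pearson's r via the
# correlation matrix -- the off-diagonal entry is the correlation.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.2f}")  # close to +1 here: a large positive association
```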
12
Q
odds ratio
effect sizes
A
- measure of association between two events
- popular effect size for count data
- odds = P(event) / P(no event); the odds ratio is the ratio of the odds in one group to the odds in another
- odds ratio = 1: event equally likely in both groups; < 1: event less likely; > 1: event more likely (see the sketch below)
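A minimal sketch with made-up 2x2 counts, showing the odds ratio as the ratio of an event's odds across two groups.

```python
# Minimal sketch (counts made up): odds ratio from a 2x2 table,
# comparing an event's odds in a treatment vs. a control group.
treat_event, treat_none = 30, 70    # treatment odds = 30/70
ctrl_event, ctrl_none = 15, 85      # control odds   = 15/85

odds_ratio = (treat_event / treat_none) / (ctrl_event / ctrl_none)
print(f"OR = {odds_ratio:.2f}")  # > 1: event more likely under treatment
```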
13
Q
effect sizes compared to NHST
A
- effect sizes encourage interpreting effects on a continuum, rather than categorically labelling effects as significant or not
- bigger sample sizes increase the precision of the effect size estimate but do not increase the expected effect size. in other words, you can't get a large effect size just by collecting a large sample, the way you can get a small p value just by collecting a large sample
- the issue of researcher degrees of freedom is still present when the focus is on effect sizes, but is less of an issue because they are not tied to a decision rule (less pressure to reach an arbitrary threshold)
- significance tests should be paired with effect size measures. p values establish that there is an effect in the population, and effect size measures estimate how large that effect is
14
Q
meta analysis
A
- statistical analysis that combines findings from many studies that address the same question
- estimates the true (population) effect
- helps us avoid the all or nothing thinking that tends to occur when we focus on p values of primary studies (gives an average standardized effect size; see the sketch below)
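A minimal fixed-effect pooling sketch, assuming numpy; the per-study effects and standard errors are made up. (Real meta-analyses typically use dedicated packages and often random-effects models.)

```python
# Minimal fixed-effect sketch (assumed numpy; numbers made up): pool
# per-study effect sizes, weighting each by its inverse variance.
import numpy as np

d = np.array([0.31, 0.48, 0.12, 0.55])   # standardized effect per study
se = np.array([0.15, 0.20, 0.10, 0.25])  # standard error per study

w = 1.0 / se**2                          # inverse-variance weights
d_pooled = (w * d).sum() / w.sum()       # weighted average effect size
se_pooled = np.sqrt(1.0 / w.sum())
print(f"pooled d = {d_pooled:.2f} +/- {se_pooled:.2f}")
```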
15
Q
Bayesian approaches
A
- alternative to NHST
- bayesian stats is about updating your beliefs about a parameter or hypothesis based on evidence
- P(hypothesis given the data): the probability that the hypothesis is true given the observed data
- the probability of the data given the hypothesis is not the same as the probability of the hypothesis given the data
- prior probability: your belief in the hypothesis before considering the data
- likelihood: probability of obtaining the data given certain hypothesis/model
- marginal likelihood: probability of the observed data (evidence)
- posterior probability: probability of the hypothesis after considering the data (worked through in the sketch below)
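A minimal worked sketch of the update rule these terms combine into, posterior = likelihood × prior / marginal likelihood; all the numbers are made up.

```python
# Minimal sketch (numbers made up): Bayes' theorem,
# posterior = likelihood * prior / marginal likelihood.
prior_h = 0.50       # belief in the hypothesis before seeing the data
lik_h = 0.80         # P(data | hypothesis true)
lik_not_h = 0.30     # P(data | hypothesis false)

# marginal likelihood: total probability of the data across both hypotheses
marginal = lik_h * prior_h + lik_not_h * (1 - prior_h)

posterior = lik_h * prior_h / marginal
print(f"P(hypothesis | data) = {posterior:.2f}")  # ~0.73: belief revised upward
```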