Chapter 3: The Phoenix of Stats Flashcards

1
Q

NHST general problems

inherent, fundamental limitations that are part of the system

A
  • gives us the probability of the data given the null is true instead of the probability of the hypothesis given the data
  • mismatch between inferences we want to make and what NHST gives us
  • there is no way to conclude that the null is true
2
Q

NHST general misconceptions

A
  • most scientists do not understand p values
  • misconception 1: a significant result means that the effect is important (NO; even the most trivial effect will be statistically significant with a large enough sample size)
  • misconception 2: a non-significant result means that the null hypothesis is true (NO; absence of evidence is not evidence of absence — the effect may be too small to detect, but still there)
  • misconception 3: a significant result means that the null hypothesis is false (NO; a significant test statistic is based on probabilistic reasoning, so Type I errors occur)
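Misconception 1 can be illustrated numerically. The sketch below (hypothetical numbers, a simple two-sample z-test approximation for a standardized mean difference) shows the same trivial effect being non-significant at n = 100 per group but "significant" at a huge sample size:

```python
import math

def ztest_p(d, n_per_group):
    """Two-sided p value for a two-sample z-test of a
    standardized mean difference d, with n per group."""
    z = d * math.sqrt(n_per_group / 2)       # test statistic
    return math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))

tiny_effect = 0.02  # a trivially small standardized difference
print(ztest_p(tiny_effect, 100))        # n = 100 per group: nowhere near significant
print(ztest_p(tiny_effect, 2_000_000))  # huge n: the same trivial effect is "significant"
```

The effect never became important; only the sample size changed.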
3
Q

NHST all or nothing thinking

A
  • absence of evidence is not evidence of absence
  • might not find effects if you use strict p value cutoffs
4
Q

NHST as part of wider problems in science

A
  1. incentive structures and publication bias: journals favor significant findings
  2. researcher degrees of freedom: decisions that researchers have to make that might impact their publication chances (e.g., multiple experiments, control variables, multiple dependent variables, outliers, missing data, different models, scale items)
  3. p-hacking and HARKing: p-hacking is running lots of different analyses and reporting only the results that are significant. HARKing (Hypothesizing After the Results are Known) is running analyses, seeing that the pattern of results is not consistent with the a priori hypothesis, then finding another theory and/or hypothesis that fits the data and presenting it as if it had been made a priori (may be OK if you explain both)
5
Q

is most published research wrong?

A
  • about 1/3 of published results will be wrong
  • relationships can cross the significance threshold by adding more data points, even though a much larger sample would show that there is no relationship
  • journals want novel hypotheses/studies, not replication studies
6
Q

ways to avoid NHST problems (solutions)

EMBERS

A
  1. effect sizes: statistical significance is not practical significance
  2. meta-analyses: avoid all-or-nothing thinking
  3. Bayesian estimation: finding the probability of hypotheses/parameter ranges
  4. registration: avoid p-hacking/HARKing
  5. sense: understanding NHST
7
Q

principles for using p values

sense

A
  • p values can be useful: they help rule out sampling error and establish that an effect exists. combined with an effect size, they are informative
  • we do not have to ignore decades of research that relied on p values
  • we must understand what NHST is and is not
8
Q

pre-registering and open science

registration

A
  • process of making science more transparent and accessible
  • umbrella term for practices that make science more transparent and allow collaboration
  • preregistering is the practice of making a study protocol (including data analysis strategies) public before data collection begins
9
Q

effect sizes

A
  • objective and usually standardized measure of an observed effect (how big the effect is)
  • magnitude of an effect
  • unstandardized: mean difference, reaction time (raw units, easier to interpret)
  • standardized: Cohen’s d, Pearson’s r, odds ratio (can be compared across different measures because they’re converted to standard units; they use some measure of variability within a sample to assess the size of the effect)
10
Q

Cohen’s d

effect sizes

A
  • difference between 2 means in SD units
  • d = (mean 1 − mean 2) / SD
  • guidelines: d ≈ .2 (small), ≈ .5 (medium), ≈ .8 (large)
  • use the control group SD because different interventions might affect both the mean and the SD; the control SD stays more consistent. using the experimental group SD changes the metric with every comparison
  • if the two means come from populations with similar SDs, pool their SDs. pooling draws on a larger sample and gives a better estimate of the effect size
  • helpful for practical significance
  • not impacted by sample size; a larger sample only makes the estimate more accurate
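The formula above can be sketched in a few lines. This is a minimal illustration with hypothetical scores, using the pooled-SD version (assumes the two groups have similar SDs):

```python
import math
import statistics as st

def cohens_d(group1, group2):
    """Cohen's d: mean difference in pooled-SD units."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = st.stdev(group1), st.stdev(group2)
    # pooled SD weights each group's variance by its degrees of freedom
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (st.mean(group1) - st.mean(group2)) / pooled_sd

control   = [6, 8, 10, 7, 9]    # hypothetical scores
treatment = [8, 10, 12, 9, 11]
print(round(cohens_d(treatment, control), 2))  # → 1.26, large by Cohen's guidelines
```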
11
Q

Pearson’s r

effect sizes

A
  • measure of linear association between 2 variables
  • ranges from -1.00 to +1.00
  • guidelines: r ~ .1 (small), r ~ .3 (medium), r~ .5 (large)
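A minimal sketch of the computation, with hypothetical data (r is the covariance of the two variables standardized by their spreads):

```python
import math

def pearson_r(x, y):
    """Pearson correlation: standardized linear association of x and y."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

hours  = [1, 2, 3, 4, 5]   # hypothetical predictor
scores = [1, 2, 2, 4, 5]   # hypothetical outcome
print(round(pearson_r(hours, scores), 2))  # → 0.96, large by the guidelines
```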
12
Q

odds ratio

effect sizes

A
  • measure of association between two events
  • popular effect size for count data
  • odds = P(event) / P(no event); the odds ratio compares the odds across two groups: OR = odds(group 1) / odds(group 2)
  • OR = 1: event equally likely in both groups; OR < 1: less likely in group 1; OR > 1: more likely in group 1
13
Q

effect sizes compared to NHST

A
  • effect sizes encourage interpreting effects on a continuum, rather than categorically labelling effects as significant or not
  • bigger samples increase the precision of the effect size estimate but do not increase its expected size. in other words, you can’t get a large effect size by collecting a large sample, the way you can get a small p value for an effect by collecting a large sample
  • the issue of researcher degrees of freedom is still present when the focus is on effect sizes, but is less of an issue because they are not tied to a decision rule (less pressure to reach an arbitrary threshold)
  • significance tests should be paired with effect size measures. p values establish that there is an effect in the population, and effect size measures estimate how large that effect is
14
Q

meta analysis

A
  • statistical analysis that combines findings from many studies that answer the same question
  • estimates the true effect
  • helps us avoid the all-or-nothing thinking that tends to occur when we focus on p values of primary studies (gives an average standardized effect size)
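As a sketch of the averaging idea: full meta-analyses weight each study by the inverse of its variance, but a simplified version (hypothetical study effect sizes and sample sizes, weighting by n) shows how mixed "significant"/"non-significant" studies combine into one estimate:

```python
def meta_effect(effects, ns):
    """Sample-size-weighted average effect size across studies
    (a simplification of inverse-variance weighting)."""
    total_n = sum(ns)
    return sum(d * n for d, n in zip(effects, ns)) / total_n

# hypothetical studies of the same question, some significant, some not
study_ds = [0.60, 0.10, 0.45, 0.30]
study_ns = [20, 200, 50, 100]
print(round(meta_effect(study_ds, study_ns), 2))  # → 0.23
```

Note how the small study with d = 0.60 contributes little: precision, not headline significance, drives the pooled estimate.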
15
Q

Bayesian approaches

A
  • alternative to NHST
  • bayesian stats is about updating your beliefs about a parameter or hypothesis based on evidence
  • P(hypothesis given the data): the probability that the hypothesis is true given the observed data
  • the probability of the data given the hypothesis is not the same as the probability of the hypothesis given the data
  • prior probability: your belief in the hypothesis before considering the data
  • likelihood: probability of obtaining the data given certain hypothesis/model
  • marginal likelihood: probability of the observed data (evidence)
  • posterior probability: probability of the hypothesis after considering the data
16
Q

Bayesian priors of parameters

A

combining a prior distribution with the data yields the posterior distribution. a confident, informed prior is less affected by the data; an unconfident, uninformed prior is more affected by it
- the posterior is a better estimate. from it we get a credibility interval for the parameter
- interpreted as “95% probability that the parameter is between X and Y”
- a credibility interval is an interval estimate of a parameter. unlike CIs, credibility intervals can be interpreted with probability statements
- when strongly informative priors are used, the data influence the posterior probabilities less than when weakly informative priors are used
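The prior-strength point can be made concrete with a standard conjugate example (a sketch, not from the chapter): a Beta prior on a proportion updated with binomial data gives a Beta posterior, so the posterior mean is available in closed form:

```python
def beta_posterior_mean(prior_a, prior_b, successes, n):
    """Beta(prior_a, prior_b) prior updated with binomial data:
    the posterior is Beta(prior_a + successes, prior_b + failures)."""
    a = prior_a + successes
    b = prior_b + (n - successes)
    return a / (a + b)

data = (7, 10)  # hypothetical: 7 successes in 10 trials

weak   = beta_posterior_mean(1, 1, *data)    # uninformed prior: pulled strongly toward the data
strong = beta_posterior_mean(50, 50, *data)  # informed prior centered on .5: barely moves
print(round(weak, 3), round(strong, 3))      # → 0.667 0.518
```

The same data move the weak prior to about .67 but the strong prior only to about .52, exactly the "less affected by data" behavior described above.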

17
Q

posterior odds

A
  • used to compare two competing hypotheses
  • P (hypothesis 1 given data) / P (hypothesis 2 given data)
18
Q

bayes factor

A
  • used to indicate the degree to which beliefs change after considering the evidence. indicates the degree to which the data supports either the null or alt.
  • P(data given the alternative) / P (data given the null)
  • Bayes factor = 1: data is equally likely under both hypotheses
  • Bayes factor > 1 favors the alternative hypothesis
  • Bayes factor < 1 favors the null hypothesis

guidelines: 1-3 = evidence for the alt is barely worth mentioning. 3-10 = evidence for alt has substance. >10 = strong evidence for the alt
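The ratio above can be computed directly for a simple case (a sketch with hypothetical data, comparing a point alternative θ = .7 against the null θ = .5 for binomial data; the binomial coefficient cancels in the ratio):

```python
def binom_likelihood(theta, k, n):
    """P(k successes in n trials | success probability theta),
    omitting the binomial coefficient, which cancels in the ratio."""
    return theta**k * (1 - theta)**(n - k)

k, n = 7, 10  # hypothetical data: 7 heads in 10 flips

# Bayes factor: P(data | alternative) / P(data | null)
bf = binom_likelihood(0.7, k, n) / binom_likelihood(0.5, k, n)
print(round(bf, 2))  # → 2.28
```

A Bayes factor of about 2.3 falls in the 1–3 range: evidence for the alternative that is barely worth mentioning. Multiplying it by the prior odds gives the posterior odds from the previous card.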

19
Q

benefits of bayesian approaches

A
  • matches with the inferences we want to make (probability that the alt or null are true)
  • you can keep gathering more data and updating your beliefs
  • focus is properly on estimation and interpretation instead of black-and-white thinking (reject/accept), which reduces p-hacking

downside: priors are subjective