Chapter 3: The Phoenix of Stats Flashcards
1
Q
NHST general problems
inherent, fundamental limitations that are part of the system
A
- gives us the probability of the data given the null is true instead of the probability of the hypothesis given the data
- mismatch between inferences we want to make and what NHST gives us
- there is no way to conclude that the null is true
2
Q
NHST general misconceptions
A
- most scientists do not understand p values
- misconception 1: a significant result means that the effect is important (NO, even the most trivial effects will be statistically significant with a high enough sample size; see the simulation sketch below)
- misconception 2: a non-significant result means that the null hypothesis is true (NO, absence of evidence is not evidence of absence; the effect may be very small and undetectable, but still there)
- misconception 3: a significant result means that the null hypothesis is false (NO, a significant test statistic is based on probabilistic reasoning; Type I errors still occur)
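A minimal simulation sketch of misconception 1 (not from the source; numpy/scipy, the effect size, and the sample sizes are all my assumptions): a trivial true effect that is nowhere near significance at small n becomes significant at very large n.

```python
# Minimal sketch (assumed numpy/scipy; numbers made up): a trivially small
# true effect (d = 0.02) reaches p < .05 once the sample is large enough.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

for n in (100, 10_000, 1_000_000):
    control = rng.normal(loc=0.00, scale=1.0, size=n)
    treatment = rng.normal(loc=0.02, scale=1.0, size=n)  # tiny true effect
    _, p = stats.ttest_ind(treatment, control)
    print(f"n per group = {n:>9,}  p = {p:.4f}")
```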
3
Q
NHST all or nothing thinking
A
- absence of evidence is not evidence of absence
- real effects may be missed if you rely on strict p value cutoffs
4
Q
NHST as part of wider problems in science
A
- incentive structures and publication bias: journals favor significant findings
- researcher degrees of freedom: decisions that researchers have to make that might impact their publication chances (e.g., multiple experiments, control variables, multiple dependent variables, outliers, missing data, different models, scale items)
- p-hacking and HARKing: p-hacking is running lots of different analyses and then only reporting the results that are significant. HARKing (Hypothesizing After the Results are Known) is running analyses, noticing that the pattern of results isn't consistent with what you hypothesized a priori, and then finding another theory and/or hypothesis that fits the data and presenting it as if it had been made a priori (may be OK if you explain both)
5
Q
is most published research wrong?
A
- about 1/3 of published results will be wrong
- relationships can cross the significance threshold by adding more data points, even though a much larger sample would show that there is no relationship
- journals want novel hypotheses/studies, not replication studies
6
Q
ways to avoid NHST problems (solutions)
EMBERS
A
- effect sizes: statistical significance is not practical significance
- meta-analyses: avoid all or nothing thinking
- Bayesian estimation: finding probability of hypotheses/parameter ranges
- registration: avoid p-hacking/HARKing
- sense: understanding NHST
7
Q
principles for using p values
sense
A
- p values can be useful: they help rule out sampling error and establish that an effect exists. combined with effect size measures, they are informative
- we do not have to ignore decades of research that relied on p values
- we must understand what NHST is and is not
8
Q
pre-registering and open science
registration
A
- open science: umbrella term for practices that make research more transparent, accessible, and collaborative
- preregistering is the practice of making a study protocol (including data analysis strategies) public before data collection begins
9
Q
effect sizes
A
- objective and usually standardized measure of an observed effect (how big the effect is)
- magnitude of an effect
- unstandardized: mean difference, reaction time (raw units, easier to interpret)
- standardized: Cohen’s d, Pearson’s r, odds ratio (can be compared across different measures because they’re converted to standard units; use some measure of variability within a sample to assess the size of the effect)
10
Q
Cohen’s d
effect sizes
A
- difference between 2 means in SD units
- d = (mean 1 - mean 2) / SD (see the sketch below)
- guidelines: d ~ .2 (small), d ~ .5 (medium), d ~ .8 (large)
- use the control group SD because different interventions might affect both the mean and the SD; the control SD stays consistent across comparisons, whereas using the experimental group SD changes the metric with every comparison
- if the two means come from populations with similar SDs, pool their SDs; this bases the estimate on the full sample and gives a better estimate of the effect size
- helpful for practical significance
- not inflated by sample size; a larger sample only makes the estimate more precise
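A minimal sketch of the pooled-SD calculation, assuming numpy; the function name cohens_d and the example scores are mine, not from the source.

```python
# Minimal sketch (assumed numpy; function name is mine): Cohen's d with a
# pooled SD, weighting each group's variance by its degrees of freedom.
import numpy as np

def cohens_d(group1, group2):
    g1, g2 = np.asarray(group1, float), np.asarray(group2, float)
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
    return (g1.mean() - g2.mean()) / np.sqrt(pooled_var)

print(cohens_d([5.1, 5.9, 6.0, 5.5], [4.8, 5.0, 4.6, 5.2]))  # made-up scores
```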
11
Q
Pearson’s r
effect sizes
A
- measure of linear association between 2 variables
- ranges from -1.00 to +1.00
- guidelines: r ~ .1 (small), r ~ .3 (medium), r ~ .5 (large); see the sketch below
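A minimal sketch, assuming numpy; the x/y data are made up.

```python
# Minimal sketch (assumed numpy; data made up): Pearson's r via the
# correlation matrix -- the off-diagonal entry is the correlation.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.2f}")  # close to +1 here: a large positive association
```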
12
Q
odds ratio
effect sizes
A
- measure of association between two events
- popular effect size for count data
- odds = P(event) / P(no event); the odds ratio is the ratio of the odds in one group to the odds in another
- odds ratio = 1: event equally likely in both groups; < 1: event less likely; > 1: event more likely (see the sketch below)
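A minimal sketch with made-up 2x2 counts, showing the odds ratio as the ratio of an event's odds across two groups.

```python
# Minimal sketch (counts made up): odds ratio from a 2x2 table,
# comparing an event's odds in a treatment vs. a control group.
treat_event, treat_none = 30, 70    # treatment odds = 30/70
ctrl_event, ctrl_none = 15, 85      # control odds   = 15/85

odds_ratio = (treat_event / treat_none) / (ctrl_event / ctrl_none)
print(f"OR = {odds_ratio:.2f}")  # > 1: event more likely under treatment
```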
13
Q
effect sizes compared to NHST
A
- effect sizes encourage interpreting effects on a continuum, rather than categorically labelling effects as significant or not
- bigger sample sizes increase the precision of the effect size estimate but do not increase the expected effect size. in other words, you can't get a large effect size just by collecting a large sample, the way you can get a small p value just by collecting a large sample
- the issue of researcher degrees of freedom is still present when the focus is on effect sizes, but is less of an issue because they are not tied to a decision rule (less pressure to reach an arbitrary threshold)
- significance tests should be paired with effect size measures. p values establish that there is an effect in the population, and effect size measures estimate how large that effect is
14
Q
meta analysis
A
- statistical analysis that combines findings from many studies that address the same question
- estimates the true (population) effect
- helps us avoid the all or nothing thinking that tends to occur when we focus on p values of primary studies (gives an average standardized effect size; see the sketch below)
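A minimal fixed-effect pooling sketch, assuming numpy; the per-study effects and standard errors are made up. (Real meta-analyses typically use dedicated packages and often random-effects models.)

```python
# Minimal fixed-effect sketch (assumed numpy; numbers made up): pool
# per-study effect sizes, weighting each by its inverse variance.
import numpy as np

d = np.array([0.31, 0.48, 0.12, 0.55])   # standardized effect per study
se = np.array([0.15, 0.20, 0.10, 0.25])  # standard error per study

w = 1.0 / se**2                          # inverse-variance weights
d_pooled = (w * d).sum() / w.sum()       # weighted average effect size
se_pooled = np.sqrt(1.0 / w.sum())
print(f"pooled d = {d_pooled:.2f} +/- {se_pooled:.2f}")
```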
15
Q
Bayesian approaches
A
- alternative to NHST
- bayesian stats is about updating your beliefs about a parameter or hypothesis based on evidence
- P(hypothesis given the data): the probability that the hypothesis is true given the observed data
- the probability of the data given the hypothesis is not the same as the probability of the hypothesis given the data
- prior probability: your belief in the hypothesis before considering the data
- likelihood: probability of obtaining the data given certain hypothesis/model
- marginal likelihood: probability of the observed data (evidence)
- posterior probability: probability of the hypothesis after considering the data (worked through in the sketch below)
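A minimal worked sketch of the update rule these terms combine into, posterior = likelihood × prior / marginal likelihood; all the numbers are made up.

```python
# Minimal sketch (numbers made up): Bayes' theorem,
# posterior = likelihood * prior / marginal likelihood.
prior_h = 0.50       # belief in the hypothesis before seeing the data
lik_h = 0.80         # P(data | hypothesis true)
lik_not_h = 0.30     # P(data | hypothesis false)

# marginal likelihood: total probability of the data across both hypotheses
marginal = lik_h * prior_h + lik_not_h * (1 - prior_h)

posterior = lik_h * prior_h / marginal
print(f"P(hypothesis | data) = {posterior:.2f}")  # ~0.73: belief revised upward
```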