Chapter 13 - Inferential Statistics Flashcards
inferential stats
- Use data collected on a sample to infer what is happening in the population
- Is the effect we found in our sample due to random chance, or due to a true effect in the population?
why are we often wrong about inferential stats?
- We focus on random (rare) events
- Fail to use probabilistic information when making judgments
- Focus on specific rather than general information (Eg. base rate fallacy/”person-who” statistics - we know that smoking is bad, but someone will say “well my grandpa smokes a pack a day and he’s still healthy”)
- See patterns in randomness (Ex. Seeing pattern between mass shootings and violent video games -> although the shooters had a bunch of things in common (ie. Both brush teeth), we pay attention to commonality that we feel makes a pattern)
random sample
- Taking random sample from population to estimate true effect
- As sample size decreases, the estimate becomes a less accurate reflection of what really happens in the population
- Is the sample representative?
- samples are imperfect assessments of overall probabilities
- ex. In any fun-sized packet of Smarties, how many of each colour are there?
population level
- ex. in the Smarties factory, how many of each colour are made?
probabilistic trend
- What is the true overall effect?
- Not likely to be reflected in every sample and case
- The smaller the sample, the less likely you’ll see a probabilistic trend that reflects the overall effect
- ex. The expected proportion of each colour of Smarties in the population of Smarties at the factory
large vs. small samples
- Small samples subject to more error in estimating population value
- Even with random assignment, problem still remains
- Random assignment works best with large sample sizes (chance plays a major role in statistical analysis and research methods)
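The point about small samples can be sketched with a quick simulation (the 20% purple proportion is a made-up value for illustration): small handfuls of Smarties give much noisier estimates of the factory's true colour proportion than large ones.

```python
import random
import statistics

# Hypothetical: 20% of Smarties made at the factory are purple.
TRUE_PROPORTION = 0.20

def sample_proportion(n, rng):
    """Draw n Smarties at random; return the observed proportion of purple ones."""
    return sum(rng.random() < TRUE_PROPORTION for _ in range(n)) / n

rng = random.Random(42)
for n in (10, 1000):
    # Repeat the sampling many times to see how much estimates bounce around
    estimates = [sample_proportion(n, rng) for _ in range(2000)]
    spread = statistics.stdev(estimates)
    print(f"n={n:4d}: estimates vary around 0.20 with SD = {spread:.3f}")
```

The small samples (n=10) scatter widely around the true 20%, while the large samples (n=1000) cluster tightly: the same point the card makes about small samples being subject to more error.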
random sampling vs. random assignment
- Random sampling: when everyone in the population has an equal chance of being chosen as a participant (if you don’t do random sampling, bias can be introduced)
- Random assignment: when all participants have an equal chance of being placed in the experimental or control condition
directional null hypothesis
- assumes there is no effect in the population and any difference is due to error -> has to account for everything your research hypothesis doesn't account for
- Scientific notation: H0
- ex. Mean 1 is less than or equal to mean 2
- True effect: California label will not make wine taste better in population
- Says that random chance caused Mean 1 > Mean 2 in our sample
directional research hypothesis
- assumes that the means are not equal in the population
- ex. Mean 1 > mean 2
- Scientific notation: H1 or HA
- True effect: California label makes wine taste better
- Random chance is a very unlikely explanation for mean 1 > mean 2 in our sample
non-directional research hypothesis
- assumes that the means are not equal in the population
- ex. Mean 1 doesn’t equal mean 2
- Scientific notation: H1 or HA
- True effect: label affects taste perception in the population
- Random chance is a very unlikely explanation of the difference between groups in our sample
non-directional null hypothesis
- assumes there is no effect in the population and any difference is due to error -> has to account for everything your research hypothesis doesn't account for
- ex. Mean 1 equals mean 2
- Scientific notation: H0
- True effect: label doesn’t affect taste perception in the population
- Random chance caused any difference between groups in our sample
when analyzing data, start by assuming that the ____ is true
- null hypothesis (meaning your results are due to random chance)
- as you go, figure out if you can reject it
- Whether something is due to chance is always the most parsimonious alternative explanation for any research finding
basics of sampling distribution
- aka: null hypothesis sampling distribution (in the Smarties example, a binomial distribution)
- the probability distribution of a statistic across many samples drawn from the population, assuming the null hypothesis is true -> what results would you get if the null hypothesis is true and your data are only due to random chance?
- Helps establish critical values -> threshold for determining significance
- ex. If you claim that you have magical powers to sense and pull out purple smarties from a box, what you need to know first is how many purple smarties other people would randomly pull? -> sampling distribution
- from there, you can use your sampling distribution to establish how many purple smarties one would have to pull out of a box to be magical -> critical value
- ex. if people randomly pick 4 purple smarties on average, you might say your critical value is 8 smarties
how to calculate statistical significance
- Calculate a statistic that captures the effect (eg. chi square, t or f value, correlation, etc.) -> aka find obtained t value
- Refer to sampling distribution for comparison (what’s the expected statistic value for this sample size?) -> aka find critical t value
- Decide if our statistic value is rare enough to consider it significant -> if yes, reject null hypothesis and publish, if not, retain null hypothesis
degrees of freedom (df)
- used to locate appropriate sampling distribution
- formula (for a 2-group t-test): N – 2 (total sample size minus number of groups)
- ex. If N = 6, df = 4
- df is dependent on sample size, and more = better
t-test
- allows us to determine whether mean difference between 2 groups is significant
- t = group difference (difference between the 2 group means) divided by error (within-group variability)
- ex. if the mean difference is 2 and the error is 4: 2/4 = 0.5 = t
- the bigger the t-value, the better
- there is a different sampling distribution of t for each sample size
- If groups are farther apart (larger effect size), the size of the t-statistic will increase
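Here is a worked sketch of the signal-to-noise idea, computing an independent-samples t by hand (the wine ratings and the "California" vs. "North Dakota" labels are invented example data):

```python
import math
import statistics

# Hypothetical taste ratings (1-10) for the same wine under two labels
california = [7, 8, 6, 9, 7, 8]    # "California" label group
north_dakota = [5, 6, 5, 7, 6, 5]  # "North Dakota" label group

m1, m2 = statistics.mean(california), statistics.mean(north_dakota)
v1, v2 = statistics.variance(california), statistics.variance(north_dakota)
n1, n2 = len(california), len(north_dakota)

# Error term: pooled within-group variability (the "noise")
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# t = group difference (signal) / within-group variability (noise)
t = (m1 - m2) / se
df = n1 + n2 - 2  # total sample minus number of groups
print(f"t({df}) = {t:.2f}")
```

A bigger mean difference (signal) or smaller within-group spread (noise) both push t up, which is exactly why larger effects and less error make significance easier to reach.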
how to locate appropriate tcrit?
- alpha level (how likely are we to incorrectly reject the null hypothesis?)
- If |tobt| > |tcrit|, then we reject the null hypothesis
- Alpha is usually set to 0.05
Two-tailed vs. one-tailed tests
- 2 tailed: split the alpha level across both sides of the distribution (2.5% + 2.5%) -> if tobt falls in either tail, it's considered rare enough
- 1 tailed: put the whole alpha level (usually 5%) on one side of the distribution (the predicted direction) -> if tobt falls in that tail, it's considered rare enough
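The decision rule can be sketched as a tiny function. The critical values below are from a standard t-table for df = 10 at alpha = .05 (assumed df; a real analysis would look up its own df), and the obtained t of 2.0 is hypothetical:

```python
# Critical values from a standard t-table, df = 10, alpha = .05
T_CRIT_ONE_TAILED = 1.812  # all 5% in one tail
T_CRIT_TWO_TAILED = 2.228  # 2.5% in each tail

def decide(t_obt, t_crit):
    """Reject H0 when the obtained t is more extreme than the critical value.
    (For a one-tailed test, the effect must also be in the predicted direction.)"""
    return "reject H0" if abs(t_obt) > t_crit else "retain H0"

t_obt = 2.0  # hypothetical obtained value
print("one-tailed:", decide(t_obt, T_CRIT_ONE_TAILED))
print("two-tailed:", decide(t_obt, T_CRIT_TWO_TAILED))
```

The same obtained t can be significant one-tailed but not two-tailed: a one-tailed test buys a lower critical value by giving up the ability to detect an effect in the unpredicted direction.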
how do we reduce error so we get larger t-values?
- avoiding poorly worded questions (eg. Double-barreled question, double-negative)
- Minimizing effect of uncontrolled variables (eg. Environmental variables, distractors)
- Having a larger sample size
- Between- vs. Within-subjects design
t-obtained vs. f-obtained (ANOVA)
- both use interval or ratio DV and nominal IV
- both calculate a signal to noise ratio
- DIFFERENCE: t-test can compare between 2 levels of IV, while f-test can compare between MORE than 2 levels of IV
- One condition where f and t are directly related to each other: when you have two groups in your study (In this case: F = t^2)
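The F = t^2 relationship for two groups can be verified numerically. This sketch reuses invented two-group ratings, computes F from the systematic/error variance ratio and t from the mean difference, and shows they agree:

```python
import math
import statistics

# Two hypothetical groups (e.g. ratings under two labels)
g1 = [7, 8, 6, 9, 7, 8]
g2 = [5, 6, 5, 7, 6, 5]
groups = [g1, g2]
all_scores = g1 + g2
grand_mean = statistics.mean(all_scores)

# Systematic (between-groups) variance: group means vs. grand mean
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
df_between = len(groups) - 1

# Error (within-groups) variance: scores vs. their own group mean
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
df_within = len(all_scores) - len(groups)

F = (ss_between / df_between) / (ss_within / df_within)

# Independent-samples t for the same two groups
pooled_var = ss_within / df_within
se = math.sqrt(pooled_var * (1 / len(g1) + 1 / len(g2)))
t = (statistics.mean(g1) - statistics.mean(g2)) / se

print(f"F = {F:.2f}, t^2 = {t * t:.2f}")  # identical with two groups
```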
if tobt > tcrit, ___ null hypothesis
reject null hypothesis (your results aren’t likely due to chance!)
if tobt < tcrit, ___ null hypothesis
retain null hypothesis (your results are likely due to chance)
what does statistically significant mean?
results aren’t likely to be due to chance (it would be rare to see a difference this large if the true difference between the 2 groups were 0)
errors in judgment
- Type 1 error: H0 is true in population, but you reject H0 (Similar to a false positive)
- Type 2 error: H0 is not true, but you retain it (Similar to a false negative)
- Research community takes Type 1 errors more seriously than Type 2 errors due to publication bias
- However, Type 2 errors are also problematic: unreported “file drawer” studies containing type 2 errors reduce the long-term accuracy of the field by being excluded from meta-analyses
error rates
- type 1 error rate: alpha
- type 2 error rate: beta
- correct decision to retain true null hypothesis: 1-alpha
- correct decision to reject untrue null hypothesis: 1-beta (aka: POWER)
power
- Probability of making the correct decision to reject a false null hypothesis (ability to detect an effect if one truly exists)
- power = 1-p(type 2 error) where p(type 2 error) means “probability of making a type 2 error”
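Power can be estimated by simulation: run many fake studies where a true effect exists, and count how often the null hypothesis is correctly rejected. This sketch assumes a true effect of 0.5 SD, n = 30, and a two-tailed z-test with known SD (critical value 1.96 for alpha = .05); all values are illustrative.

```python
import math
import random

rng = random.Random(0)
TRUE_EFFECT = 0.5   # assumed true mean shift, in SD units (H0 is false)
N = 30              # sample size per simulated study
Z_CRIT = 1.96       # two-tailed alpha = .05, sigma known

def one_study():
    """Simulate one study; return True if it rejects H0."""
    sample = [rng.gauss(TRUE_EFFECT, 1.0) for _ in range(N)]
    z = (sum(sample) / N) * math.sqrt(N)  # z-test with known sigma = 1
    return abs(z) > Z_CRIT

trials = 5000
power = sum(one_study() for _ in range(trials)) / trials  # P(reject | H0 false)
beta = 1 - power                                          # type 2 error rate
print(f"estimated power = {power:.2f}, beta = {beta:.2f}")
```

Rerunning with a larger N, a larger TRUE_EFFECT, or a looser Z_CRIT raises the estimated power, matching the three factors listed at the end of this chapter.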
how does effect size influence type 1 and type 2 error?
- As effect size increases, type 1 error doesn’t change -> it only changes if you change whatever the alpha value is that you set
- As effect size increases, type 2 error rate decreases because you have higher power
p-hacking
- A set of ethically questionable practices researchers use to get significant results
- ex. Data peeking: Run 20 participants, look for significance. No significance? Run 10 more. Significance? Stop and publish. -> Capitalizing on chance
- ex. Tossing out participants’ data that disagree with your hypothesis
- ex. Selective reporting: use multiple measures of the same construct; only 1 measure shows significance; you only report the results of the one that shows significance
- ex. No significance? Find and report random interactions as though you predicted them
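The data-peeking example can be shown to inflate the type 1 error rate by simulation. In this sketch the null hypothesis is true (no effect at all), each "study" tests after 20 participants and, if not significant, adds 10 more and tests again; the z-test with known SD and the specific sample sizes are simplifying assumptions.

```python
import math
import random

rng = random.Random(0)
Z_CRIT = 1.96  # two-tailed alpha = .05, sigma known

def peeking_study():
    """H0 is true (mean = 0). Peek after 20 participants; if not
    significant, add 10 more and test again."""
    data = [rng.gauss(0.0, 1.0) for _ in range(20)]
    for extra in (0, 10):
        data += [rng.gauss(0.0, 1.0) for _ in range(extra)]
        n = len(data)
        z = (sum(data) / n) * math.sqrt(n)
        if abs(z) > Z_CRIT:
            return True  # "significant" -- but H0 is true: a type 1 error
    return False

trials = 10_000
false_positives = sum(peeking_study() for _ in range(trials)) / trials
print(f"type 1 error rate with peeking: {false_positives:.3f} (nominal: 0.050)")
```

Even one extra peek pushes the real false-positive rate noticeably above the nominal 5%, which is exactly the "capitalizing on chance" problem the card describes.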
statistical significance
- are differences between groups due to random error, or do they represent a real effect?
- there will almost always be a difference between your groups, but it may just be caused by random error -> that’s why statistical significance is important
- if something is statistically significant, it’s unlikely that the results are due to random error
F-test
- test of statistical significance
- ratio: systematic variance / error variance
- systematic variance: how much the group means deviate from the grand mean of all participants across conditions (variability attributable to the IV)
- error variance: how much individual scores in each group deviate from their respective group means
- the larger the F ratio, the more likely that the difference between groups is statistically significant
confidence intervals
- interval of values that will capture the true population value a certain proportion of the time (ex. 95%)
- ex. 95% confidence interval indicates that 95% of confidence intervals we calculate based on samples of that size will capture the population value
- if the confidence intervals for 2 means don’t overlap, this is a clue that the difference is statistically significant
- as sample size increases, confidence interval narrows and estimates become more precise
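The "95% of intervals capture the population value" interpretation can be checked by simulation. This sketch assumes a known population mean of 10 and SD of 1 (both invented), draws many samples of n = 25, and counts how many of the resulting 95% intervals contain the true mean:

```python
import math
import random

rng = random.Random(0)
TRUE_MEAN = 10.0  # hypothetical population value
SIGMA = 1.0       # population SD, assumed known for simplicity
N = 25

def ci_95():
    """One sample's 95% confidence interval for the mean (z = 1.96)."""
    m = sum(rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)) / N
    half_width = 1.96 * SIGMA / math.sqrt(N)
    return (m - half_width, m + half_width)

trials = 5000
hits = sum(lo <= TRUE_MEAN <= hi for lo, hi in (ci_95() for _ in range(trials)))
print(f"{hits / trials:.1%} of intervals captured the population mean")
```

Increasing N shrinks `half_width`, which is the narrowing of the interval the card describes as sample size grows.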
conclusion validity
the extent to which the conclusions reached about the relationships among variables are correct or reasonable
3 factors that power (and type 2 error rates) rely on
- Sample size: Greater the sample size, greater the power, less error in data to detect effect
- Magnitude of effect size: The larger the difference is in the population, the easier it is to detect, thus greater power
- Alpha level: The larger the alpha level, the easier it is to find the data consistent with the research hypothesis (reject the null hypothesis), thus greater power