Midterms material Flashcards
What are the null hypothesis and alternative hypothesis for a one-way ANOVA?
H0: μ1 = μ2 = μ3
H1: Not all μ’s are the same
What’s a factor in a one-way ANOVA?
the independent variable
What are the levels in a one-way ANOVA?
The different groups/treatment and control conditions
What are the assumptions of a one-way ANOVA?
- The population distribution of the DV is normal within each group
- The variances of the population distributions are equal across groups (homogeneity of variance assumption)
- Independence of observations
What’s the familywise Type 1 error rate?
The probability of making at least one Type 1 error in the family of tests if the null hypotheses are true
What’s a family of tests?
a set of related hypotheses
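To see why the familywise rate matters, here is a minimal sketch of how it grows with the number of tests. It assumes the tests are independent and that every null hypothesis is true; the function name is made up for illustration.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """P(at least one Type 1 error) over n_tests independent tests, all nulls true."""
    # Probability of no false positive on one test is (1 - alpha);
    # for n independent tests it is (1 - alpha) ** n.
    return 1 - (1 - alpha) ** n_tests

# Three pairwise comparisons among 3 groups, each tested at alpha = .05
print(round(familywise_error_rate(0.05, 3), 3))  # 0.143
```

So three tests at α = .05 each already give a familywise rate of about .14, not .05.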
What does the Overall F-test or first test of ANOVA tell us?
- The overall F-test evaluates whether H0 is false (i.e., whether at least one group mean differs from the others)
- If the overall F-test is significant then we use post-hoc tests to look at pairs of groups
What kind of ratio does ANOVA give us?
- F ratio
- ANOVA gives us a ratio of variance due to group membership over variance that is not explained by group membership (MSm divided by MSr)
What is variance explained by the model (MSm)?
Between-group variance that is due to the IV, or different treatments/levels of a factor -> variance accounted for by group membership
What is residual variance (MSr)?
- Within-group variance that can’t be accounted for by group membership
- Within each group, there is some random variation in the scores for the subjects
How are the F statistic and degrees of freedom presented?
F(dfM, dfR) = x
What kind of distribution is the F distribution?
A right-skewed distribution used most commonly in ANOVA
When can you reject the null hypothesis in an ANOVA test?
If your F value is greater than or equal to the critical value, you may reject the null hypothesis
How does the F ratio relate to the t statistic?
- With only two groups, either a t test or an F test can be used for testing for a significant difference between means
- Both procedures lead to the same conclusion
- When the number of groups is 2, then F = t^2
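A quick numeric check of F = t^2, as a sketch. The two samples are hypothetical, the t statistic is the pooled-variance (equal variances assumed) independent-samples t, and the helper names are made up for illustration.

```python
from math import isclose, sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Independent-samples t statistic using the pooled variance."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))

def f_two_groups(a, b):
    """One-way ANOVA F ratio for exactly two groups (dfM = 1)."""
    grand = mean(a + b)
    # With dfM = 1, MSM equals SSM
    ms_m = len(a) * (mean(a) - grand) ** 2 + len(b) * (mean(b) - grand) ** 2
    ss_r = sum((x - mean(a)) ** 2 for x in a) + sum((x - mean(b)) ** 2 for x in b)
    ms_r = ss_r / (len(a) + len(b) - 2)
    return ms_m / ms_r

low = [3.0, 4.0, 5.0, 4.0, 6.0]    # hypothetical scores, group 1
high = [6.0, 7.0, 5.0, 8.0, 7.0]   # hypothetical scores, group 2
t = pooled_t(low, high)
f = f_two_groups(low, high)
print(isclose(t ** 2, f))  # True
```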
In ANOVA formula, what does X-bar stand for?
The grand mean (across all observations)
In ANOVA formula, what does i stand for?
An observation (coming from N total observations)
In ANOVA formula, what does g stand for?
A group
In ANOVA formula, what does k stand for?
Total number of groups
In ANOVA formula, what does Ng stand for?
Size of group g
In ANOVA formula, what does Xbar-g stand for?
Group mean
In ANOVA formula, what does Xig stand for?
Observation i in group g
What does SSt stand for?
The total sum of squares: the aggregate variation/dispersion of individual observations around the grand mean, across all groups
What are MST, MSM, and MSR often called?
the total, model (between-group), and residual (within-group) Mean Squares, respectively
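The notation from the cards above (grand mean, group means, Ng, k, N) can be put together in a short sketch that builds the sums of squares, mean squares, and F ratio by hand. The data are hypothetical and the function name is made up for illustration.

```python
from statistics import mean

def anova_table(groups):
    """Sums of squares, df, mean squares, and F for a one-way ANOVA.

    groups: one list of scores per group.
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)     # X-bar
    k = len(groups)                   # total number of groups
    n_total = len(all_scores)         # N

    # SSM: Ng times the squared distance of each group mean from the grand mean
    ss_m = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # SSR: squared distance of each observation from its own group mean
    ss_r = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # SST: squared distance of each observation from the grand mean
    ss_t = sum((x - grand_mean) ** 2 for x in all_scores)

    df_m, df_r, df_t = k - 1, n_total - k, n_total - 1
    ms_m, ms_r, ms_t = ss_m / df_m, ss_r / df_r, ss_t / df_t
    return {"SSM": ss_m, "SSR": ss_r, "SST": ss_t,
            "dfM": df_m, "dfR": df_r,
            "MSM": ms_m, "MSR": ms_r, "MST": ms_t,
            "F": ms_m / ms_r}

scores = [[2.0, 3.0, 4.0], [4.0, 5.0, 6.0], [6.0, 7.0, 8.0]]  # hypothetical
table = anova_table(scores)
print(table["SSM"], table["SSR"], table["SST"])  # 24.0 6.0 30.0
print(table["F"])                                # 12.0
```

Note that SSM + SSR = SST: the total variation splits into a between-group part and a within-group part.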
Which effect size is more commonly reported in ANOVA?
η2 (eta squared)
What do the effect sizes (Pearson’s r, eta squared, and omega squared) all look for?
Proportion of variance in the DV that is explained by the IVs
What’s the difference between eta squared and omega squared?
- η2 is positively biased (overestimates the amount of variance explained in the DV by the IVs)
- ω2 corrects for this bias and is approximately unbiased
What are the cut-offs for the effect size of omega squared?
- Small ≈ .01
- Medium ≈ .06
- Large ≈ .14
- Report ω2, even if it’s negative
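A small sketch of how the two effect sizes are computed from the ANOVA quantities. The cards do not give the formulas, so the ones below are the standard one-way between-subjects versions, stated here as an assumption; the input values are hypothetical.

```python
def eta_squared(ss_m: float, ss_t: float) -> float:
    """Proportion of total variation accounted for by group membership."""
    return ss_m / ss_t

def omega_squared(ss_m: float, ss_t: float, df_m: int, ms_r: float) -> float:
    """Bias-corrected version: shrinks the numerator, inflates the denominator."""
    return (ss_m - df_m * ms_r) / (ss_t + ms_r)

# Hypothetical values: SSM = 24, SST = 30, dfM = 2, MSR = 1
print(round(eta_squared(24.0, 30.0), 2))            # 0.8
print(round(omega_squared(24.0, 30.0, 2, 1.0), 2))  # 0.71
```

ω2 comes out smaller than η2 on the same data, and it can go below zero when SSM is smaller than dfM times MSR.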
What does fully-crossed mean in a factorial design?
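That every level of each factor is combined with every level of the other factor(s), so the number of treatment conditions is the product of the numbers of levels (ex: factor 1 has 3 levels and factor 2 has 3 levels, so it’s a 3x3 factorial design with 9 treatment conditions)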
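A minimal sketch of what fully crossed means in practice: every level of one factor is paired with every level of the other. The factor names and levels are hypothetical.

```python
from itertools import product

dose = ["low", "medium", "high"]            # hypothetical factor 1, 3 levels
therapy = ["none", "group", "individual"]   # hypothetical factor 2, 3 levels

# Cartesian product = all treatment conditions of the 3x3 design
conditions = list(product(dose, therapy))
print(len(conditions))  # 9
```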
What elements should be included in the APA style analysis conclusion (in order)?
- 1-2 sentence overview of analyses that includes the independent and dependent variable, stated conceptually.
- Description of overall results of F -test, in a particular format, including effect size measure
- Description of the pattern of mean differences among groups, including whether significant differences were found (M for mean and SD for standard deviation) -> in an ANOVA with 3 or more groups, we have to conduct post-hoc tests to evaluate which pairs of groups have significant mean differences
- A conceptual conclusion
Provide an example of what elements should be included in the APA style analysis conclusion (in order)?
- To investigate whether level of fitness (low versus high) had an effect on ego strength (with higher scores indicating more ego strength), we conducted a one-way between-subjects ANOVA
- This analysis revealed a significant effect of fitness on ego strength, F(1, 8) = 5.32, p < .05, ω2 = .61
- Participants in the low fitness group (M = 4.40, SD = 0.92) had significantly lower ego strength than those in the high fitness group (M = 6.36, SD = 0.55)
- We conclude that having high as opposed to low fitness may increase ego strength
How to report numbers in APA format?
- 2 decimal places
- 3 decimal places for p-values
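The rounding rules can be sketched as two small formatting helpers. The helper names are made up; dropping the leading zero and writing "p < .001" for very small p-values follow common APA practice and go beyond what the card states.

```python
def fmt_stat(value: float) -> str:
    """Means, SDs, and test statistics: two decimal places."""
    return f"{value:.2f}"

def fmt_p(p: float) -> str:
    """p-values: three decimal places, no leading zero (assumed APA convention)."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")

print(fmt_stat(5.3219))  # 5.32
print(fmt_p(0.0312))     # p = .031
print(fmt_p(0.0004))     # p < .001
```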
True or False: with two groups the results of an independent samples t-test and a between-subjects ANOVA on the same data set will always agree
FALSE: they could disagree if they use different values of α
What are assumptions of a single mean z-test?
- The variable, X, in the population is normally distributed
- The sample must be a simple random sample of the population (independence of observations)
- The population standard deviation, σ, must be known
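For reference, a sketch of the single mean z-test itself, which shows why σ has to be known: it appears directly in the standard error. The sample, μ0, and σ are hypothetical, and the test is two-tailed by assumption.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test(sample, mu0: float, sigma: float):
    """Two-tailed single-mean z-test with known population SD sigma."""
    standard_error = sigma / sqrt(len(sample))
    z = (mean(sample) - mu0) / standard_error
    # Area in both tails beyond |z| under the standard normal curve
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

sample = [104, 110, 98, 107, 111, 102, 109, 105, 113]  # hypothetical scores
z, p = z_test(sample, mu0=100, sigma=15)
print(round(z, 2))  # 1.31
```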
What are the effect size cut-offs for r?
0.10 -> small effect
0.30 -> medium effect
0.50 -> large effect
What does a 95% Confidence interval mean?
If we repeated our experiment many times and built a 95% CI each time, about 95% of those intervals would contain the true effect
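The long-run interpretation can be checked with a small simulation. This is a sketch under stated assumptions: a normal population with a known σ (so a z-based interval is exact), and arbitrary illustrative values for the true mean, σ, sample size, and number of repetitions.

```python
import random
from math import sqrt
from statistics import NormalDist

def ci_coverage(true_mu=50.0, sigma=10.0, n=25, reps=2000, level=0.95, seed=1):
    """Proportion of simulated CIs that contain the true mean."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    half_width = z_crit * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        m = sum(sample) / n
        # Does this repetition's interval capture the true mean?
        if m - half_width <= true_mu <= m + half_width:
            hits += 1
    return hits / reps

print(ci_coverage())  # close to 0.95
```

Any single interval either contains the true mean or it does not; the 95% describes the procedure across repetitions.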
What does the p-value represent?
The p-value represents the proportion of data sets that would yield a result as extreme or more extreme than the observed result if H0 is true
What are the effect size cut-offs for r squared?
0.01 -> small
0.09 -> medium
0.25 -> large
What are the effect size cut-offs for cohen’s d?
0.2 -> small
0.5 -> medium
0.8 -> large
What are the assumptions in between subjects ANOVA?
- Independence of observations
- Identical distribution (within group)
- Identical distribution (between groups)
- Homogeneity of variance
- Normal Distribution
Describe the formula Yij = μ + αj + Eij
- Formula describing the linear model underlying everything we do in ANOVA
- Yij = person i’s score on the outcome Y, where person i belongs to group j -> Y is the dependent variable
- μ = the grand mean; αj = the effect of belonging to group j (how far group j’s mean sits from the grand mean)
- Eij -> experimental error: what allows individual scores of people in that population to vary from the group mean (assumed to be normally distributed)
- Eij is random, but μ + αj is fixed (constant) for every member of that population (one population = one mean)
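A sketch of the linear model applied to sample data: each score is split into an estimated grand mean, an estimated group effect, and a residual. The data are hypothetical, the estimates are simply the sample grand mean and the group-mean deviations from it, and the function name is made up.

```python
from statistics import mean

def decompose(groups):
    """Split each score into grand mean + group effect + residual."""
    grand_mean = mean([y for g in groups for y in g])   # estimate of mu
    rows = []
    for j, g in enumerate(groups):
        alpha_j = mean(g) - grand_mean                  # estimate of alpha_j
        for y in g:
            e_ij = y - grand_mean - alpha_j             # estimate of E_ij
            rows.append((j, y, grand_mean, alpha_j, e_ij))
    return rows

scores = [[2.0, 3.0, 4.0], [4.0, 5.0, 6.0], [6.0, 7.0, 8.0]]  # hypothetical
rows = decompose(scores)
print(rows[0])  # (0, 2.0, 5.0, -2.0, -1.0)
```

Within a group, the first two pieces are the same for everyone; only the residual changes from person to person.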
The assumptions about normality and equal variances are assumptions about what?
- The population
- Usually we can examine the sample for evidence about whether these assumptions hold
What are some methods for Assessing Normality?
Descriptive and Inferential Statistics:
- Looking at the mean, median, mode
- Tests for skewness (testing whether skewness is significant -> normal distribution has skew of 0, any type of skewness means that the distribution isn’t perfectly normal)
- Kolmogorov-Smirnov and Shapiro-Wilk tests
Visual methods:
- Histograms
- Normal Quantile (Q-Q) Plot
Describe tests for skewness when assessing normality
- Skewness represents symmetry and whether the distribution has a long tail in one direction
- Left (negative) skew: Mean < Median
- Symmetric (normal): Mean = Median
- Right (positive) skew: Median < Mean
- Skewness should be ~0
- Skewness > 0: positive/right skew (longer right-hand tail)
- Skewness < 0: negative/left skew (longer left-hand tail)
- Also look at standard errors (SE skewness)
- Conducting a significance test for whether skewness is significantly different from 0
- To compute this, divide the estimate of skewness for the variable by its standard error, then compare the absolute value of this ratio against 3.2
- Reject the null hypothesis that skew is 0 in the population if the ratio t_skewness is greater than 3.2 in absolute value
- Here we don’t want to reject the null hypothesis because rejecting it would mean we have found evidence that our scores aren’t normally distributed
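The skewness ratio described above can be sketched as follows. The cards do not say which skewness estimator or standard error to use, so this assumes the adjusted estimator and SE formula that statistics packages commonly print; the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def skewness_ratio(scores):
    """Return (skewness, SE of skewness, skewness / SE)."""
    n = len(scores)
    m, s = mean(scores), stdev(scores)
    # Adjusted sample skewness (assumed estimator)
    skew = n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in scores)
    # Standard error of skewness; depends only on n and shrinks as n grows
    se = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se, skew / se

scores = [1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 10.0]  # hypothetical, long right tail
skew, se, ratio = skewness_ratio(scores)
print(abs(ratio) > 3.2)  # reject "skew is 0 in the population" only if True
```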
What’s the more unbiased estimate of central tendency?
The median, rather than the mean (the median is less affected by skew and extreme scores)
What are the statistical tests of normality?
- The Kolmogorov-Smirnov (K-S) test
- The Shapiro-Wilk (S-W) test
- If a test is significant, reject the null hypothesis that the distribution of the variable is normal
What’s the Kolmogorov-Smirnov (K-S) test?
- Very general, but usually less power than Shapiro-Wilk (S-W) test
- Conceptually, compares sample scores to a set of scores generated from e.g., a normal distribution with the sample mean and standard deviation
- Used to see if the scores on your variable follow any distribution you think they follow
- Conceptually, this test takes your observed scores on the variable and compares them to quantiles from the reference distribution whose fit to your data you are trying to assess
- Large departures between your observed scores and the quantiles of the reference distribution are evidence against your scores following the distribution you think they follow
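The comparison behind the K-S test can be sketched by computing its test statistic D: the largest gap between the proportion of observed scores at or below a value and the proportion the reference distribution expects there. This sketch assumes a normal reference distribution with the sample mean and SD, and it stops at the statistic; turning D into a p-value needs a reference table or statistical software. The data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(scores):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(scores)
    n = len(xs)
    reference = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        expected = reference.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides
        d = max(d, i / n - expected, expected - (i - 1) / n)
    return d

skewed = [1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 10.0]  # hypothetical, long right tail
print(ks_statistic_normal(skewed))  # larger D = worse fit to the normal curve
```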
What’s the Shapiro-Wilk (S-W) test?
- Usually more powerful, but only for normal distributions
- Follows a similar logic to the Kolmogorov-Smirnov (K-S) test
What are limitations of the normality tests and solutions to overcome these?
- It’s easy to find significant results (reject null hypothesis that data is normal) when sample size is large
- Same with skewness tests -> as the sample size gets larger, SE gets smaller and with smaller SE, you’re more likely to get a t ratio value larger than 3.2, even with small values of skewness
- Solution: do the tests, but plot data as well and examine the histogram for evidence of multimodality, extreme scores (outliers), and asymmetry
- More than one mode is evidence of deviation from normality
Describe the use of histograms to assess normality
- Create separate histograms for each group to assess normality
- Look for obvious signs of non-normality
- Doesn’t have to be perfect, just roughly symmetric
- Multiple modes may suggest that there are different subpopulations in the sample
- If that’s the case, include a classification variable as an additional factor in the ANOVA
Describe the use of normal quantile plot (or normal probability plot or Normal Q-Q plot) to assess normality
1 Compute percentile rank for each score
- Sort observations from smallest to largest
- What percentage of scores are below score X?
2 Calculate (theoretical or expected) z-scores from percentile rank
- If the scores were normal, what would the z-score be?
3 Calculate actual z-scores
4 Plot the observed vs. theoretical z-scores
- We take quantiles from the z-distribution and see how much our observed z-scores deviate from them
- If the data are close to normal, then the points will lie close to a straight line
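The four steps can be sketched as a function that returns the points to plot. The percentile rank used here, (i - 0.5) / n, is one common convention and is an assumption, since the cards do not specify one; the data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def qq_points(scores):
    """Return (theoretical z, observed z) pairs for a normal Q-Q plot."""
    xs = sorted(scores)                     # step 1: sort smallest to largest
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    std_normal = NormalDist()
    points = []
    for i, x in enumerate(xs, start=1):
        pct_rank = (i - 0.5) / n            # step 1: percentile rank (assumed convention)
        theoretical_z = std_normal.inv_cdf(pct_rank)   # step 2: expected z if normal
        observed_z = (x - m) / s                       # step 3: actual z-score
        points.append((theoretical_z, observed_z))     # step 4: one point per score
    return points

scores = [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0]  # hypothetical
for theoretical_z, observed_z in qq_points(scores):
    print(round(theoretical_z, 2), round(observed_z, 2))
```

Plotting the pairs and comparing them to the line observed = theoretical gives the Q-Q plot.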
What do violations of the assumption of normality lead to?
- Non-normality tends to produce Type I error rates that are lower than the nominal value
- Depending on the context of the research study, this may be less concerning than an assumption violation that results in excessive Type I error rates (above the nominal value α)
- When we select an alpha of say .05, we’re saying that if the null hypothesis is true, 5% of our findings in the long run will be false positives
- If you don’t meet the assumption of normality and you pick an alpha level of .05 -> less than 5% of your results in the long run will be false positives if the null hypothesis is true
- This means you have lower power to detect differences if there is an effect in the population
- A consequence of the violation of the assumption of normality is that you might miss some effects (not inflating type 1 error rate but you are decreasing your power)
Type 1 error rate and what go hand in hand?
Type 1 error rate and power go hand in hand (as one increases so does the other)
What’s the assumption of homogeneity of variance?
Assuming that all of the group variances are equal
What does violation of the assumption of homogeneity of variance lead to?
- Serious violation of this assumption tends to inflate the observed value of the F statistic
- Too many rejections of H0 = high Type I error
- This is a more problematic assumption because if you violate this assumption, you will inflate your type 1 error rates
- If you select an alpha of .05, but your assumption of homogeneity of variance is not met, you may end up with more than 5% of false positives if the null hypothesis is true
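A quick descriptive look at this assumption is to compare the sample variances directly. The sketch below computes the ratio of the largest to the smallest group variance (the quantity behind Hartley's F-max). It is not a significance test and sets no cut-off; the data and function name are hypothetical.

```python
from statistics import variance

def variance_ratio(groups):
    """Largest sample variance divided by the smallest, across groups."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

similar = [[2.0, 3.0, 4.0], [4.0, 5.0, 6.0], [6.0, 7.0, 8.0]]  # equal spreads
unequal = [[1.0, 2.0, 3.0], [0.0, 5.0, 10.0]]                  # very different spreads
print(variance_ratio(similar))  # 1.0
print(variance_ratio(unequal))  # 25.0
```

A ratio near 1 is what equal population variances would tend to produce; a large ratio is a reason to look more closely.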