Final Flashcards
ANOVA
analysis of variance; tests for differences in sample means across two or more groups
In ANOVA if H0 is false, there should be…
…a substantial difference between categories but not within them
F ratio =
mean square between / mean square within
F ratio is bigger when…
categories are more distinct and tightly clustered
When the categories overlap and scores are widely spread within them, there is little variance between categories relative to within:
F ratio = smaller/larger = smaller
When the categories are well separated and tightly clustered, there is more variance between categories than within:
F ratio = larger/smaller = larger
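A quick way to see this in practice (a minimal sketch with made-up group scores; it assumes scipy is installed and uses scipy.stats.f_oneway):

from scipy.stats import f_oneway

# Overlapping, widely spread categories -> small F ratio
overlapping = f_oneway([2, 5, 8, 11], [3, 6, 9, 12], [4, 7, 10, 13])

# Distinct, tightly clustered categories -> large F ratio
distinct = f_oneway([2, 3, 2, 3], [8, 9, 8, 9], [14, 15, 14, 15])

print(overlapping.statistic)  # small F
print(distinct.statistic)     # much larger F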
The assumptions of an ANOVA test
independent random samples, interval/ratio measurement, normal distribution, population variances are equal
Limitations of ANOVA
- requires an interval/ratio dependent variable and a nominal independent variable
- just because a result is significant doesn't mean it's substantive
- the alternate hypothesis is not specific
the alternate hypothesis of ANOVA test
At least one of the population means differs from the others
synonyms of mean square between/mean square within
mean square between = sum of squares between / degrees of freedom between
mean square within = sum of squares within / degrees of freedom within
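A worked sketch of how those pieces combine into the F ratio (the group scores are invented purely for illustration):

# F = (sum of squares between / df between) / (sum of squares within / df within)
groups = [[2, 3, 2, 3], [8, 9, 8, 9], [14, 15, 14, 15]]   # made-up scores

n = sum(len(g) for g in groups)                 # total number of cases
k = len(groups)                                 # number of categories
grand_mean = sum(sum(g) for g in groups) / n
group_means = [sum(g) / len(g) for g in groups]

ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

msb = ssb / (k - 1)    # mean square between = SSB / dfB
msw = ssw / (n - k)    # mean square within  = SSW / dfW
print(msb / msw)       # the F ratio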
Is ANOVA one-tailed or two-tailed?
one-tailed
the main question of ANOVA
is there more variance between categories or within?
strengths of chi-square test
allows nominal (and ordinal) variables as the dependent variable, instead of only interval/ratio as in ANOVA
in a bivariate table, is the independent variable in the columns or rows?
independent = columns, dependent = rows
chi-square test
a test of independence/significance based on bivariate, crosstabulation tables.
the H0 of chi-square test
the variables are independent (any observed differences are due to random chance); Fo = Fe
what does Fe stand for and how is it calculated?
expected frequencies; Fe = (row marginal × column marginal) / n
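A small sketch of that calculation on a made-up 2×2 table (the counts are invented purely to show the arithmetic):

observed = [[20, 30],     # dependent category 1
            [30, 20]]     # dependent category 2

row_marginals = [sum(row) for row in observed]
column_marginals = [sum(col) for col in zip(*observed)]
n = sum(row_marginals)

# Fe for each cell = (row marginal * column marginal) / n
expected = [[(r * c) / n for c in column_marginals] for r in row_marginals]
print(expected)   # [[25.0, 25.0], [25.0, 25.0]]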
assumptions of chi-square test
independent random sample, nominal level of measurement, no assumption of sampling distribution
why is there no assumption of sampling distribution in chi-square test?
because the chi-square test is non-parametric, i.e. it makes no assumption about the shape of the population distribution
degrees of freedom in chi-square
(rows - 1)(columns - 1)
degrees of freedom is like Sudoku → how many cells can be missing while still being able to figure out all of the blanks
limitations of chi-square
- tells us whether the variables are independent, but it doesn't tell us about the pattern/nature of the relationship
- difficult to interpret when variables have many categories
- with a small sample size, it cannot be assumed that the chi-square sampling distribution is accurate
- very sensitive to sample size
how does chi-square test react to large sample sizes?
as the sample size increases, chi-square (obtained) increases. With large samples, trivial relationships may come out as statistically significant even though they are not substantively important
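A sketch of that sensitivity (made-up counts; assumes scipy is available and uses scipy.stats.chi2_contingency):

import numpy as np
from scipy.stats import chi2_contingency

small = np.array([[22, 28],
                  [28, 22]])      # n = 100, a weak pattern
large = small * 10                # identical proportions, n = 1000

chi2_s, p_s, _, _ = chi2_contingency(small, correction=False)
chi2_l, p_l, _, _ = chi2_contingency(large, correction=False)

print(chi2_s, p_s)   # modest chi-square, p above .05
print(chi2_l, p_l)   # about 10x the chi-square, p well below .05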
three questions of bivariate association
(1) does an association exist?
(2) how strong is the association?
(3) what is the pattern/direction of association?
when do we want to use Lambda?
for nominal variables with large sample sizes that can’t be properly assessed with chi-square.
PRE measures
Proportional Reduction in Error
1st prediction: ignore information about the independent variable and count the errors (E1) made in predicting the value of the dependent variable
2nd prediction: take information about the independent variable into account when predicting the value of the dependent variable and count the errors (E2). If the variables are associated, we should make fewer errors
is Lambda PRE?
yes
lambda = (E1 - E2)/E1
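A minimal sketch of that formula on a made-up table (independent variable in the columns, dependent in the rows, as in the bivariate-table card above):

table = [[40, 10],    # dependent category A
         [20, 30]]    # dependent category B

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]

# E1: errors made predicting the modal category of the dependent variable alone
e1 = n - max(row_totals)

# E2: errors made predicting the modal dependent category within each column
e2 = sum(sum(col) - max(col) for col in zip(*table))

print((e1 - e2) / e1)   # lambda = (50 - 30) / 50 = 0.4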
interpreting Lambda statistic
e.g. lambda = .33 means that knowing the independent variable reduces our errors in predicting the dependent variable by 33%
0.00–0.10 = weak
0.11–0.30 = moderate
0.31–1.00 = strong
Limitations of Lambda
- asymmetric (the value will vary depending on which variable is taken as independent, so care is needed in designating the independent variable)
- when row totals are very unequal, Lambda can be zero even when there is an association between the variables