Final Flashcards
ANOVA
analysis of variance; tests for differences in sample means across 2 or more groups
In ANOVA if H0 is false, there should be…
…a substantial difference between categories but not within categories
F ratio =
mean square between / mean square within
F ratio is bigger when…
categories are more distinct and tightly clustered
[diagram: overlapping, widely spread categories]
F ratio = smaller/larger = smaller
[diagram: distinct, tightly clustered categories]
F ratio = larger/smaller = larger
The assumptions of an ANOVA test
independent random samples, interval/ratio measurement, normal distribution, population variances are equal
Limitations of ANOVA
- requires an interval/ratio dependent variable and a nominal independent variable
- just because a result is significant doesn’t mean it is substantive
- the alternate hypothesis is not specific
the alternate hypothesis of ANOVA test
At least one of the population means differs from the others
synonyms of mean square between/mean square within
mean square between = sum of squares between/degrees of freedom between
mean square within = sum of squares within/degrees of freedom within
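A minimal sketch of how the pieces above fit together, assuming the usual definitions (dfb = k - 1, dfw = N - k, where k is the number of categories and N the total number of cases). The function name and the scores are made up for illustration.

```python
def f_ratio(groups):
    """Return (SSB, SSW, dfb, dfw, F) for a list of samples, one list per category."""
    all_scores = [x for g in groups for x in g]
    n = len(all_scores)          # total number of cases
    k = len(groups)              # number of categories
    grand_mean = sum(all_scores) / n

    # Sum of squares between: category means around the grand mean, weighted by size.
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Sum of squares within: each score around its own category mean.
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    dfb = k - 1
    dfw = n - k
    msb = ssb / dfb              # mean square between
    msw = ssw / dfw              # mean square within
    return ssb, ssw, dfb, dfw, msb / msw


# Hypothetical scores for three categories.
distinct = [[1, 2, 3], [5, 6, 7], [9, 10, 11]]       # far apart, tightly clustered
overlapping = [[1, 6, 11], [2, 7, 12], [3, 8, 13]]   # close together, spread out

print(f_ratio(distinct)[4])      # F is 48.0
print(f_ratio(overlapping)[4])   # F is about 0.12
```

The distinct, tightly clustered categories give the larger F ratio; the overlapping ones give an F ratio below 1.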
Is ANOVA one-tailed or two-tailed?
one-tailed
the main question of ANOVA
is there more variance between categories or within?
strengths of chi-square test
allows use of nominal (and ordinal) variables for dependent instead of just interval/ratio like ANOVA
in a bivariate table, is the independent variable in the columns or rows?
independent = columns, dependent = rows
chi-square test
a test of independence/significance based on bivariate, crosstabulation tables.
the H0 of chi-square test
the variables are independent, so the observed frequencies equal the expected frequencies (fo = fe)
what does Fe stand for and how is it calculated?
expected frequencies = (row marginal x column marginal)/n
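A quick sketch of the expected-frequency calculation for every cell of a table. The function name and the 2x2 counts are hypothetical.

```python
def expected_frequencies(table):
    """table: list of rows of observed counts. Returns fe for every cell."""
    row_totals = [sum(row) for row in table]          # row marginals
    col_totals = [sum(col) for col in zip(*table)]    # column marginals
    n = sum(row_totals)
    # fe for a cell = (its row marginal x its column marginal) / n
    return [[r * c / n for c in col_totals] for r in row_totals]


observed = [[30, 10],
            [20, 40]]   # hypothetical counts; row marginals 40 and 60, n = 100
print(expected_frequencies(observed))   # [[20.0, 20.0], [30.0, 30.0]]
```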
assumptions of chi-square test
independent random sample, nominal level of measurement, no assumption of sampling distribution
why is there no assumption of sampling distribution in chi-square test?
because the chi-square test is non-parametric, i.e. it requires no assumption about the shape of the distribution
degrees of freedom in chi-square
(rows - 1)(columns - 1)
degrees of freedom is like Sudoku –> how many cells can be missing while still being able to figure out all of the blanks
limitations of chi-square
- tells us whether the variables are independent, but it doesn’t tell us about the pattern/nature of the relationship
- difficult to interpret when variables have many categories
- with a small sample size, it cannot be assumed that chi-square sampling distribution is accurate
- very sensitive to sample size
how does chi-square test react to large sample sizes?
as the sample size increases, chi-square obtained increases. With large samples, trivial relationships may be statistically significant (i.e. a relationship too weak to matter can still lead us to reject H0)
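A small sketch of that sensitivity, assuming the standard formula chi-square obtained = sum of (fo - fe)^2 / fe, which these cards don't spell out. The tables are hypothetical: the second is the first with every cell multiplied by 10, so the percentages are identical.

```python
def chi_square(table):
    """Chi-square obtained for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n   # expected frequency
            total += (fo - fe) ** 2 / fe
    return total


small = [[27, 23],
         [23, 27]]                                       # n = 100
large = [[10 * cell for cell in row] for row in small]   # n = 1000, same pattern

CRITICAL = 3.841   # chi-square critical value for df = 1, alpha = .05

print(chi_square(small), chi_square(small) > CRITICAL)   # about 0.64, not significant
print(chi_square(large), chi_square(large) > CRITICAL)   # about 6.4, significant
```

Same weak pattern both times; only the sample size changed.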
three questions of bivariate association
(1) does an association exist?
(2) how strong is the association?
(3) what is the pattern/direction of association?
when do we want to use Lambda?
for nominal variables with large sample sizes that can’t be properly assessed with chi-square.
PRE measures
Proportional Reduction in Error
1st prediction: ignore information about the independent variable and make many errors E in predicting the value of the dependent variable
2nd prediction: take into account information about the independent variable in predicting the value of the dependent. If variables are associated, we should make fewer errors
is Lambda PRE?
yes
lambda = (E1 - E2)/E1
interpreting Lambda statistic
e.g. lambda = .33 means that knowing the independent variable improves our ability to predict the dependent variable by 33%. in other words, the number of prediction errors is reduced by 33%
0.00–0.10 = weak
0.11–0.30 = moderate
0.31–1.00 = strong
Limitations of Lambda
- asymmetric (value will vary depending on which variable is independent, so care is needed in designating the independent variable)
- when row totals are very unequal, Lambda can be zero even when there is an association between the variables
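A rough sketch of lambda = (E1 - E2)/E1, assuming the dependent variable is in the rows and the independent variable is in the columns. E1 counts errors when always guessing the largest row; E2 counts errors when guessing the largest cell within each column. The function name and both tables are hypothetical.

```python
def lambda_stat(table):
    """table rows = dependent categories, columns = independent categories."""
    n = sum(sum(row) for row in table)
    # E1: ignore the independent variable, always predict the largest row.
    e1 = n - max(sum(row) for row in table)
    # E2: within each column, predict that column's largest cell.
    e2 = sum(sum(col) - max(col) for col in zip(*table))
    return (e1 - e2) / e1


balanced = [[30, 10],
            [20, 40]]   # E1 = 40, E2 = 30
lopsided = [[45, 40],
            [5, 10]]    # row totals 85 vs 15; E1 = 15, E2 = 15

print(lambda_stat(balanced))   # 0.25
print(lambda_stat(lopsided))   # 0.0
```

In the lopsided table the column percentages still differ (90% vs 80% in the first row), yet lambda is zero because the same row is the best guess in every column.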
when row marginals are very unequal, what test should be used?
chi-square
analyzing association between variables at ordinal level
to detect an association in a bivariate table, use chi-square, then use Somers’ d to measure the strength of the association (uses same scale as Lambda to determine strength)
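A sketch of one way to get Somers' d from a table. The formula is not on these cards; this assumes the common asymmetric form d = (Ns - Nd)/(Ns + Nd + Ty), where Ns = pairs ranked the same way on both variables, Nd = pairs ranked in opposite ways, and Ty = pairs tied on the dependent variable but not on the independent. It also assumes rows (dependent) and columns (independent) are both ordered low to high. The table is hypothetical.

```python
def somers_d(table):
    """Somers' d with the row variable as dependent; categories ordered low to high."""
    ns = nd = ty = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(n_rows):
                for l in range(j + 1, n_cols):   # second case is higher on X
                    pairs = table[i][j] * table[k][l]
                    if k > i:        # also higher on Y: same direction
                        ns += pairs
                    elif k < i:      # lower on Y: opposite direction
                        nd += pairs
                    else:            # same row: tied on Y only
                        ty += pairs
    return (ns - nd) / (ns + nd + ty)


table = [[20, 10],
         [10, 20]]   # Ns = 400, Nd = 100, Ty = 400
print(round(somers_d(table), 2))   # 0.33
```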
scattergrams
display relationships between two interval/ratio variables
describe the axes of a scattergram
X = independent variable (horizontal axis), Y = dependent variable (vertical axis)
regression line
aka line of best fit, a line that gets as close to all cases as possible.
assessing the strength of regression lines
how tightly the cases cluster around the line indicates the strength of the linear relationship between the two variables
formula of regression line
Y = a + bX, where:
Y = score on the dependent variable
a = the Y intercept
b = the slope, i.e. the amount of change produced in Y by a unit change in X
X = score on the independent variable
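A minimal sketch of finding a and b, assuming the usual least-squares formulas (b = sum of (X - mean X)(Y - mean Y) / sum of (X - mean X)^2, a = mean Y - b x mean X), which these cards don't list. The data points are made up.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariation of X and Y divided by variation in X.
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    # Intercept: the line passes through (mean X, mean Y).
    a = mean_y - b * mean_x
    return a, b


xs = [1, 2, 3, 4, 5]     # hypothetical scores on the independent variable
ys = [3, 5, 7, 9, 11]    # hypothetical scores on the dependent variable
a, b = regression_line(xs, ys)
print(a, b)              # 1.0 2.0
print(a + b * 6)         # predicted Y for X = 6: 13.0
```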
Pearson’s r
measure of association for two interval-ratio variables
0.00–0.10 = weak
0.11–0.30 = moderate
0.31–1.00 = strong
r squared
aka coefficient of determination, provides PRE interpretation
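A short sketch of Pearson's r and r squared, assuming the standard formula r = sum of (X - mean X)(Y - mean Y) / sqrt(sum of (X - mean X)^2 x sum of (Y - mean Y)^2). The scores are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r for two interval-ratio variables."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)
print(round(r, 2))        # 0.8
print(round(r ** 2, 2))   # 0.64, i.e. X explains 64% of the variation in Y
```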
multivariate regression
looks at the part of y that x can explain that z can’t explain i.e. the effect of x on y while controlling for z
formula for multivariate regression line
Y = a + (b1)(X1) + (b2)(X2)
a = the Y intercept; each partial slope (b1, b2) shows the effect of its own independent variable on Y while controlling for the other
the ANOVA test is designed for independent variables measured at the ____ level.
nominal
In the ANOVA test, the sample means should be roughly equal in value…
… if the null hypothesis is true.
The ANOVA test uses means and standard deviations to compare the amount of variation _____ with the amount of variation _____.
within categories, between categories
In the ANOVA test, if the null hypothesis is false, the means of the different samples should be _____ and the standard deviations of the different samples should be _____.
very different in value, low in value
In the ANOVA test, if the null hypothesis is true, then…
(a) SSB should be at least twice as much as dfb
(b) SSB should be much greater than SSW
(c) the mean square between should be roughly equal to or smaller than the mean square within
(d) the combined dfb and dfw should be much greater than the SST
(c) the mean square between should be roughly equal to or smaller than the mean square within
ANOVA is a one tailed test and we are concerned only with those outcomes in which there is more variance…
…between categories than within categories.
To conduct a chi square test, the variables must first be organized into a ______.
bivariate table
The subtotals calculated for bivariate tables are also known as ____.
marginals
In the context of chi square, variables are independent if…
(a) they are related.
(b) cause and effect can be proved.
(c) the obtained chi square falls in the critical region.
(d) the score of a case on one variable has no effect on the score of the case on the other variable.
(d) the score of a case on one variable has no effect on the score of the case on the other variable.
In a 2x2 table, all cell frequencies are exactly the same. This is consistent with which of the following conditions?
The variables are independent.
When the null hypothesis in the chi square test for independence is true, there should be…
…little difference between the observed frequencies and the expected frequencies.
A Chi square test has been conducted to assess the relationship between marital status and church attendance. The obtained Chi square is 23.45 and the critical Chi square is 9.488. What may be concluded?
Reject the null hypothesis; church attendance and marital status are dependent.
In a research study conducted to determine if arrests were related to the socioeconomic class of the offender, the chi square critical score was 9.488 and the chi square test statistic was 12.2. We can conclude that the variables are ____.
dependent.
If variables are arranged in a bivariate table, we can see if they are associated by…
(a) adding their scores vertically.
(b) subtracting their scores horizontally.
(c) computing percentages in the direction of the independent variable.
(d) computing percentages in the direction of the dependent variable.
(c) computing percentages in the direction of the independent variable.
In the case of a perfect association, predictions from one variable to another can be made (with/without) error.
without
“As education increases, income rises.” This is an example of a ______ relationship.
positive
If a researcher is looking to perform an analysis based upon the relationship between the number of arrests as an adult and number of encounters with police as a juvenile, they would use which measure of association?
(a) Somers’ d.
(b) Lambda.
(c) Chi-Square.
(d) none of these would be appropriate.
(d) none of these would be appropriate.
Proportional reduction in error (PRE) measures of association are based on the logic of ______.
prediction
If there is no association between two variables, knowledge of the independent variable does what to the number of errors of prediction?
Does not change the number of errors of prediction.
A bivariate table shows the association between gender and whether or not a person ever attends formal religious services. Lambda was .34. What may be concluded?
(a) Women are more likely to attend church.
(b) Men are more likely to attend church.
(c) Knowing a person’s gender improves our ability to predict whether or not they attend religious services by 34%.
(d) Knowing whether a person attends religious services improves our ability to predict their gender by 34%.
(c) Knowing a person’s gender improves our ability to predict whether or not they attend religious services by 34%.
A researcher has computed a Somers’ d of −0.75 between marital happiness and number of children. What can be concluded from this result?
that there is a strong, negative relationship between number of children and marital happiness
On a scatterplot, the regression line…
…comes as close as possible to touching every score.
There is no linear relationship between two interval-ratio variables when the regression line on a scatterplot…
is parallel to the horizontal axis.
The Y intercept is the point where…
…the regression line crosses the vertical axis of the scattergram.
If a regression line is parallel to the horizontal axis of the scattergram, the slope (b) will be ___.
0.00
If the regression line showing the effect of education on income has a slope of 1000…
…each additional year of education increases predicted income by 1000.
A researcher wants to measure the strength of the association between income (measured in dollars per year) and education (measured in number of years of formal schooling). Which of the following would be the most appropriate measure?
(a) the slope (b)
(b) y-intercept
(c) chi-square
(d) pearson’s r
(d) pearson’s r
In a study of the relationship between geographical mobility (number of times a person has changed residences) and number of friends, Pearson’s r is reported as .40. Which of the following would be the most correct interpretation?
Mobility explains 16% of the variation in number of friends.
mean square between groups
a measure of how much the group means vary from one another; used because we consider many groups simultaneously and evaluate whether their sample means differ more than we would expect from natural variation
Null hypothesis for mean square between groups
If the null hypothesis is true, any variation in the sample means is due to chance and shouldn’t be too large.
what is the test statistic for ANOVA?
the F statistic (F ratio)