Chapter 26: Comparing Counts Flashcards
Define ‘Chi-square statistic’.
A statistic used to test whether the observed counts in a frequency distribution or contingency table match the counts we would expect according to some model. It is calculated by summing over all cells:
χ^2 = ∑ [(Obs - Exp)^2 / Exp]
Chi-square statistics differ in how expected counts are found, depending on the question asked.
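As a minimal sketch of the calculation, the statistic can be computed by finding each cell's component and adding them up. The function names and the counts below are hypothetical and chosen only for illustration.

```python
def chi_square_components(observed, expected):
    """Return the per-cell components (Obs - Exp)^2 / Exp."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


def chi_square_statistic(observed, expected):
    """Sum the per-cell components to get the chi-square statistic."""
    return sum(chi_square_components(observed, expected))


# Hypothetical counts for three categories, with 20 expected in each.
observed = [18, 22, 20]
expected = [20, 20, 20]
print(chi_square_statistic(observed, expected))
```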
Define ‘Chi-squared model’.
Chi-squared models are skewed to the right. They are parameterized by their degrees of freedom and become less skewed with increasing degrees of freedom.
Define ‘Chi-squared test of goodness-of-fit’.
A test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model is called a test of goodness-of-fit. In a chi-squared goodness-of-fit test, the expected counts come from the predicting model. The test finds a P-value from a chi-squared model with (# categories - 1) degrees of freedom, where # categories is the number of categories in the categorical variable.
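A short sketch of the goodness-of-fit setup, assuming the model is given as a list of proportions that sum to 1. The die-roll counts and the name goodness_of_fit are made up for this example. The sketch stops at the statistic and degrees of freedom and does not compute the P-value.

```python
def goodness_of_fit(observed, model_proportions):
    """Return (statistic, degrees of freedom, expected counts)."""
    total = sum(observed)
    # Expected counts come from the model: total count times each proportion.
    expected = [total * p for p in model_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus 1
    return statistic, df, expected


# Hypothetical data: 60 rolls of a die, tested against a fair-die model.
rolls = [8, 12, 10, 9, 11, 10]
fair_die = [1 / 6] * 6
statistic, df, expected = goodness_of_fit(rolls, fair_die)
print(statistic, df)
```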
Define ‘Chi-squared component’.
The components of a chi-squared calculation are
(Obs - Exp)^2 / Exp
found for each cell of the table.
Define ‘Cell’.
One element of a 2-way table corresponding to a specific row and a specific column. Table cells can hold counts, percentages, or measurements on other variables, or they can hold several values.
Define ‘Chi-squared test for homogeneity’.
A test comparing the distribution of counts for 2 or more groups on the same categorical variable is called a test of homogeneity. A chi-square test of homogeneity finds expected counts based on the overall frequencies, adjusted for the totals in each group under the (null hypothesis) assumption that the distributions are the same for each group. We find a P-value from a chi-square distribution with (# rows - 1) x (# cols - 1) degrees of freedom, where # rows gives the number of categories and # cols gives the number of independent groups (or vice versa).
Define ‘Standardized residual’.
In each cell of a 2-way table, a standardized residual is the square root of the chi-square component for that cell with the sign of the Observed-Expected difference:
(Obs - Exp) / sqrt(Exp)
When a chi-squared test rejects the null hypothesis, an examination of the standardized residuals can sometimes reveal more about how the data deviate from the null model.
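The residual formula can be sketched for a whole table as follows. The observed and expected tables are hypothetical, and the function name is an assumption made for illustration.

```python
import math


def standardized_residuals(observed, expected):
    """Return (Obs - Exp) / sqrt(Exp) for every cell of a 2-way table."""
    return [
        [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]


# Hypothetical 2 x 2 table with 25 expected in every cell.
observed = [[30, 20], [20, 30]]
expected = [[25, 25], [25, 25]]
for row in standardized_residuals(observed, expected):
    print(row)
```

A positive residual marks a cell with more observations than expected, and a negative residual marks a cell with fewer.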
Define ‘Chi-squared test for independence’.
A test of whether two categorical variables are independent examines the distribution of counts for one group of individuals classified according to both variables. A chi-squared test of independence finds expected counts by assuming that there is no association between the variables, so that the marginal totals alone determine the expected cell frequencies. This turns out to be the same calculation as a test of homogeneity. We find a P-value from a chi-squared distribution with (# rows - 1) x (# cols - 1) degrees of freedom, where # rows gives the number of categories in one variable and # cols gives the number of categories in the other.
Define ‘Contingency table’.
A 2-way table that classifies individuals according to 2 categorical variables.
Define ‘2-way table’.
Each cell of a 2-way table shows counts of individuals. One way classifies a sample according to a categorical variable. The other way can classify different groups of individuals according to the same variable or classify the same individuals according to a different categorical variable.
Briefly describe all 3 related methods that look at counts of data in categories and rely on chi-squared models.
Goodness-of-fit test: compare the observed distribution of a single categorical variable to an expected distribution based on a theory or model.
Test of homogeneity: compare the distribution of several groups for the same categorical variable.
Test of independence: examine counts from a single group for evidence of an association between two categorical variables.
How do you calculate the expected cell frequencies?
(column total) x (row total) / (table total)
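A minimal sketch of this formula applied to a 2-way table stored as a list of rows. It also computes the chi-squared statistic and the (# rows - 1) x (# cols - 1) degrees of freedom used by the tests of homogeneity and independence. The table values and function names are hypothetical.

```python
def expected_counts(table):
    """Expected count per cell: (row total) x (column total) / (table total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


def two_way_chi_square(table):
    """Return (statistic, degrees of freedom) for a 2-way table of counts."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical 2 x 2 table of counts.
table = [[20, 30], [30, 20]]
print(expected_counts(table))
print(two_way_chi_square(table))
```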
What are the assumptions and corresponding conditions for chi-squared tests?
- Counted data condition
- Independence assumption; randomization makes independence more plausible
- Sample size assumption with the expected cell frequency condition; the expected count in each cell should be at least 5
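The expected cell frequency condition is the only one of these that can be checked mechanically. A small sketch, assuming the expected counts are already available as a list of rows (the function name and the example values are hypothetical):

```python
def meets_expected_count_condition(expected, minimum=5):
    """Return True if every expected cell count is at least `minimum`."""
    return all(e >= minimum for row in expected for e in row)


# Hypothetical expected counts for two different tables.
print(meets_expected_count_condition([[25.0, 25.0], [25.0, 25.0]]))
print(meets_expected_count_condition([[3.2, 10.0], [6.8, 12.0]]))
```

The counted data condition and the independence assumption depend on how the data were collected, so they have to be judged from the study design rather than from the table itself.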
Although the tests use a 1-sided upper tail critical region when looking for evidence against the null hypothesis, the alternative hypothesis is actually…?
Many-sided, because there are many ways that a table of counts can deviate significantly from what we hypothesised.
When the null hypothesis is rejected, examine the …?
Standardized residuals in order to better understand patterns in the table.