Week 5: Chi-Squared Test Flashcards
What does the chi-squared test assess?
The association between two variables measured on a nominal scale
What are the limitations of the chi-squared test?
- Measures association, not causation
- Cannot determine the strength or direction of the relationship
What is independence in the context of chi-squared tests?
Two variables are independent if the frequency breakdowns of one variable are similar across all groups of the other variable
E.g., if there is no relationship between outlook on life and education, and 44% of our total sample declared that life is exciting, we would expect 44% of those with low, middle, and high education to state that their life is exciting.
What are the steps for hypothesis testing?
- State H0
- State H1
- Choose a significance level (α)
- Select and compute the test statistic
- Find the critical value and compare
- Interpret the results
What is the H0 for a chi-squared test of independence?
There is no association between the two variables (they are independent). We expect the relative frequencies of one variable to be the same across all groups of the other variable
How is the chi-squared statistic calculated?
χ² = Σ (O − E)² / E, summed over all cells of the contingency table
O = Observed values
E = Expected values
We want to calculate how much the observed frequencies “depart” from H0
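A minimal sketch of the formula above, assuming the observed and expected counts are held in two equally shaped nested lists. The function name and the 2x2 counts are hypothetical and chosen only for illustration.

```python
def chi_squared_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of two equally shaped tables."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total


# Hypothetical 2x2 table of counts and its expected counts under H0.
observed = [[30, 20],
            [20, 30]]
expected = [[25.0, 25.0],
            [25.0, 25.0]]

statistic = chi_squared_statistic(observed, expected)
print(statistic)
```

Each cell here contributes (5)² / 25 = 1, so the four cells together give a statistic of 4.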
How are expected values calculated?
(Row total × column total) / grand total
Must use the number of individuals (counts) and NOT proportions, ratios, or percentages. Check that the expected counts in each row and column still add up to the row and column totals
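The expected-count rule can be sketched as below, assuming the contingency table is a nested list of raw counts. The function name and the 2x3 table are hypothetical. The final loops perform the check on the row and column totals.

```python
import math


def expected_counts(observed):
    """Expected count per cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2x3 table of counts (not proportions).
observed = [[20, 30, 50],
            [30, 30, 40]]
expected = expected_counts(observed)

# Sanity check: expected counts keep the same row and column totals.
for obs_row, exp_row in zip(observed, expected):
    assert math.isclose(sum(obs_row), sum(exp_row))
for obs_col, exp_col in zip(zip(*observed), zip(*expected)):
    assert math.isclose(sum(obs_col), sum(exp_col))

print(expected)
```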
What does a large chi-squared statistic imply?
A large statistic means the observed frequencies depart substantially from those expected under H0, which is stronger evidence against independence
How do you determine if the chi-squared result is significant?
Compare the observed chi-squared value with the critical value or check if p-value < α
What would we see if the two variables are independent compared to not independent?
- If the two variables are independent, the magnitude of the difference between the observed frequency and the expected frequency is relatively small
- If they are not independent, the differences are larger: the larger the sum of squared residuals (each divided by its expected frequency), the larger the chi-squared statistic and the stronger the evidence against independence
Describe the chi-squared distribution
Non-directional (i.e., two-sided); positively skewed (not symmetric); no negative values
Different distributions depending on df
What is the χ² critical value with df = 1 at α = 0.05?
3.841
Example: An observed value of 1.47 is smaller than 3.841, so H0 is not rejected. This corresponds to a p-value of 0.226, i.e., χ² values equal to or greater than 1.47 are expected to occur 22.6% of the time when H0 is true
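The comparison in this example can be sketched with the standard library alone. The sketch relies on one fact that holds only for df = 1: the chi-squared variable is the square of a standard normal variable, so the upper-tail probability equals erfc(√(x / 2)). The function name is hypothetical, and α = 0.05 is assumed as the significance level that goes with the critical value 3.841.

```python
import math


def p_value_df1(statistic):
    """Upper-tail probability of a chi-squared value, valid for df = 1 only."""
    return math.erfc(math.sqrt(statistic / 2.0))


CRITICAL_VALUE_DF1 = 3.841  # critical value for df = 1, assuming alpha = 0.05
alpha = 0.05

statistic = 1.47
p = p_value_df1(statistic)

# Both decision rules should agree: compare with the critical value,
# or compare the p-value with alpha.
reject_by_critical_value = statistic > CRITICAL_VALUE_DF1
reject_by_p_value = p < alpha

print(round(p, 3), reject_by_critical_value, reject_by_p_value)
```

For 1.47 the p-value comes out at roughly 0.23 and neither rule rejects H0. For other degrees of freedom a statistics library or a printed table would be needed instead of this shortcut.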
How can we assess direction after a significant χ² test?
To determine which cell(s) in the contingency table drove the association, examining the percentages in the contingency table and expected frequency table may be misleading. The difference between the observed and the expected frequencies (i.e., residual) is more reliable
How to calculate standardised residuals?
Standardised residual = (Obs − Exp) / √Exp
- Positive standardised residuals mean that the cell was over-represented compared to the expected frequency
- Negative standardised residuals mean that the cell was under-represented (i.e., fewer subjects in category than expected)
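A short sketch of the standardised residual calculation, assuming the observed and expected counts are nested lists of the same shape. The function name and the counts are hypothetical.

```python
import math


def standardised_residuals(observed, expected):
    """Per-cell (O - E) / sqrt(E); the sign shows over- or under-representation."""
    return [
        [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]


# Hypothetical 2x2 counts with their expected counts under H0.
observed = [[30, 20],
            [20, 30]]
expected = [[25.0, 25.0],
            [25.0, 25.0]]

residuals = standardised_residuals(observed, expected)
print(residuals)
```

Squaring these residuals and adding them up gives the chi-squared statistic for the same table, which is why the cells with the largest residuals contribute most to it.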
How do you decide which residual is driving the lack of independence?
As a rule of thumb, standardised residuals with an absolute value of 2 or more (i.e., ≤ −2 or ≥ +2) are thought to drive the lack of independence
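The rule of thumb can be sketched as a simple filter. This assumes the cut-off of 2 is applied to the absolute value of each residual, so that strongly negative cells are flagged as well. The function name, the labels, and the residual values are hypothetical.

```python
def driving_cells(residuals, threshold=2.0):
    """Return (row, column, label) for cells with |standardised residual| >= threshold."""
    flagged = []
    for i, row in enumerate(residuals):
        for j, r in enumerate(row):
            if abs(r) >= threshold:
                label = "over-represented" if r > 0 else "under-represented"
                flagged.append((i, j, label))
    return flagged


# Hypothetical standardised residuals for a 2x2 table.
residuals = [[2.4, -0.5],
             [-2.1, 1.3]]

flagged = driving_cells(residuals)
print(flagged)
```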