Week 5: Chi-Squared Test Flashcards

1
Q

What does the chi-squared test assess?

A

Whether two variables measured on a nominal (categorical) scale are associated

2
Q

What are the limitations of the chi-squared test?

A
  • Measures association, not causation
  • Cannot determine the strength or direction of the relationship
3
Q

What is independence in the context of chi-squared tests?

A

Two variables are independent if the relative frequency (percentage) breakdown of one variable is similar across all groups of the other variable.
E.g., if there is no relationship between outlook on life and education, and 44% of the total sample declared that life is exciting, we would expect 44% of those with low, middle, and high education to state that their life is exciting.
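A minimal Python sketch of this idea, using hypothetical counts (not course data) chosen so that 44% of every education group says life is exciting. Each group's percentage matches the overall 44%, which is what independence predicts.

```python
# Hypothetical counts per education group (not real survey data).
counts = {
    "low": {"exciting": 44, "not exciting": 56},
    "middle": {"exciting": 88, "not exciting": 112},
    "high": {"exciting": 22, "not exciting": 28},
}


def percent_exciting(group):
    """Percentage of a group answering 'exciting'."""
    return 100 * group["exciting"] / sum(group.values())


# Pool all groups to get the overall breakdown for the whole sample.
overall = {
    answer: sum(group[answer] for group in counts.values())
    for answer in ("exciting", "not exciting")
}

print(f"overall: {percent_exciting(overall):.1f}%")
for name, group in counts.items():
    # Under independence, each group's percentage should be close to the overall one.
    print(f"{name}: {percent_exciting(group):.1f}%")
```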

4
Q

What are the steps for hypothesis testing?

A
  1. State H0
  2. State H1
  3. Choose a significance level (α)
  4. Select and compute the test statistic
  5. Find the critical value and compare it with the test statistic
  6. Interpret the results
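The six steps can be sketched in Python on a hypothetical 2 × 2 table of counts. Two details are standard results rather than content from these cards: df = (rows − 1) × (columns − 1), and for df = 1 the upper-tail p-value equals `math.erfc(sqrt(χ²/2))`, because a χ² variable with 1 df is the square of a standard normal variable. The critical value 3.841 is the df = 1, α = 0.05 value given on card 12.

```python
import math

# Hypothetical 2x2 table of counts (rows: group A, B; columns: yes, no).
observed = [[30, 20],
            [20, 30]]

# Steps 1-2: H0 = the variables are independent; H1 = they are associated.
# Step 3: choose the significance level.
alpha = 0.05

# Step 4: compute the chi-squared statistic from observed and expected counts.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
chi2 = 0.0
for obs_row, r_total in zip(observed, row_totals):
    for obs, c_total in zip(obs_row, col_totals):
        exp = r_total * c_total / grand_total
        chi2 += (obs - exp) ** 2 / exp

# Step 5: find the critical value and compare.
df = (len(observed) - 1) * (len(observed[0]) - 1)  # = 1 for a 2x2 table
critical_value = 3.841  # table value for df = 1 and alpha = 0.05
p_value = math.erfc(math.sqrt(chi2 / 2))  # upper-tail p-value; valid only for df = 1

# Step 6: interpret the result.
reject_h0 = chi2 > critical_value
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.3f}, reject H0 at alpha = {alpha}: {reject_h0}")
```

Here χ² = 4.0 exceeds 3.841 and p falls just below 0.05, so both versions of step 5 lead to rejecting H0.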
5
Q

What is the H0 for a chi-squared test of independence?

A

There is no association between the two variables (they are independent). We expect the relative frequencies of the dependent variable to be the same across all groups of the independent variable

6
Q

How is the chi-squared statistic calculated?

A

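$\chi^2 = \sum \frac{(O - E)^2}{E}$, summed over all cells of the contingency table
O = Observed values (counts)
E = Expected values (counts expected if H0 is true)
The statistic measures how far the observed frequencies “depart” from those expected under H0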
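A minimal sketch of the formula in Python, assuming the observed and expected counts have already been flattened into lists in the same cell order (the numbers are hypothetical). Each cell contributes (O − E)² / E, and χ² is the sum of those contributions.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all cells (flattened lists of counts)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical cell counts, flattened row by row from a 2x2 table.
observed = [30, 20, 20, 30]
expected = [25, 25, 25, 25]

contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
print(contributions)                     # → [1.0, 1.0, 1.0, 1.0]
print(chi_squared(observed, expected))   # → 4.0
```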

7
Q

How are expected values calculated?

A

(Row total) × (column total) / Grand total
Use the number of individuals (counts), NOT proportions, ratios, or relative frequencies. Check that the expected values in each row and column still add up to the observed row and column totals
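A sketch of the expected-count calculation in Python on a hypothetical 2 × 3 table of counts, including the check that the expected counts reproduce the observed row and column totals.

```python
import math


def expected_counts(observed):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical counts (numbers of individuals, not percentages).
observed = [[10, 20, 30],
            [20, 20, 20]]
expected = expected_counts(observed)
print(expected)  # → [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]

# Check: expected counts must add up to the same row and column totals as the observed ones.
for exp_row, obs_row in zip(expected, observed):
    assert math.isclose(sum(exp_row), sum(obs_row))
for exp_col, obs_col in zip(zip(*expected), zip(*observed)):
    assert math.isclose(sum(exp_col), sum(obs_col))
```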

8
Q

What does a large chi-squared statistic imply?

A

A large statistic means the observed frequencies depart strongly from those expected under H0, so such a result would be unlikely if the variables were independent (small p-value)

9
Q

How do you determine if the chi-squared result is significant?

A

Compare the observed χ² value with the critical value for the chosen α and df: reject H0 if the observed value exceeds the critical value, or equivalently if p-value < α

10
Q

What would we see if the two variables are independent compared to not independent?

A
  • If the two variables are independent, the differences between the observed and expected frequencies are relatively small
  • The larger the sum of squared (standardised) residuals, the larger the χ² statistic, and the stronger the evidence against independence (the smaller the p-value)
11
Q

Describe the chi-squared distribution

A

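The test is non-directional (H1 does not specify a direction of association), but only the upper tail of the distribution is used; the distribution is positively skewed, has no negative values, and is not symmetric
Its shape differs depending on df and becomes less skewed as df increases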
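A quick simulation sketch in Python illustrating these properties. It relies on a standard fact not stated on the card: a χ² variable with k df is the sum of k squared independent standard normal variables. The draws are never negative, the mean sits near df, and the median falls below the mean (a sign of right skew). The seed and number of draws are arbitrary choices.

```python
import random
import statistics


def simulate_chi_squared(df, n_draws=50_000, seed=0):
    """Draw chi-squared values as sums of df squared standard normal variables."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(n_draws)]


for df in (1, 3, 10):
    draws = simulate_chi_squared(df)
    print(
        f"df={df}: min={min(draws):.3f}, "
        f"mean={statistics.fmean(draws):.2f}, median={statistics.median(draws):.2f}"
    )
```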

12
Q

What is the χ² critical value with df = 1 (α = 0.05)?

A

3.841
Example: an observed value of 1.47 is smaller than 3.841, so H0 is not rejected. This corresponds to p-value = 0.226, i.e., χ² values equal to or greater than 1.47 are expected to occur 22.6% of the time when H0 is true
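For df = 1 only, the p-value can be checked with the Python standard library, using the fact that a χ² variable with 1 df is the square of a standard normal variable (a standard result, not from the card). This sketch gives p ≈ 0.225 for 1.47, close to the card's 0.226 (the small gap is consistent with 1.47 being a rounded value), and p ≈ 0.05 at the critical value 3.841. Other df values need a general χ² distribution function, which this sketch does not cover.

```python
import math


def chi2_p_value_df1(x):
    """Upper-tail p-value P(chi-squared >= x) for df = 1 only.

    A chi-squared variable with 1 df is Z**2 for a standard normal Z, so
    P(Z**2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


print(f"p for chi2 = 1.47:  {chi2_p_value_df1(1.47):.3f}")
print(f"p for chi2 = 3.841: {chi2_p_value_df1(3.841):.3f}")
```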

13
Q

How can we assess the direction of the association after a significant χ² test?

A

To determine which cell(s) in the contingency table drove the association, examining the percentages in the contingency table and expected frequency table may be misleading. The difference between the observed and the expected frequencies (i.e., residual) is more reliable

14
Q

How do you calculate standardised residuals?

A

Standardised residual = $\frac{O - E}{\sqrt{E}}$, calculated for each cell (O = observed count, E = expected count)
- Positive standardised residuals mean that the cell was over-represented compared to the expected frequency
- Negative standardised residuals mean that the cell was under-represented (i.e., fewer subjects in category than expected)
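A Python sketch computing standardised residuals for a hypothetical 2 × 2 table. As a consistency check, squaring and summing the standardised residuals gives back the χ² statistic, since each squared residual is exactly one (O − E)² / E term of the χ² formula.

```python
import math


def standardised_residuals(observed):
    """Return (O - E) / sqrt(E) for every cell, with E = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    result = []
    for obs_row, r_total in zip(observed, row_totals):
        exp_row = [r_total * c_total / grand_total for c_total in col_totals]
        result.append([(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)])
    return result


# Hypothetical 2x2 table (rows: group A, B; columns: yes, no).
observed = [[40, 10],
            [20, 30]]
residuals = standardised_residuals(observed)
for row in residuals:
    print([round(r, 2) for r in row])

# The squared standardised residuals add up to the chi-squared statistic.
chi2 = sum(r ** 2 for row in residuals for r in row)
print(round(chi2, 2))
```

Here (A, yes) and (B, no) have positive residuals (more subjects than expected), while (A, no) and (B, yes) have negative ones.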

15
Q

How do you decide which residual(s) are driving the lack of independence?

A

As a rule of thumb, cells with standardised residuals of 2 or more in absolute value (i.e., ≥ 2 or ≤ −2) are thought to drive the lack of independence
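A small Python sketch of this rule of thumb, assuming the standardised residuals have already been computed (the values below are hypothetical). It checks the absolute value, so large negative residuals (under-represented cells) are flagged as well as large positive ones.

```python
def cells_driving_association(residuals, threshold=2.0):
    """Return cells whose standardised residual is at least `threshold` in absolute value."""
    flagged = {}
    for cell, r in residuals.items():
        if abs(r) >= threshold:
            flagged[cell] = "over-represented" if r > 0 else "under-represented"
    return flagged


# Hypothetical standardised residuals for a 2x2 table (rows A/B, columns yes/no).
residuals = {
    ("A", "yes"): 1.83,
    ("A", "no"): -2.24,
    ("B", "yes"): -1.83,
    ("B", "no"): 2.24,
}
print(cells_driving_association(residuals))
# → {('A', 'no'): 'under-represented', ('B', 'no'): 'over-represented'}
```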

16
Q

What are some key checks and considerations for χ²?

A
  • χ² is not informative about the strength of associations
  • χ² is very sensitive to sample size (for the same pattern of percentages, the observed χ² is proportional to n, regardless of the strength of the relationship between the variables)
  • χ² is sensitive to small expected frequencies (unreliable if any cell has an expected frequency < 5)
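A Python sketch of the sample-size point, using hypothetical tables with identical percentages (60% vs 40% in each group): multiplying every count by 10 multiplies χ² by 10, even though the strength of the relationship is unchanged. Against the df = 1 critical value of 3.841, only the two larger tables would reach significance. The smallest table also has expected frequencies of 2.5, so the third caveat applies to it as well.

```python
def chi_squared_statistic(observed):
    """Sum of (O - E)^2 / E with E = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return sum(
        (o - r * c / grand_total) ** 2 / (r * c / grand_total)
        for obs_row, r in zip(observed, row_totals)
        for o, c in zip(obs_row, col_totals)
    )


# The same 60% / 40% split within each group, at three hypothetical sample sizes.
base = [[3, 2],
        [2, 3]]
for factor in (1, 10, 100):
    table = [[factor * n for n in row] for row in base]
    print(f"n = {10 * factor}: chi2 = {chi_squared_statistic(table):.2f}")
```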
17
Q

What can you do if a cell has an expected frequency < 5?

A

Collapse categories and re-run χ², or use Fisher’s exact test
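A minimal Python sketch of Fisher's exact test for a 2 × 2 table, using only the standard library (`math.comb`, Python 3.8+). With the row and column totals fixed, the top-left count follows a hypergeometric distribution. This sketch uses the common two-sided convention of summing the probabilities of all tables no more likely than the observed one; other two-sided definitions exist, and a statistics library implementation would normally be used for real analyses.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of a top-left count of x, given the fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


print(f"{fisher_exact_two_sided(3, 1, 1, 3):.4f}")  # → 0.4857
```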