Test of association: chi-squared Flashcards
1
Q
Overview of Chi-Squared Test
A
- Definition: A statistical test used to determine whether there is a significant association between categorical variables.
- Purpose: To assess whether the observed frequencies differ from the frequencies expected under a stated hypothesis by more than chance alone would explain.
2
Q
Types of Chi-Squared Tests
A
- Chi-Squared Test for Independence:
o Tests whether two categorical variables are independent of each other.
o Example: Examining if gender is associated with preference for a specific product.
- Chi-Squared Test for Goodness of Fit:
o Tests whether the observed frequency distribution of a single categorical variable matches an expected distribution.
o Example: Determining if a die is fair based on observed rolls.
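As a rough illustration of the goodness-of-fit case, the sketch below tests a die for fairness. The roll counts are made up for illustration (they are not from these cards), and the function name `goodness_of_fit_uniform` is hypothetical. It relies on two standard facts: a fair die is expected to show each face N/6 times, and a goodness-of-fit test over k categories has k - 1 degrees of freedom (here 5, with a 0.05 critical value of 11.070).

```python
def goodness_of_fit_uniform(observed):
    """Chi-squared statistic and df when every category is expected equally often."""
    n = sum(observed)
    k = len(observed)
    expected = n / k  # fair die: each face expected the same number of times
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, k - 1


# Hypothetical counts for faces 1..6 over 120 rolls (illustrative only).
rolls = [18, 22, 16, 25, 19, 20]
statistic, df = goodness_of_fit_uniform(rolls)

CRITICAL_05_DF5 = 11.070  # chi-squared critical value for alpha = 0.05, df = 5
reject_fairness = statistic > CRITICAL_05_DF5
print(f"chi2 = {statistic:.2f}, df = {df}, reject H0: {reject_fairness}")
```

With these made-up counts the statistic is well below the critical value, so the data give no reason to doubt that the die is fair.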
3
Q
Formula for Chi-Squared Test
A
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where:
* $O$ = Observed frequency
* $E$ = Expected frequency
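The formula translates almost directly into code. This is a minimal sketch: the function name `chi_squared_statistic` and the coin-toss counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def chi_squared_statistic(observed, expected):
    """Sum (O - E)^2 / E over matching cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 100 coin tosses give 60 heads and 40 tails,
# while a fair coin would be expected to give 50 of each.
stat = chi_squared_statistic([60, 40], [50, 50])
print(stat)  # (10^2 / 50) + (10^2 / 50) = 4.0
```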
4
Q
Steps to Conduct a Chi-Squared Test
A
- State the Hypotheses:
o Null Hypothesis ($H_0$): Assumes no association between variables (for independence).
o Alternative Hypothesis ($H_a$): Assumes there is an association.
- Create a Contingency Table: For independence, list observed frequencies for each category.
- Calculate Expected Frequencies:
$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$
- Calculate Chi-Squared Statistic: Use the formula provided.
- Determine Degrees of Freedom (df):
$df = (r - 1)(c - 1)$
where $r$ is the number of rows and $c$ is the number of columns.
- Compare to Critical Value: Use a chi-squared distribution table to find the critical value at a specified significance level (usually 0.05).
- Make a Decision: If the calculated $\chi^2$ > critical value, reject $H_0$.
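The steps above can be sketched in plain Python for a test of independence. The function name `chi_squared_independence`, the small lookup of 0.05 critical values, and the 2x3 table of counts are all assumptions made for illustration; in practice a statistics library would normally supply the critical value or p-value.

```python
# Critical values of the chi-squared distribution at alpha = 0.05 (standard table values).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_squared_independence(table):
    """Return (statistic, df, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected frequency per cell: row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Chi-squared statistic: sum of (O - E)^2 / E over every cell
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Degrees of freedom: (rows - 1) * (columns - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical 2x3 table of observed counts (illustrative only).
observed = [[20, 30, 50],
            [30, 30, 40]]
statistic, df, expected = chi_squared_independence(observed)
reject_h0 = statistic > CRITICAL_05[df]
print(f"chi2 = {statistic:.3f}, df = {df}, reject H0: {reject_h0}")
```

For these made-up counts the statistic stays below the df = 2 critical value of 5.991, so $H_0$ would not be rejected.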
5
Q
Interpretation of Results
A
- Outcome:
o If $H_0$ is rejected: Conclude there is a significant association between the variables.
o If $H_0$ is not rejected: Conclude there is insufficient evidence of an association (this does not prove the variables are independent).
- Example Reporting: “The chi-squared test revealed a significant association between smoking status and lung disease, $\chi^2(1, N = 100) = 15.32, p < 0.001$.”
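A reported p-value can be sanity-checked against its statistic. For 1 degree of freedom specifically, the upper-tail probability of the chi-squared distribution has the closed form erfc(sqrt(x / 2)), so the standard library is enough. The helper name `chi2_p_value_df1` is hypothetical, and the shortcut is valid for df = 1 only.

```python
import math


def chi2_p_value_df1(statistic):
    """Upper-tail p-value of the chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Statistic taken from the example report: chi2(1, N = 100) = 15.32
p = chi2_p_value_df1(15.32)
print(p < 0.001)  # True, consistent with the reported "p < 0.001"
```

As a cross-check, the same function returns roughly 0.05 at the df = 1 critical value of 3.841.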
6
Q
Advantages and Limitations
A
- Advantages:
o Simple to compute and interpret.
o Useful for analyzing categorical data.
- Limitations:
o Cannot provide information about the strength or direction of the association.
o Sensitive to sample size; large samples can lead to statistically significant results even for trivial associations.
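The sample-size sensitivity is easy to demonstrate. In the sketch below, both tables have the same 52% / 48% split in each row (a trivial association), and only the number of observations differs. The counts and the function name `chi_squared_2x2` are hypothetical.

```python
def chi_squared_2x2(table):
    """Chi-squared statistic for a 2x2 table, with no continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(2)
        for j in range(2)
    )


CRITICAL_05_DF1 = 3.841  # chi-squared critical value for alpha = 0.05, df = 1

small = [[52, 48], [48, 52]]                       # N = 200
large = [[c * 100 for c in row] for row in small]  # same proportions, N = 20,000

chi2_small = chi_squared_2x2(small)
chi2_large = chi_squared_2x2(large)
print(chi2_small, chi2_small > CRITICAL_05_DF1)
print(chi2_large, chi2_large > CRITICAL_05_DF1)
```

Multiplying every count by 100 multiplies the statistic by 100 (about 0.32 versus about 32), so the identical pattern moves from non-significant to highly significant purely because of the larger sample.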
7
Q
Assumptions of Chi-Squared Test
A
- Observations should be independent.
- The sample size should be sufficiently large (the expected frequency in each cell should generally be 5 or more).
- Data should be in the form of counts (frequencies) rather than percentages or proportions.
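The expected-frequency rule of thumb can be checked before running the test. This is a minimal sketch; the helper name `low_expected_cells` and the second table of counts are hypothetical. Note that the rule concerns expected counts, not observed ones.

```python
def low_expected_cells(table, minimum=5):
    """Return (row, col, expected) for every cell whose expected count is below `minimum`."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / n
            if expected < minimum:
                flagged.append((i, j, expected))
    return flagged


# An observed count of 5 is acceptable here because its expected count is 21.
ok_table = [[30, 10], [5, 55]]
# Hypothetical small-sample table where one expected count falls below 5.
sparse_table = [[3, 7], [12, 28]]

print(low_expected_cells(ok_table))      # []
print(low_expected_cells(sparse_table))  # [(0, 0, 3.0)]
```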
8
Q
Example of Chi-Squared Test for Independence
A
- Scenario: Investigating the relationship between smoking status (smoker, non-smoker) and lung disease (yes, no).
- Data Collection: Collect data from a sample of individuals and create a contingency table.
              Lung Disease-Yes   Lung Disease-No   Total
Smoker               30                 10           40
Non-Smoker            5                 55           60
Total                35                 65          100
- Expected Frequencies Calculation:
o For smokers with lung disease: $E = \frac{40 \times 35}{100} = 14$
o For non-smokers with lung disease: $E = \frac{60 \times 35}{100} = 21$
- Chi-Squared Calculation: Calculate $\chi^2$ and compare it with the critical value.
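Carrying the calculation through for the counts in this card gives the following worked sketch. It assumes no continuity correction is applied; the two remaining expected counts follow from the same row-total times column-total formula.

```latex
\begin{aligned}
E_{\text{smoker, no disease}} &= \frac{40 \times 65}{100} = 26, \qquad
E_{\text{non-smoker, no disease}} = \frac{60 \times 65}{100} = 39 \\
\chi^2 &= \frac{(30 - 14)^2}{14} + \frac{(10 - 26)^2}{26}
        + \frac{(5 - 21)^2}{21} + \frac{(55 - 39)^2}{39} \\
       &= \frac{256}{14} + \frac{256}{26} + \frac{256}{21} + \frac{256}{39} \\
       &\approx 18.29 + 9.85 + 12.19 + 6.56 \approx 46.89 \\
df &= (2 - 1)(2 - 1) = 1
\end{aligned}
```

Since 46.89 far exceeds the 0.05 critical value of 3.841 for df = 1, $H_0$ is rejected for these counts.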