Association between two Categorical Variables Flashcards
What is the use of a two-way table (cross-tabulation or contingency table)?
It is useful when we want to examine the relationship between two categorical variables.
How do we perform a Chi-squared test?
- Construct a contingency table
- Obtain the expected counts (those we would see if the null hypothesis were true)
- Calculate the X^2 test statistic
- Use chi-squared (χ^2) distribution tables to find the p-value
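The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the 2x2 table is hypothetical data, the function names are invented for this example, and the p-value shortcut relies on the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal variable, so it only applies to 2x2 tables.

```python
import math

def chi_squared_statistic(observed):
    """Return (X^2, expected counts, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under the null hypothesis: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # X^2 is the sum over all cells of (observed - expected)^2 / expected
    x2 = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, expected, df

def p_value_one_df(x2):
    """Upper-tail probability of the chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x2 / 2))

# Hypothetical data: rows = exposed / unexposed, columns = outcome / no outcome
table = [[30, 70], [15, 85]]
x2, expected, df = chi_squared_statistic(table)
p = p_value_one_df(x2)
print(f"X^2 = {x2:.2f}, df = {df}, p = {p:.3f}")
```

For tables with more than 1 degree of freedom, the p-value would come from printed tables or a statistics package instead of `p_value_one_df`.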
What is the difference between X^2 and χ^2 in chi-squared testing?
X^2 (capital Roman X) denotes the test statistic calculated from the data; χ^2 (lower-case Greek chi) denotes the theoretical distribution it is compared against.
Why is the chi-squared distribution applicable to contingency tables?
The chi-squared distribution is the distribution of the sum of the squares of one or more independent standard normal variables. Because we are dealing with counts, which are discrete rather than continuous, you may wonder why the normal distribution is relevant. In fact, it is only an approximation, and it is accurate only for reasonably large sample sizes.
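How do we interpret the chi-squared test statistic?
Large values of X^2 - the observed counts are far from the expected counts, so the data do not support the null hypothesis
Small values of X^2 - the observed counts are close to the expected counts, so the data are consistent with the null hypothesis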
What variable do we need in addition to the Chi-squared statistic to find a p-value?
Degrees of freedom (free cells)
ν (nu) = (r-1)(c-1), where r is the number of rows and c is the number of columns
(r and c do not include the total column or row)
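The idea of "free cells" can be illustrated with a short sketch: in a 2x2 table with fixed marginal totals, choosing one cell determines the other three, which is why there is 1 degree of freedom. The function names and the numbers below are hypothetical and chosen only for illustration.

```python
def degrees_of_freedom(n_rows, n_cols):
    """(r-1)(c-1), with r and c counted without the total row and column."""
    return (n_rows - 1) * (n_cols - 1)

def fill_2x2(top_left, row_totals, col_totals):
    """Recover a full 2x2 table from one cell plus the marginal totals."""
    a = top_left
    b = row_totals[0] - a   # rest of the first row
    c = col_totals[0] - a   # rest of the first column
    d = row_totals[1] - c   # last cell follows from the second row total
    return [[a, b], [c, d]]

print(degrees_of_freedom(2, 2))              # 1
print(degrees_of_freedom(3, 4))              # 6
print(fill_2x2(30, (100, 100), (45, 155)))   # [[30, 70], [15, 85]]
```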
What additional test is preferable when dealing with ordered categorical data?
A test for trend
What are the limitations of the test for trend?
It assumes a steady increase or decrease in risk across the ordered levels of the variable in question. It cannot detect a U-shaped relationship, such as that between pre-term birth and maternal age, where the risk is high in very young as well as very old mothers.
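The limitation can be seen numerically. The sketch below uses one common form of the chi-squared test for trend (a score statistic on 1 degree of freedom); the formula is an assumption, since these cards do not state which version is intended, and the counts and group scores are hypothetical.

```python
def chi_squared_trend(outcomes, totals, scores):
    """Chi-squared statistic for linear trend across ordered groups (1 df).

    outcomes[i] = number with the outcome in group i
    totals[i]   = number of subjects in group i
    scores[i]   = numeric score assigned to group i (for example 1, 2, 3)
    """
    big_n = sum(totals)
    big_o = sum(outcomes)
    sum_dx = sum(d * x for d, x in zip(outcomes, scores))
    sum_nx = sum(n * x for n, x in zip(totals, scores))
    sum_nx2 = sum(n * x * x for n, x in zip(totals, scores))
    # U compares the observed score total among those with the outcome
    # to the total expected if risk did not vary with the score
    u = (big_n * sum_dx - big_o * sum_nx) / big_n
    v = (big_o * (big_n - big_o) * (big_n * sum_nx2 - sum_nx ** 2)
         / (big_n ** 2 * (big_n - 1)))
    return u ** 2 / v

totals = [100, 100, 100]
scores = [1, 2, 3]
steady = chi_squared_trend([10, 20, 30], totals, scores)    # risk rises steadily
u_shaped = chi_squared_trend([30, 10, 30], totals, scores)  # high, low, high
print(f"steady: {steady:.2f}, U-shaped: {u_shaped:.2f}")
```

With the U-shaped counts the statistic is zero even though the risk clearly differs between groups, which is the behaviour described in the card above.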
What is the alternative way of calculating the X^2 test statistic?
Using marginal totals
How can the marginal totals method of calculating the X^2 test statistic be modified for small sample sizes?
Continuity correction
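For a 2x2 table with cells a, b, c, d, the marginal totals method and its continuity-corrected version can be sketched as follows. The formulas are the standard textbook shortcut and the Yates correction; the cards do not name a specific correction, so treating it as Yates' is an assumption, as are the function name and the example counts.

```python
def x2_from_marginals(a, b, c, d, continuity=False):
    """X^2 for the 2x2 table [[a, b], [c, d]] using the marginal totals.

    X^2 = N * (ad - bc)^2 / (e * f * g * h), where e, f are the row totals
    and g, h are the column totals. With continuity=True, N/2 is subtracted
    from |ad - bc| before squaring (Yates' correction).
    """
    n = a + b + c + d
    e, f = a + b, c + d   # row totals
    g, h = a + c, b + d   # column totals
    diff = abs(a * d - b * c)
    if continuity:
        # Assumption: do not let the corrected difference go below zero
        diff = max(diff - n / 2, 0)
    return n * diff ** 2 / (e * f * g * h)

uncorrected = x2_from_marginals(30, 70, 15, 85)
corrected = x2_from_marginals(30, 70, 15, 85, continuity=True)
print(f"uncorrected: {uncorrected:.2f}, corrected: {corrected:.2f}")
```

The corrected value is always the smaller of the two, so the correction makes the test more conservative.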
What are the rules about sample size and the validity of the chi-squared test?
For 2x2 tables:
If N>40, it is valid
If N is between 20 and 40 and the smallest expected value is at least 5, it is valid
Otherwise Fisher’s exact test is used
For 2xR and RxC tables:
Valid if no more than 20% of expected values are less than 5 and none are less than 1.
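These rules can be written as a small check. This is only a sketch of the rules as stated on this card: the function name and return strings are invented for illustration, and for larger tables that fail the rule it simply reports that the test is not valid, because the card does not say what to use instead.

```python
def chi_squared_validity(observed):
    """Apply the sample-size rules above to a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected counts under the null hypothesis, flattened into one list
    expected = [r * c / n for r in row_totals for c in col_totals]

    if len(row_totals) == 2 and len(col_totals) == 2:
        if n > 40:
            return "chi-squared valid"
        if 20 <= n <= 40 and min(expected) >= 5:
            return "chi-squared valid"
        return "use Fisher's exact test"

    # Larger tables: at most 20% of expected values below 5, none below 1
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    if share_below_5 <= 0.20 and min(expected) >= 1:
        return "chi-squared valid"
    return "chi-squared not valid"

print(chi_squared_validity([[30, 70], [15, 85]]))
print(chi_squared_validity([[3, 2], [1, 4]]))
print(chi_squared_validity([[10, 10, 10], [10, 10, 10]]))
```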
Summary of procedure for determining association between two categorical variables
Step 1: Display data as two-way table. The first thing is to look at the raw data.
Step 2: Calculate row or column percentages as appropriate. We want to see the percentages to help us interpret the data.
Step 3: Declare null hypothesis and calculate chi-squared value. Now we’re ready to calculate the chi-squared.
Step 4: Calculate degrees of freedom. Now we need both the chi-squared value and the degrees of freedom before we can look up the P value.
Step 5: Refer to chi-squared distribution to get P-value. Now we’re ready to look up the P value.
Step 6: Interpret P-value. Now we have all the results of the analysis, we can interpret them.
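Step 2 of the procedure can be sketched as below. The table is hypothetical data and the function names are invented for this example; whether row or column percentages are appropriate depends on which variable defines the groups being compared.

```python
def row_percentages(observed):
    """Express each cell as a percentage of its row total."""
    return [[100 * cell / sum(row) for cell in row] for row in observed]

def column_percentages(observed):
    """Express each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*observed)]
    return [[100 * cell / total for cell, total in zip(row, col_totals)]
            for row in observed]

# Hypothetical data: rows = exposed / unexposed, columns = outcome / no outcome
table = [[30, 70], [15, 85]]
row_pct = row_percentages(table)
col_pct = column_percentages(table)
print(row_pct)   # [[30.0, 70.0], [15.0, 85.0]]
```

Here the row percentages show the outcome in 30% of the exposed group and 15% of the unexposed group, which is the comparison the chi-squared test then assesses.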
What additional statistical method is required for a full interpretation of the X^2 test?
Calculation of confidence intervals
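As one possible illustration, a 95% confidence interval for the difference between two proportions in a 2x2 table can be sketched as follows. The cards do not say which interval is meant, so the choice of the risk difference, the simple normal-approximation (Wald) formula, the function name and the counts are all assumptions.

```python
import math

def risk_difference_ci(a, b, c, d, z=1.96):
    """Difference in proportions for [[a, b], [c, d]] with an approximate CI.

    Row 1 proportion is a / (a + b); row 2 proportion is c / (c + d).
    z = 1.96 gives an approximate 95% interval.
    """
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    diff = p1 - p2
    # Standard error of the difference between two independent proportions
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

diff, lower, upper = risk_difference_ci(30, 70, 15, 85)
print(f"difference = {diff:.2f}, 95% CI {lower:.3f} to {upper:.3f}")
```

The interval shows the size of the association and its precision, which the p-value alone does not convey.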