Week 4- Associations between categorical variables Flashcards
What is a categorical variable
Categorical variables represent types of data which may be divided into groups. Examples of categorical variables are race, sex, age group, and educational level
What is chi-square- definition
A non-parametric test used for inference on categorical data
What is chi-square
- Measures the relationship between 2 or more nominal variables, questioning whether observations are contingent upon another categorical variable
- Tests whether frequency counts can be expected by chance or whether there’s a relationship between the categorical variables
Examples of chi-square. Testing whether 2 nominal variables are associated
- Is gender associated with preferred subject?
- Is ownership of a dog associated with residence?
- Is smoking associated with drinking?
Hypothesis testing
- Null- there is no association between the 2 variables
- Alternative- the two variables are associated
Steps for constructing chi-squared by hand
- Note the observed frequencies in each cell in a contingency table
- Calculate the expected frequency for each cell by multiplying the two relevant marginal totals (the row total and the column total) for that cell and dividing by the total number of participants in the sample
- Calculate chi-square (see formula above) by:
a. Calculating the difference between the observed and the expected frequency for each cell and squaring that number;
b. Dividing the result by the expected frequency for that cell.
- Add up the results for all cells to acquire chi-square.
- Determine the degrees of freedom by multiplying the number of columns minus 1 by the number of rows minus 1 (see formula above).
- Look up the significance of chi-square in the table below
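The steps above can be sketched in plain Python. This is only an illustrative sketch: the function name `chi_square` and the counts in `table` are made up for the example and are not taken from the course material.

```python
def chi_square(observed):
    """Return (chi-square, degrees of freedom, expected counts) for a table of raw counts."""
    # Marginal totals: one per row, one per column, plus the grand total
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected frequency per cell = row total * column total / sample size
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # (observed - expected)^2 / expected, added up over all cells
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Degrees of freedom = (rows - 1) * (columns - 1)
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Hypothetical raw counts: rows = dog owner / non-owner, columns = two residence types
table = [[10, 20], [30, 40]]
chi2, df, expected = chi_square(table)
```

For these made-up counts chi-square is about 0.79 with 1 degree of freedom, which is below the usual .05 critical value of 3.84 for df = 1, so the null hypothesis of no association would not be rejected.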
Assumptions of chi-squared
- Independence
> Observations must be independent: each participant can fall into only one category of each variable (distinct nominal categories), i.e. a between-subjects design
- Raw frequencies
> Should be conducted on raw frequencies not percentages
- Sample/ cell size
> No expected cell frequency should be less than 1, and no more than 20% of expected cell frequencies should be less than 5
- If these conditions are not met, categories can be collapsed
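The sample/cell size rule can be checked mechanically once the expected frequencies are known. The sketch below assumes the expected counts have already been calculated; `expected_counts_ok` is a hypothetical helper name and the numbers are invented.

```python
def expected_counts_ok(expected):
    """Check the rule: no expected count below 1, and at most 20% of cells below 5."""
    cells = [e for row in expected for e in row]
    none_below_1 = all(e >= 1 for e in cells)
    share_below_5 = sum(1 for e in cells if e < 5) / len(cells)
    return none_below_1 and share_below_5 <= 0.20


# Hypothetical expected frequencies for a 2 x 3 table: 1 of 6 cells is below 5
ok = expected_counts_ok([[4.0, 6.0, 10.0], [6.0, 9.0, 15.0]])

# Hypothetical expected frequencies for a 2 x 2 table: 1 of 4 cells (25%) is below 5
not_ok = expected_counts_ok([[4.0, 6.0], [6.0, 9.0]])
```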
Other important things to note when reporting chi-squared
- Percentages
> Chi-square should be conducted on raw frequencies not percentages. But percentages are useful to report in addition to these raw frequencies
- Cramer’s V
> Measure of effect size
- Variance accounted for
> We can square the effect size to see how much of the variance in one variable is accounted for by the other variable
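As a rough illustration, Cramer's V can be computed from chi-square with the standard formula V = sqrt(chi-square / (N × (k − 1))), where k is the smaller of the number of rows and columns. The formula is not given in these cards, so treat it as an added assumption; `cramers_v` and the input values are hypothetical.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V from a chi-square value, the sample size and the table dimensions."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical result: chi-square = 25 from a 2 x 2 table with 100 participants
v = cramers_v(25, 100, 2, 2)

# Squaring the effect size gives the proportion of variance accounted for
variance_accounted_for = v ** 2
```

With these invented numbers V is 0.5, and squaring it gives 0.25, i.e. 25% of the variance accounted for.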
What are standardised residuals
- Help determine which cells are contributing to the ‘significant association’
- They’re z scores indicating how many SDs above or below the expected count an observed count is
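A minimal sketch of how standardised residuals might be calculated, assuming the common definition (observed − expected) / sqrt(expected) for each cell. The cards do not state the formula, so this definition is an assumption, and the function name and counts are hypothetical.

```python
import math


def standardised_residuals(observed):
    """Return (observed - expected) / sqrt(expected) for every cell of a table of raw counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    residuals = []
    for row, r_total in zip(observed, row_totals):
        res_row = []
        for o, c_total in zip(row, col_totals):
            e = r_total * c_total / n  # expected count for this cell
            res_row.append((o - e) / math.sqrt(e))
        residuals.append(res_row)
    return residuals


# Hypothetical 2 x 2 table of raw counts
table = [[10, 20], [30, 40]]
residuals = standardised_residuals(table)
```

Because these are z scores, cells with residuals beyond roughly ±1.96 are the ones usually read as contributing to a significant association. Under this definition the squared residuals add up to chi-square.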
What are degrees of freedom?
- The number of independent pieces of information that went into calculating the estimate
e.g. if three numbers have a mean of 10 (a total of 30) and two of them are 9 and 10, the other number must be 11
Issues with chi-square
- Must be conducted on raw frequency counts, NOT percentages
- Larger contingency tables can be difficult to interpret
- So can use
> Standardised residuals to determine main contributors
> Partitioning- carry out multiple 2 × 2 chi-squares
> Combine categories
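Combining categories amounts to adding the raw counts of two rows or two columns before re-running the test. A small sketch for the column case, assuming a table stored as a list of rows; `combine_columns` and the counts are hypothetical.

```python
def combine_columns(table, i, j):
    """Return a new table of raw counts in which column j has been merged into column i."""
    if i == j:
        raise ValueError("i and j must refer to different columns")
    combined = []
    for row in table:
        # Drop column j, then store the summed count where column i now sits
        new_row = [count for k, count in enumerate(row) if k != j]
        target = i if i < j else i - 1
        new_row[target] = row[i] + row[j]
        combined.append(new_row)
    return combined


# Hypothetical 2 x 3 table where the last two columns are sparse
table = [[12, 3, 2], [15, 4, 1]]
collapsed = combine_columns(table, 1, 2)
```

Here the 2 × 3 table becomes a 2 × 2 table, `[[12, 5], [15, 5]]`. Categories should only be merged when the combined group still makes conceptual sense.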