Lecture 14 - Analysing Categorical Data Flashcards
What is analysis of categorical data?
- We sometimes want to predict which category someone falls into
- E.g. traitor or faithful
- We can create a contingency table and perform a chi-square test on the data
- Do people fall into a category more often than we expect them to?
What are contingency tables?
- A table of frequencies for how often an observation occurs in a category (how many people chose which option)
- Categories must be mutually exclusive (no overlap) and exhaustive (every observation fits into a category)
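As a rough illustration of how a contingency table is built from raw observations, here is a minimal Python sketch. The data and the helper name `contingency_table` are hypothetical and are not taken from the lecture.

```python
from collections import Counter

# Hypothetical raw data: one (country, role) pair per person.
observations = [
    ("UK", "Faithful"), ("UK", "Faithful"), ("UK", "Traitor"),
    ("USA", "Traitor"), ("USA", "Traitor"), ("USA", "Faithful"),
    ("UK", "Faithful"), ("USA", "Traitor"),
]

def contingency_table(pairs):
    """Count how many observations fall into each (row, column) cell."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Each person lands in exactly one cell, so the cells sum to N.
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]

rows, cols, table = contingency_table(observations)
print(cols)
for name, row in zip(rows, table):
    print(name, row)
```

Because every person appears in exactly one cell, the cell counts add up to the sample size.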
What is a Chi-square test?
- Devised by Karl Pearson in 1900, also known as Pearson’s chi-square
- Compares how often observations actually fall into each category (observed frequencies) with how often they would be expected to by chance (expected frequencies)
What is the null hypothesis in a chi-square test?
The observed frequencies are what would be expected by chance
What is the alternative hypothesis in a chi-square test?
The frequencies observed reflect real differences in categories
What are the assumptions of a chi-square test?
- Independence - each person can only contribute to one cell of a contingency table
- Expected frequencies - all expected counts should be greater than 1 and no more than 20% of expected counts should be less than 5
- If violated, power is reduced
- Terms ‘values’, ‘frequencies’ and ‘count’ interchangeable
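The expected-frequency rule above can be checked mechanically. This is a minimal sketch; the function name `expected_counts_ok` and the example numbers are made up for illustration.

```python
def expected_counts_ok(expected):
    """Check the expected-frequency assumption on a flat list of expected counts."""
    # Every expected count must be greater than 1 ...
    all_above_one = all(e > 1 for e in expected)
    # ... and no more than 20% of them may be below 5.
    share_below_five = sum(e < 5 for e in expected) / len(expected)
    return all_above_one and share_below_five <= 0.20

print(expected_counts_ok([12.5, 12.5, 7.5, 7.5]))  # True
print(expected_counts_ok([6.0, 4.0, 3.0, 2.0]))    # False: 75% of cells are below 5
```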
What happens when expected frequencies are violated?
- Results in a loss of power
- Several options:
- Use an ‘Exact’ test instead (e.g. Fisher’s or MLR)
- Collapse/remove data across one variable
- Collapse levels of one variable
- Collect more data
- Accept the loss of power
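To show what an exact test does, here is a hedged sketch of a two-sided Fisher's exact test for a 2x2 table. The notes only name the test, so the function `fisher_exact_2x2`, the "sum all tables no more likely than the observed one" definition of two-sided, and the example counts are assumptions made for illustration.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = prob(a)
    # Add up every possible table that is no more likely than the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= observed * (1 + 1e-7))

# Hypothetical small sample where expected counts would be below 5.
p_value = fisher_exact_2x2(3, 1, 1, 3)
print(round(p_value, 4))
```

Unlike the chi-square test, this works directly from exact probabilities, so it does not depend on the expected counts being large.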
How do you calculate a chi-square test by hand for one IV?
- Three steps:
- (1) Calculate expected frequencies
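- (2) Calculate Chi-Square value based on observed and expected frequencies: for each category take (observed - expected) squared, divide by expected, then add the results together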
- (3) Compare Chi-Square value against a critical values table
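The three steps can be sketched in Python as follows. The counts are hypothetical, the function name `chi_square_one_way` is invented here, and the sketch assumes the null hypothesis is an equal split across categories (so every expected count is N divided by the number of categories).

```python
def chi_square_one_way(observed):
    """Chi-square for one IV, assuming equal expected counts per category."""
    n = sum(observed)
    k = len(observed)
    # Step 1: expected frequency for each category under an equal split.
    expected = [n / k] * k
    # Step 2: sum of (observed - expected)^2 / expected over all categories.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, statistic

observed = [39, 21]  # hypothetical counts: 39 Faithful, 21 Traitor
expected, statistic = chi_square_one_way(observed)

# Step 3: compare against the table value for df = 1 at alpha = .05.
critical = 3.841
print(expected, round(statistic, 2), statistic > critical)
```

With these made-up counts the statistic exceeds the critical value, so the null hypothesis of an equal split would be rejected.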
How do you interpret chi-square critical values tables?
- To interpret the table we need to know our degrees of freedom, and our desired alpha value
- Degrees of freedom = number of categories - 1
- Reject H0 when χ2 observed > χ2 critical
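A critical values table can be treated as a simple lookup keyed by degrees of freedom. The sketch below hard-codes only the standard alpha = .05 values for 1 to 4 degrees of freedom; the names `CRITICAL_05` and `reject_null` are hypothetical.

```python
# Standard chi-square critical values at alpha = .05, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def reject_null(statistic, n_categories, table=CRITICAL_05):
    """Return (df, decision) for a one-IV chi-square test."""
    df = n_categories - 1
    # Reject H0 only when the observed value exceeds the critical value.
    return df, statistic > table[df]

print(reject_null(5.4, 2))  # two categories  -> df = 1
print(reject_null(5.4, 3))  # three categories -> df = 2
```

The same observed value can be significant with one degree of freedom and not with two, which is why the degrees of freedom must be worked out first.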
How do you calculate a chi-square test by hand for two IVs?
- With two IVs, the difference lies in how the expected frequencies are calculated
- Expected frequencies are calculated separately for each cell: expected = (row total x column total) / total number of observations
- Degrees of freedom = (number of rows-1) x (number of columns-1)
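For two IVs, the calculation can be sketched as below. The 2x2 table is hypothetical and the function name `chi_square_two_way` is invented for this example; each expected count is taken as row total times column total divided by N.

```python
def chi_square_two_way(table):
    """Chi-square test of association for a rows x columns table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for each cell: (row total * column total) / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum((o - e) ** 2 / e
                    for obs_row, exp_row in zip(table, expected)
                    for o, e in zip(obs_row, exp_row))
    # Degrees of freedom: (rows - 1) * (columns - 1).
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, statistic, df

# Hypothetical counts: rows = UK, USA; columns = Faithful, Traitor.
table = [[20, 10], [10, 20]]
expected, statistic, df = chi_square_two_way(table)
print(expected, round(statistic, 2), df)
```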
How do you conduct a chi-square test for one IV in SPSS?
- For one IV
- Analyse -> non-parametric tests -> legacy dialogs -> chi-square
How do you conduct a chi-square test for two IVs in SPSS?
- For two IVs
- Analyse -> descriptive statistics -> crosstabs (need to click ‘statistics’ to ask for Chi-square test)
How do you report the chi-square test for one IV?
E.g. “The number of people choosing to be Traitors or Faithfuls can be seen in Table/Figure ‘X’. This distribution is significantly different to chance (χ2(1)=5.4, p=.02).”
How do you report the chi-square test for two IVs?
E.g. “There was a significant association between where viewers were from and whether they preferred to be a Traitor or a Faithful (χ2(1)=5.44, p=.02, Cramer’s V=.301). Whilst people from the UK preferred to be a Faithful, people from the USA preferred to be a Traitor.”
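Cramer's V, the effect size in the report above, can be computed from the chi-square value as the square root of chi-square divided by (N x the smaller of rows - 1 and columns - 1). The sketch below assumes a sample size of 60, which is NOT stated in the notes; it is a hypothetical value chosen only to illustrate the arithmetic.

```python
import math

def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V effect size for a chi-square test of association."""
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))

# n = 60 is an assumed sample size, not a figure from the lecture.
v = cramers_v(5.44, 60, 2, 2)
print(round(v, 3))
```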
What is a binomial test?
- Compares observed and expected frequencies for variables with only two levels
- E.g. Are there more participants in our sample from the USA than what we would expect by chance?
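A minimal sketch of an exact two-sided binomial test is shown below. The function name `binomial_test`, the chance level of 0.5, and the example counts are assumptions for illustration; the two-sided p-value here sums every outcome that is no more likely than the observed one.

```python
from math import comb

def binomial_test(successes, n, p=0.5):
    """Exact two-sided binomial test against a chance probability p."""
    def pmf(k):
        # Probability of exactly k successes in n trials.
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    observed = pmf(successes)
    # Add up all outcomes at least as unlikely as the one observed
    # (the small tolerance guards against floating-point ties).
    return sum(pmf(k) for k in range(n + 1)
               if pmf(k) <= observed * (1 + 1e-7))

# Hypothetical sample: 39 of 60 participants are from the USA.
p_value = binomial_test(39, 60)
print(p_value < 0.05)
```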
How do you conduct a binomial test in SPSS?
- Analyse -> non-parametric tests -> legacy dialogs -> binomial
When should binomial tests be performed compared to chi-square?
- Binomial tests should be performed on a single variable with two levels. Chi-square tests should be used when a variable has more than two levels, or when there is more than one variable