Categorical data analysis Flashcards
When is categorical data analysis used?
When your outcome variable is on a nominal scale
The predictor variables can be anything. However, these lectures only cover the case of a single predictor variable, which is also on a nominal scale
What is the chi-square “goodness of fit” test?
A test to determine how well our observed data match the values expected by theory
What value does a chi-square ‘goodness of fit’ test use?
An X² value (the chi-square test statistic)
When should you use a chi-square ‘goodness of fit’ test?
The chi-square goodness of fit test is used for categorical data when you want to compare observed frequencies against a hypothesis about the true category probabilities
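A minimal sketch of the comparison between observed and expected frequencies, computed by hand in R. The counts and the equal-probability null hypothesis are made up for illustration and are not from the lectures; the names `observed`, `null_probs`, `expected` and `x2` are hypothetical.

```r
# Illustrative (made-up) counts for k = 4 categories, N = 200 observations
observed <- c(clubs = 35, diamonds = 51, hearts = 64, spades = 50)

# Assumed null hypothesis: all four categories are equally likely
null_probs <- c(clubs = 0.25, diamonds = 0.25, hearts = 0.25, spades = 0.25)

# Expected frequencies under the null: N times each probability
expected <- sum(observed) * null_probs

# Test statistic: sum over categories of (observed - expected)^2 / expected
x2 <- sum((observed - expected)^2 / expected)
print(x2)  # 8.44
```

The larger the statistic, the worse the observed frequencies match the null hypothesis.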
Explain the four principles for a statistical test for the chi-square ‘goodness of fit’ test.
- A diagnostic test statistic T
- The sampling distribution of T if the null is true
- The observed T in your data
- A rule that maps every value of T onto a decision (retain or reject H0)
How do you get the chi-square (χ²) distribution?
A χ² distribution is what you get when you take k independent standard normal variables, square them and add them up. The number of variables, k, is the degrees of freedom.
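A rough simulation sketch of this idea in R, assuming 3 degrees of freedom and 10,000 simulated values (both arbitrary choices for illustration). It sums squared standard normal draws and checks that the result behaves like a chi-square variable.

```r
set.seed(42)     # arbitrary seed so the sketch is reproducible
k <- 3           # assumed number of squared normals (degrees of freedom)
n_sims <- 10000  # assumed number of simulated values

# Each row holds k independent standard normal draws
z <- matrix(rnorm(n_sims * k), ncol = k)

# Square the draws and add them up within each row
sim_x2 <- rowSums(z^2)

# The mean of a chi-square variable equals its degrees of freedom
mean(sim_x2)

# About 5% of the simulated values should exceed the 95th percentile
mean(sim_x2 > qchisq(0.95, df = k))
```

A histogram of the simulated values, e.g. `hist(sim_x2)`, should also show the positive skew.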
What are the features of the chi-square (χ²) distribution?
Continuous distribution
Has a noticeable positive skew to it
The shape of the distribution depends on the ‘degrees of freedom’
What is ‘degrees of freedom’ (DOF)?
The total number of ‘things’ you’re interested in minus the number of known constraints on those ‘things’
- The number of degrees of freedom is the number of quantities of interest in the data - 1 (one constraint on those quantities)
- E.g. for a chi-square goodness of fit test involving k categories, the degrees of freedom is equal to k-1
What is another name for the rejection region?
Critical region
When can we reject the H0 in a chi-square test?
If H0 is true, there is a 5% chance of observing an X² value greater than the critical value (the 95th percentile of the χ² distribution with k-1 degrees of freedom)
Therefore we can ensure a Type 1 error rate of .05 if we reject H0 only when X² is greater than that critical value
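A short R sketch of this decision rule, assuming k = 4 categories and a made-up observed statistic of 8.44 (both hypothetical values chosen for illustration).

```r
k <- 4               # assumed number of categories
alpha <- 0.05        # Type 1 error rate
x2_observed <- 8.44  # hypothetical observed test statistic

# Critical value: 95th percentile of the chi-square distribution with k - 1 df
critical_value <- qchisq(1 - alpha, df = k - 1)
print(round(critical_value, 2))  # 7.81

# Reject H0 only if the observed statistic falls in the critical region
reject_h0 <- x2_observed > critical_value
print(reject_h0)  # TRUE

# Equivalent view: the p-value is the upper-tail probability beyond the statistic
p_value <- pchisq(x2_observed, df = k - 1, lower.tail = FALSE)
print(p_value < alpha)  # TRUE
```

Comparing the statistic to the critical value and comparing the p-value to .05 always give the same decision.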
What are the 3 important outputs in a chi-square ‘goodness of fit’ test?
The test statistic (X²)
The p-value
The degrees of freedom for the test
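As a sketch, these three outputs can be pulled out of base R's `chisq.test()`, which runs a goodness of fit test when given a vector of counts and a vector of null probabilities. The counts and the null hypothesis here are made up for illustration.

```r
# Illustrative (made-up) counts and an assumed equal-probability null
observed <- c(clubs = 35, diamonds = 51, hearts = 64, spades = 50)
null_probs <- c(0.25, 0.25, 0.25, 0.25)

result <- chisq.test(x = observed, p = null_probs)

x2 <- unname(result$statistic)  # the test statistic
df <- unname(result$parameter)  # the degrees of freedom (k - 1)
p  <- result$p.value            # the p-value

print(c(x2 = x2, df = df, p = p))
```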
How should you write up a chi-square ‘goodness of fit’ test?
1) Report the relevant descriptive statistics (can also do this in a table or figure in your text)
2) Specify the null hypothesis and the statistical test run
3) Give the result of the test
Chi-square tests are used for 1)______ data: the outcome variable is 2)_____ scale
1) Categorical
2) Nominal
For a chi-square test, describe:
Diagnostic test statistic
Distribution
Degrees of freedom
What is the R code for a ‘goodness of fit’ test?
goodnessOfFitTest(x, p), from the lsr package
x = raw nominal data (a factor), p = the probabilities specified by the null hypothesis
e.g. polling data, election results
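If the lsr package is not available, a comparable analysis can be sketched with base R: tabulate the raw nominal data, then pass the counts to `chisq.test()`. This is a swapped-in alternative rather than the function on the card, and the raw data and null probabilities below are made up for illustration.

```r
# Hypothetical raw nominal data: one factor entry per observation
raw <- factor(rep(c("clubs", "diamonds", "hearts", "spades"),
                  times = c(35, 51, 64, 50)))

# Assumed null hypothesis: equal probabilities, in the order of levels(raw)
null_probs <- c(0.25, 0.25, 0.25, 0.25)

# Turn the raw data into observed frequencies
counts <- table(raw)
print(counts)

# Base R goodness of fit test on the tabulated counts
result <- chisq.test(x = counts, p = null_probs)
print(result)
```

The probabilities in `null_probs` must sum to 1 and follow the same order as the factor levels.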