Categorical data analysis Flashcards
When is categorical data analysis used?
When your outcome variable is nominal scale
The predictor variables can be anything; however, these lectures use only a single predictor variable, and that predictor is also nominal scale
What is the chi-square “goodness of fit” test?
A test of how well our observed data match the values expected by theory
What value does a chi-square ‘goodness of fit’ test use?
An X2 value
When should you use a chi-square ‘goodness of fit’ test?
The chi-square goodness of fit test is used for categorical data when you want to compare observed frequencies against some hypothesis about the true probabilities
Explain the four principles for a statistical test for the chi-square ‘goodness of fit’ test.
- A diagnostic test statistic T
- The sampling distribution of T if the null is true
- The observed T in your data
- A rule that maps every value of T onto a decision (retain or reject H0)
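The four principles above can be sketched in base R. This is a minimal illustration, not the lecture's own code: the counts are hypothetical, the null hypothesis of equal probabilities is assumed for the example, and the statistic uses the standard formula (sum of squared differences between observed and expected counts, each divided by the expected count).

```r
# Hypothetical observed counts for k = 4 categories (illustrative data only).
observed <- c(35, 51, 64, 50)
null_probs <- rep(1 / 4, 4)            # assumed H0: all categories equally likely

# 1) Diagnostic test statistic T: here the X2 statistic.
expected <- sum(observed) * null_probs
x2 <- sum((observed - expected)^2 / expected)

# 2) Sampling distribution of T if H0 is true: chi-square with k - 1 df.
df <- length(observed) - 1

# 3) The observed T is x2, computed above from the data.
# 4) Decision rule: reject H0 if x2 exceeds the 5% critical value.
critical <- qchisq(0.95, df = df)
reject_h0 <- x2 > critical
```

With these made-up counts the statistic is 8.44 on 3 degrees of freedom, which exceeds the critical value of about 7.81, so the rule rejects H0.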
How do you get the chi-squared distribution (X2)?
X2 is what you get when you take a set of independent, standard normally distributed variables, square each one and add them up.
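A quick simulation can make this concrete. The sketch below assumes independent standard normal variables; the seed, the number of variables `k` and the number of draws are arbitrary choices for illustration.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
k <- 3                             # number of normal variables summed per draw
n_draws <- 10000

# Each row holds k independent standard normal values.
z <- matrix(rnorm(n_draws * k), nrow = n_draws, ncol = k)

# Square every value and add across each row: one chi-square draw per row.
x2_draws <- rowSums(z^2)

# The sample mean should be close to k, the mean of a chi-square with k df.
mean(x2_draws)
```

A histogram of `x2_draws` (for example via `hist(x2_draws)`) should show the positive skew described for the chi-square distribution.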
What are the features of the chi-square ( χ2 ) distribution?
Continuous distribution
Has a noticeable positive skew to it
The shape of the distribution depends on the ‘degrees of freedom’
What is ‘degrees of freedom’ (DOF)?
The total number of ‘things’ you’re interested in minus the number of known constraints on those ‘things’
- The number of degrees of freedom is the number of quantities of interest in the data minus 1 (one constraint on those quantities)
- E.g. for a chi-square goodness of fit test involving k categories, the degrees of freedom is equal to k-1
What is another name for the rejection region?
Critical region
When can we reject the H0 in a chi-square test?
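If H0 is true, there is a 5% chance of observing an X2 value greater than the critical value (the 95th percentile of the chi-square distribution with the relevant degrees of freedom)
Therefore we can ensure a Type 1 error rate of .05 if we reject H0 only when X2 is greater than that critical value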
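As a rough illustration, the critical value can be looked up in base R with `qchisq()`. The range of degrees of freedom below is an arbitrary choice for the example.

```r
dfs <- 1:5                                   # illustrative degrees of freedom

# 95th percentile of the chi-square distribution for each df.
critical_values <- qchisq(0.95, df = dfs)
names(critical_values) <- paste0("df=", dfs)

# Check: under H0, the chance of exceeding each critical value is 5%.
tail_probs <- pchisq(critical_values, df = dfs, lower.tail = FALSE)

round(critical_values, 3)
```

The critical value grows with the degrees of freedom (about 3.84 for 1 df and about 7.81 for 3 df), so the same X2 value can be significant in one test and not in another.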
What are the 3 important outputs in a chi-square ‘goodness of fit’ test?
The test statistic (X2)
The p-value
The degrees of freedom for the test
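One way to pull these three outputs from a test result in base R is sketched below. The counts and the null probabilities are hypothetical, and the text label "X2" in the formatted string is just a plain-text stand-in for the chi-square symbol.

```r
observed <- c(35, 51, 64, 50)                 # hypothetical counts
result <- chisq.test(observed, p = rep(0.25, 4))

x2 <- unname(result$statistic)                # test statistic
df <- unname(result$parameter)                # degrees of freedom
p_value <- result$p.value                     # p-value

# Assemble the three outputs into a conventional stat block.
stat_block <- sprintf("X2(%d) = %.2f, p = %.3f", df, x2, p_value)
stat_block
```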
How should you write up a chi-square ‘goodness of fit test’?
1) Report the relevant descriptive statistics (can also do this in a table or figure in your text)
2) Specify the null hypothesis and the statistical test run
3) Give the result of the test
Chi-square tests are used for 1)______ data: the outcome variable is 2)_____ scale
1) Categorical
2) Nominal
For a chi-square test, describe:
Diagnostic test statistic
Distribution
Degrees of freedom
What is the R code for a ‘goodness of fit’ test?
goodnessOfFitTest(x, p)
x = raw nominal data (a factor), p = the probabilities specified by the null hypothesis (goodnessOfFitTest comes from the lsr package)
e.g. polling data, election results
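If the lsr package is not available, a similar test can be run in base R. The sketch below swaps in `chisq.test()` on tabulated counts as a stand-in; the raw data and the equal-probability null are hypothetical.

```r
# Hypothetical raw nominal data: one observation per element.
x <- factor(rep(c("clubs", "diamonds", "hearts", "spades"),
                times = c(35, 51, 64, 50)))

# Assumed null hypothesis: every category has probability 0.25.
p <- c(clubs = 0.25, diamonds = 0.25, hearts = 0.25, spades = 0.25)

# Base R works on counts, so tabulate the raw data first.
observed <- table(x)
result <- chisq.test(observed, p = p)

# With the lsr package installed, the call from the card would be:
# lsr::goodnessOfFitTest(x, p = p)
result
```

The order of `p` has to match the order of the factor levels, which `table()` reports alphabetically here.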
What is a chi-square test of association?
(also known as chi-square test of independence or chi-square test of homogeneity)
Very similar to the chi-square goodness of fit test, but you use it when you aren't given the expected frequencies
Instead, you estimate them from the data
E.g. under the null hypothesis, the estimated probability of category j is the total number of observations in category j divided by the sample size
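The estimation step can be sketched in base R as follows. The 2 x 3 table and its labels are hypothetical; the expected count for each cell is taken as the row total multiplied by the estimated column probability.

```r
# Hypothetical 2 x 3 contingency table of observed counts.
observed <- matrix(c(13, 30, 44,
                     15, 23, 55),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(group = c("a", "b"),
                                   choice = c("x", "y", "z")))
n <- sum(observed)

# Estimated probability of each column category j: column total / sample size.
p_hat <- colSums(observed) / n

# Expected count in each cell: row total times the estimated probability.
expected <- outer(rowSums(observed), p_hat)
expected
```

These values should agree with the `expected` component that `chisq.test(observed)` returns.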
How is the sampling distribution obtained in the chi-square ‘test of association’?
Created by squaring and summing the normally distributed variables
Same as goodness of fit
χ2(3) = 11.303, p = 0.0102 is a stat block from what type of test?
Explain what the numbers mean.
Chi-square test
χ2(3) = 11.303, p = 0.0102
χ2 = sampling distribution
(3) = degrees of freedom
11.303 = test statistic
p = 0.0102 = p-value
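The link between these numbers can be checked in base R: the p-value is the upper-tail area of the chi-square distribution beyond the test statistic. A minimal sketch, using only the values from the stat block:

```r
x2 <- 11.303      # test statistic from the stat block
df <- 3           # degrees of freedom from the stat block

# Upper-tail probability: chance of an X2 at least this large if H0 is true.
p_value <- pchisq(x2, df = df, lower.tail = FALSE)
round(p_value, 4)
```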
Chi-square ‘goodness of fit’ tests compare observed frequencies of one variable vs what?
A hypothesis about the true probabilities of that variable.
What do chi-square ‘tests of association’ / ‘test of independence’ test?
If two nominal scale variables are related to each other
What do chi-square ‘tests of association’ / ‘tests of independence’ use for their test statistic?
X2
How is degrees of freedom calculated for a chi-square test of association?
(r-1)(c-1)
where r=# of categories of one variable and c=# of categories of the other
How do you run a chi-square test of association in R?
chisq.test(x)
x = observed frequency contingency table of two nominal variables
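A short sketch of the call, with a hypothetical 2 x 3 contingency table, also shows where the (r-1)(c-1) degrees of freedom come from. The table values and labels are made up for illustration.

```r
# Hypothetical contingency table: 2 groups by 3 choices.
x <- matrix(c(13, 30, 44,
              15, 23, 55),
            nrow = 2, byrow = TRUE,
            dimnames = list(group = c("a", "b"),
                            choice = c("x", "y", "z")))

result <- chisq.test(x)

# Degrees of freedom computed by hand: (rows - 1) * (columns - 1).
n_rows <- nrow(x)
n_cols <- ncol(x)
df_by_hand <- (n_rows - 1) * (n_cols - 1)

result
```

When the raw data are two factors rather than a table, `table(factor1, factor2)` builds the contingency table first.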
Describe the
diagnostic test statistic
distribution
degrees of freedom
for chi-square test of association