Chi-Squared test Flashcards
What is meant by predicting categorical outcome variables?
Predicting which category an entity falls into.
What is used to measure categorical values numerically?
Frequencies
What is the Chi-squared test used for?
Determining whether there is a relationship between two categorical variables.
What does the Chi-Squared test compare to assess this?
It is comparing the observed frequencies with the expected frequencies.
What formula does the chi-squared test use?
$\chi^2 = \sum \frac{(\text{observed score} - \text{expected score})^2}{\text{expected score}}$
How do you calculate the expected score?
(row total * column total) / n, where n is the total number of observations
How do you calculate the degrees of freedom for the chi square?
(r - 1)(c - 1), where r is the number of rows and c is the number of columns in the contingency table
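The three cards above (test statistic, expected score, degrees of freedom) can be sketched together in plain Python. This is a minimal illustration, not a replacement for a statistics package; the function names and the 2 x 2 table are made up for the example.

```python
def expected_frequencies(observed):
    """Expected count per cell: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared(observed):
    """Return (chi-squared statistic, degrees of freedom)."""
    expected = expected_frequencies(observed)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Made-up 2 x 2 table of observed frequencies (illustration only).
observed = [[20, 30], [30, 20]]
statistic, df = chi_squared(observed)
print(statistic, df)  # 4.0 1
```

Every expected value here is 50 * 50 / 100 = 25, so each cell contributes (5)^2 / 25 = 1 to the statistic.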
In order to use the chi squared distribution with the chi-squared test, what is required?
The expected value in each cell must be greater than 5.
If the expected value is not greater than 5, what can be done?
Fisher's exact test can be used.
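As a rough sketch of what Fisher's exact test does for a 2 x 2 table: with the row and column totals held fixed, it sums the hypergeometric probabilities of every table that is no more likely than the observed one. The function name and the example table are illustrative assumptions, and this two-sided rule is one common convention rather than the only one.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the tables that are as unlikely as, or less likely than, the data.
    # The small tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Made-up small-sample table where expected values are below 5.
p_value = fisher_exact_2x2([[3, 1], [1, 3]])
print(round(p_value, 4))  # 0.4857
```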
What is an alternative to the chi-squared statistic?
Likelihood ratio statistic
What does the Likelihood ratio statistic utilise?
It compares the probability of obtaining the observed data under a fitted model with the probability of obtaining the same data under the null hypothesis.
What distribution does the Likelihood ratio use?
The chi-squared distribution
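For a contingency table, the likelihood ratio statistic is commonly written as 2 * sum(observed * ln(observed / expected)). The sketch below assumes that form; the function names and data are made up for illustration.

```python
from math import log


def expected_frequencies(observed):
    """Expected count per cell: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def likelihood_ratio(observed):
    """Likelihood ratio statistic: 2 * sum(O * ln(O / E))."""
    expected = expected_frequencies(observed)
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            if o > 0:  # an empty cell contributes nothing to the sum
                total += o * log(o / e)
    return 2 * total


# Made-up 2 x 2 table (illustration only).
print(likelihood_ratio([[20, 30], [30, 20]]))
```

For this table the result is close to, but not equal to, the Pearson chi-squared value of 4.0; the two statistics tend to agree in large samples.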
What error does the Chi-sq distribution tend to make (when) and how can this be corrected?
The chi-square statistic tends to make a type-I error if the table is 2 x 2. This can be corrected for by using Yates' correction.
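Yates' correction subtracts 0.5 from the absolute difference between observed and expected in each cell before squaring, which makes the statistic smaller. A minimal sketch for a 2 x 2 table, with made-up names and data:

```python
def expected_frequencies(observed):
    """Expected count per cell: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared_yates(observed):
    """Chi-squared with Yates' correction: sum((|O - E| - 0.5)^2 / E)."""
    expected = expected_frequencies(observed)
    return sum(
        (abs(o - e) - 0.5) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Made-up 2 x 2 table; the uncorrected statistic for it is 4.0.
print(chi_squared_yates([[20, 30], [30, 20]]))
```

Here each cell contributes (5 - 0.5)^2 / 25 = 0.81, so the corrected value is about 3.24 instead of 4.0.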
What assumptions does the chi-sq test carry? (3)
One assumption the chi-square test uses is the assumption of independence of cases. Each person, item or entity must contribute to only one cell of the contingency table. Another assumption is that in 2x2 tables, no expected value should be below 5. In larger tables, not more than 20% of the expected values should be below 5 and all expected values should be greater than 1.
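The expected-values rule above can be checked mechanically. The sketch below encodes the thresholds exactly as stated on the card; the function names are hypothetical and the tables are made up.

```python
def expected_frequencies(observed):
    """Expected count per cell: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def expected_values_ok(observed):
    """Check the expected-frequency assumption for a contingency table."""
    expected = [e for row in expected_frequencies(observed) for e in row]
    if len(observed) == 2 and len(observed[0]) == 2:
        # 2 x 2 tables: no expected value below 5.
        return all(e >= 5 for e in expected)
    # Larger tables: at most 20% of expected values below 5, all above 1.
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    return share_below_5 <= 0.20 and all(e > 1 for e in expected)


print(expected_values_ok([[20, 30], [30, 20]]))  # True
print(expected_values_ok([[1, 2], [3, 4]]))      # False
```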
What is the result of not meeting this expected values assumption?
It leads to a reduction in test power.
What is meant by the residual?
The residual is the error between the expected frequency and the observed frequency.
How is the standardised residual calculated?
(observed - expected) / sqrt(expected)
How do individual standardised residuals have a direct relationship with the test statistic?
The chi-square statistic is the sum of the squared standardised residuals.
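A short sketch of that relationship: compute (observed - expected) / sqrt(expected) for each cell, then square and sum. The names and the table are illustrative assumptions.

```python
from math import sqrt


def expected_frequencies(observed):
    """Expected count per cell: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def standardised_residuals(observed):
    """Per-cell (observed - expected) / sqrt(expected)."""
    expected = expected_frequencies(observed)
    return [
        [(o - e) / sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]


# Made-up 2 x 2 table (illustration only).
residuals = standardised_residuals([[20, 30], [30, 20]])
print(residuals)  # [[-1.0, 1.0], [1.0, -1.0]]

# Squaring and summing the residuals recovers the chi-squared statistic.
print(sum(z ** 2 for row in residuals for z in row))  # 4.0
```

The sign of each residual shows whether a cell holds fewer or more cases than expected, which is why residuals are useful for seeing where an association comes from.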
What is used to give an effect size in chi-sq?
Cramer's V can give an effect size.
In odds tables, what is usually used as the effect size?
The odds ratio, which is built from odds (number of times A occurred / number of times A did not occur).
What is the actual odds ratio?
odds of event A divided by the odds of event B
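A small worked sketch of the odds and the odds ratio. It assumes a 2 x 2 layout in which each row is a group and the columns are "event occurred" and "event did not occur"; that layout, the function names and the counts are all assumptions made for the example.

```python
def odds(occurred, did_not_occur):
    """Odds: times the event occurred / times it did not occur."""
    return occurred / did_not_occur


def odds_ratio(table):
    """Odds in the first row divided by odds in the second row."""
    (a, b), (c, d) = table
    return odds(a, b) / odds(c, d)


# Made-up counts: rows are groups, columns are (occurred, did not occur).
table = [[30, 10], [10, 30]]
print(odds(30, 10))       # 3.0
print(odds_ratio(table))  # roughly 9: the odds are about 9 times higher in the first group
```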
When is the phi test accurate?
In 2 x 2 contingency tables, where it measures association on a scale from 0 to 1.
What should be used instead of phi outside 2 x 2 tables?
Contingency coefficient
What shortcoming does the contingency coefficient have, and what attempts to mend this?
It ranges between 0 and 1 but seldom reaches its upper limit; Cramer's V corrects for this.
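The three association measures on these cards can all be computed from the chi-squared statistic and the sample size. The sketch below uses the usual textbook formulas; the function names and the input values are made up for illustration.

```python
from math import sqrt


def phi(chi2, n):
    """Phi for 2 x 2 tables: sqrt(chi2 / n)."""
    return sqrt(chi2 / n)


def contingency_coefficient(chi2, n):
    """Contingency coefficient: sqrt(chi2 / (chi2 + n))."""
    return sqrt(chi2 / (chi2 + n))


def cramers_v(chi2, n, rows, cols):
    """Cramer's V: sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    return sqrt(chi2 / (n * min(rows - 1, cols - 1)))


# Made-up values: chi-squared of 4.0 from a 2 x 2 table with n = 100.
print(round(phi(4.0, 100), 3))                      # 0.2
print(round(contingency_coefficient(4.0, 100), 3))  # 0.196
print(round(cramers_v(4.0, 100, 2, 2), 3))          # 0.2
```

In a 2 x 2 table min(rows - 1, cols - 1) is 1, so Cramer's V and phi coincide; they differ only for larger tables.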
If expected values are below 5 what are the recommended options if you have more than 2 variables? (4)
(1) collapse the data across one of the variables (preferably the one least likely to have an effect)
(2) collapse levels of one of the variables
(3) collect more data
(4) accept the loss of power