Chi-Squared test Flashcards

1
Q

What is meant by predicting categorical outcome variables?

A

Predicting which category an entity falls into.

2
Q

What is used to measure categorical values numerically?

A

Frequencies

3
Q

What is the Chi-squared test used for?

A

Testing whether there is a relationship (association) between two categorical variables.

4
Q

What does the Chi-Squared test compare to assess this?

A

It compares the observed frequencies with the frequencies that would be expected if the two variables were unrelated.

5
Q

What formula does the chi-squared test use?

A

𝑋2 = βˆ‘ (π‘œπ‘π‘ π‘’π‘Ÿπ‘£π‘’π‘‘ π‘ π‘π‘œπ‘Ÿπ‘’ βˆ’ 𝑒π‘₯𝑝𝑒𝑐𝑑𝑒𝑑 π‘ π‘π‘œπ‘Ÿπ‘’)^2 /𝑒π‘₯𝑝𝑒𝑐𝑑𝑒𝑑 π‘ π‘π‘œπ‘Ÿπ‘’

6
Q

How do you calculate the expected score?

A

(row total × column total) / n, where n is the total number of observations
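
A short sketch of this calculation for a whole contingency table, assuming the table is given as a list of rows. The name `expected_frequencies` and the counts are hypothetical.

```python
def expected_frequencies(table):
    """Expected count per cell: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table of observed counts.
table = [[30, 10],
         [20, 40]]

expected = expected_frequencies(table)
print(expected)  # [[20.0, 20.0], [30.0, 30.0]]
```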

7
Q

How do you calculate the degrees of freedom for the chi square?

A

(r - 1)(c - 1), where r is the number of rows and c is the number of columns
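
For illustration, the same rule applied to a table stored as a list of rows. The helper name `degrees_of_freedom` is hypothetical.

```python
def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a rectangular contingency table."""
    rows = len(table)
    cols = len(table[0])
    return (rows - 1) * (cols - 1)


print(degrees_of_freedom([[30, 10], [20, 40]]))          # 1
print(degrees_of_freedom([[1, 2, 3, 4], [5, 6, 7, 8],
                          [9, 10, 11, 12]]))             # 6
```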

8
Q

In order to use the chi squared distribution with the chi-squared test, what is required?

A

The expected frequency in each cell must be greater than 5.

9
Q

If the expected value is not greater than 5, what can be done?

A

Fisher's exact test can be used.
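
A rough sketch of how Fisher's exact test works for a 2 x 2 table: with the row and column totals held fixed, it sums the exact (hypergeometric) probabilities of every table that is no more likely than the one observed. The function name, the two-sided rule used here and the counts are assumptions made for illustration; statistical packages may define the two-sided p-value slightly differently.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small tolerance so ties with the observed table are included.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Hypothetical small-sample table where expected counts fall below 5.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # 0.4857
```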

10
Q

What is an alternative to the chi-squared statistic?

A

The likelihood ratio statistic

11
Q

What does the Likelihood ratio statistic utilise?

A

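It compares the probability of obtaining the observed data under a model that maximises that probability with the probability of obtaining the same data under the null hypothesis.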
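
For a contingency table, the likelihood ratio statistic is commonly computed as 2 × Σ observed × ln(observed / expected). The sketch below assumes that form; the function name and counts are hypothetical, and cells with an observed count of zero are skipped because they contribute nothing to the sum.

```python
from math import log


def likelihood_ratio(observed, expected):
    """2 * sum of observed * ln(observed / expected) over all cells."""
    return 2 * sum(o * log(o / e)
                   for o, e in zip(observed, expected) if o > 0)


# Hypothetical counts for the four cells of a 2 x 2 table, listed row by row.
observed = [30, 10, 20, 40]
expected = [20, 20, 30, 30]

g = likelihood_ratio(observed, expected)
print(round(g, 2))  # 17.26
```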

12
Q

What distribution does the Likelihood ratio use?

A

The chi-squared distribution

13
Q

What error does the chi-squared test tend to make, when does it make it, and how can this be corrected?

A

In 2 x 2 tables the chi-squared statistic tends to produce significance values that are too small, so it tends to make Type I errors. Yates' continuity correction can be used to correct for this.
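
A minimal sketch of Yates' continuity correction, which subtracts 0.5 from the absolute difference in each cell before squaring and so makes the statistic smaller. The function name and counts are hypothetical, and the sketch assumes every |observed - expected| is at least 0.5.

```python
def chi_squared_yates(observed, expected):
    """Chi-squared with 0.5 subtracted from each |observed - expected|."""
    # Assumes each absolute difference is at least 0.5.
    return sum((abs(o - e) - 0.5) ** 2 / e
               for o, e in zip(observed, expected))


# Hypothetical counts for the four cells of a 2 x 2 table, listed row by row.
observed = [30, 10, 20, 40]
expected = [20, 20, 30, 30]

corrected = chi_squared_yates(observed, expected)
uncorrected = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(corrected, 3), round(uncorrected, 3))  # 15.042 16.667
```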

14
Q

What assumptions does the chi-sq test carry? (3)

A

(1) Independence of cases: each person, item or entity must contribute to only one cell of the contingency table.
(2) In 2 x 2 tables, no expected value should be below 5.
(3) In larger tables, no more than 20% of the expected values should be below 5, and all expected values should be greater than 1.
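
The expected-value rules can be checked mechanically. The sketch below is one possible reading of them; the function name `expected_counts_ok` and the example tables are hypothetical.

```python
def expected_counts_ok(expected):
    """Check the expected-frequency rules for a table given as a list of rows."""
    cells = [e for row in expected for e in row]
    is_2x2 = len(expected) == 2 and len(expected[0]) == 2
    if is_2x2:
        # 2 x 2 tables: no expected value below 5.
        return all(e >= 5 for e in cells)
    # Larger tables: at most 20% of cells below 5, and every cell above 1.
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return share_below_5 <= 0.20 and all(e > 1 for e in cells)


print(expected_counts_ok([[20, 20], [30, 30]]))          # True
print(expected_counts_ok([[4, 20], [30, 30]]))           # False
print(expected_counts_ok([[4, 10, 10], [10, 10, 10]]))   # True
```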

15
Q

What is the result of not meeting this expected values assumption?

A

It leads to a loss of statistical power.

16
Q

What is meant by the residual?

A

The residual is the difference between the observed frequency and the expected frequency (observed - expected).

17
Q

How is the standardised residual calculated?

A

(observed - expected) / sqrt(expected)

18
Q

How do individual standardised residuals have a direct relationship with the test statistic?

A

The chi-squared statistic is the sum of the squared standardised residuals.
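
A small sketch that computes the standardised residual for each cell and confirms that squaring and summing them gives the chi-squared statistic. The function name and counts are hypothetical.

```python
from math import sqrt


def standardised_residuals(observed, expected):
    """(observed - expected) / sqrt(expected) for each cell."""
    return [(o - e) / sqrt(e) for o, e in zip(observed, expected)]


# Hypothetical counts for the four cells of a 2 x 2 table, listed row by row.
observed = [30, 10, 20, 40]
expected = [20, 20, 30, 30]

residuals = standardised_residuals(observed, expected)
sum_of_squares = sum(r ** 2 for r in residuals)
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([round(r, 3) for r in residuals])  # [2.236, -2.236, -1.826, 1.826]
print(round(sum_of_squares, 3), round(chi_sq, 3))  # 16.667 16.667
```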

19
Q

What is used to give an effect size for the chi-squared test?

A

Cramér's V can give an effect size.
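
Cramér's V is usually computed as sqrt(chi-squared / (n × (k - 1))), where k is the smaller of the number of rows and columns. The sketch below assumes that definition; the function name and the input values are hypothetical.

```python
from math import sqrt


def cramers_v(chi_sq, n, rows, cols):
    """sqrt(chi-squared / (n * (min(rows, cols) - 1)))."""
    k = min(rows, cols)
    return sqrt(chi_sq / (n * (k - 1)))


# Hypothetical result: chi-squared of 50/3 from a 2 x 2 table with n = 100.
v = cramers_v(50 / 3, n=100, rows=2, cols=2)
print(round(v, 3))  # 0.408
```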

20
Q

In odds tables, what is usually used as the effect size?

A

The odds ratio. The odds of an event are the number of times it occurred divided by the number of times it did not occur.

21
Q

What is the actual odds ratio?

A

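The odds of event A divided by the odds of event B (for example, the odds of an outcome in one group divided by the odds of the same outcome in another group).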
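
As a worked sketch, the odds in each group are computed first and then divided. The function names, the group layout and the counts are hypothetical.

```python
def odds(occurred, did_not_occur):
    """Number of times the event occurred / number of times it did not."""
    return occurred / did_not_occur


def odds_ratio(table):
    """Odds in the first row divided by odds in the second row of a 2 x 2 table."""
    (a, b), (c, d) = table
    return odds(a, b) / odds(c, d)


# Hypothetical table: rows are two groups, columns are event / no event.
table = [[30, 10],
         [20, 40]]

print(odds(30, 10))       # 3.0
print(odds(20, 40))       # 0.5
print(odds_ratio(table))  # 6.0
```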

22
Q

When is the phi test accurate?

A

For 2 x 2 contingency tables, where it measures the strength of association on a scale from 0 to 1.

23
Q

What should be used instead of phi for tables larger than 2 x 2?

A

The contingency coefficient

24
Q

What shortcoming does the contingency coefficient have, and what attempts to fix it?

A

It ranges between 0 and 1 but seldom reaches its upper limit. Cramér's V corrects for this.
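
The shortcoming can be seen with a perfectly associated 2 x 2 table. The sketch assumes the usual definitions, contingency coefficient = sqrt(chi-squared / (chi-squared + n)) and Cramér's V = sqrt(chi-squared / (n × (k - 1))); the function names and the values are hypothetical.

```python
from math import sqrt


def contingency_coefficient(chi_sq, n):
    """sqrt(chi-squared / (chi-squared + n))."""
    return sqrt(chi_sq / (chi_sq + n))


def cramers_v(chi_sq, n, rows, cols):
    """sqrt(chi-squared / (n * (min(rows, cols) - 1)))."""
    return sqrt(chi_sq / (n * (min(rows, cols) - 1)))


# Hypothetical perfectly associated table [[50, 0], [0, 50]]:
# every expected count is 25, so chi-squared = 4 * 25**2 / 25 = 100 and n = 100.
chi_sq, n = 100, 100

c = contingency_coefficient(chi_sq, n)
v = cramers_v(chi_sq, n, rows=2, cols=2)
print(round(c, 3), v)  # 0.707 1.0
```

Even with a perfect association the contingency coefficient stops at about 0.707 here, while Cramér's V reaches 1.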

25
Q

If expected values are below 5 what are the recommended options if you have more than 2 variables? (4)

A

(1) collapse the data across one of the variables (preferably the one least likely to have an effect)
(2) collapse levels of one of the variables
(3) collect more data
(4) accept the loss of power