Topic 12 Flashcards

1
Q

Why might we use a chi-square test instead of a t-test?

A

A t-test compares means, so it only works with quantitative data. A chi-square test works with counts (frequencies) of qualitative (categorical) data. It can also handle variables with many categories, whereas a t-test compares at most two groups.

2
Q

What are the different uses of chi-square tests?

(3 different ways)

A

Goodness of fit

Homogeneity

Independence

3
Q

How can χ² tests be used to test goodness of fit?

A

A goodness-of-fit test checks whether the observed frequencies match the expected frequencies.

It tests a hypothesis about the distribution (model) of one qualitative variable in a population.

E.g. do eye colours follow this distribution: brown 45%, blue 27%, hazel 18%, green 10%? These percentages give the expected values, which we then compare with the observed values.
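A minimal R sketch of this goodness-of-fit test, using the eye-colour proportions above as the model. The observed counts are hypothetical, made up only to show the mechanics; the expected proportions go into chisq.test() through its p argument.

```r
# Hypothetical observed eye-colour counts (illustration only, n = 200)
observed <- c(brown = 85, blue = 60, hazel = 35, green = 20)

# Model under H0: the proportions from the card
p0 <- c(0.45, 0.27, 0.18, 0.10)

# Goodness-of-fit test: compares observed counts with n * p0
result <- chisq.test(observed, p = p0)

result$expected   # expected counts: 90, 54, 36, 20
result$statistic  # X-squared
result$p.value
```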

4
Q

How can χ² tests be used to test for homogeneity?

A

It tests a hypothesis about the distribution of a qualitative variable in several populations, i.e. whether that distribution is the same in every population.
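A sketch of a homogeneity test in R, assuming hypothetical counts of preferred transport sampled separately in three cities. Each row is one population, and chisq.test() checks whether the column distribution is the same across rows.

```r
# Hypothetical counts: transport preference (columns) in three cities (rows)
counts <- matrix(c(30, 20, 10,
                   25, 25, 10,
                   20, 30, 10),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(city = c("A", "B", "C"),
                                 transport = c("car", "bus", "bike")))

# H0: the distribution of transport preference is the same in every city
result <- chisq.test(counts)

result$expected   # counts expected if H0 were true
result$statistic  # X-squared = 4 for these made-up counts
result$parameter  # df = (3 - 1) * (3 - 1) = 4
result$p.value
```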

5
Q

How can χ² tests be used to test for independence?

A

It tests a hypothesis about the relationship between two qualitative variables in one population, i.e. whether they are independent or associated.

E.g. is there an association between parent eye colour (qualitative) and child eye colour (qualitative)?

6
Q

How is the test statistic calculated for all of the tests above?

A

χ² (test statistic) = sum of (observed frequency - expected frequency)^2 / expected frequency, where the sum runs over all categories (or all cells of a contingency table)
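The formula can be checked by hand in R. This sketch assumes a hypothetical fair-die example (60 rolls, so 10 expected per face) and compares the hand-computed value with the X-squared that chisq.test() reports.

```r
# Hypothetical example: 60 rolls of a die, H0 says each face has probability 1/6
observed <- c(8, 12, 9, 11, 10, 10)
expected <- rep(sum(observed) / 6, 6)  # 10 per face

# Chi-square statistic: sum over categories of (O - E)^2 / E
chi_sq <- sum((observed - expected)^2 / expected)
chi_sq  # 1

# chisq.test() with no p argument assumes equal probabilities,
# so it should report the same X-squared
unname(chisq.test(observed)$statistic)
```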

7
Q

What are the hypotheses for the chi-square test for goodness of fit?

A

H0 - the model fits the data (the observed frequencies agree with the expected frequencies)

H1 - the model does not fit the data

8
Q

What are the assumptions involved in chi-square tests for goodness of fit?

A

No expected frequency is 0, and no more than 20% of the expected frequencies are less than 5 (Cochran’s rule)

9
Q

What is Cochran’s rule?

A

No more than 20% of the expected values are less than 5. In other words, at least 80% of the expected values should be 5 or more.
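A small sketch of the rule as code. meets_cochran() is a hypothetical helper written for these notes, not a built-in R function; in practice the expected counts would come from the $expected component of a chisq.test() result. Note that R's chisq.test() itself warns ("Chi-squared approximation may be incorrect") whenever any expected count is below 5, which is stricter than Cochran's rule.

```r
# Hypothetical helper (not part of base R): checks expected counts
# against Cochran's rule as stated on the card
meets_cochran <- function(expected) {
  # no expected count of 0, and at most 20% of expected counts below 5
  all(expected > 0) && mean(expected < 5) <= 0.2
}

meets_cochran(c(90, 54, 36, 20))  # TRUE: none below 5
meets_cochran(c(12, 8, 4, 3, 3))  # FALSE: 3 of 5 (60%) are below 5
```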

10
Q

How do we calculate the number of degrees of freedom for a chi-square goodness-of-fit test?

A

df = n - 1, where n = number of categories

11
Q

How do we find the p-value for a chi-square goodness-of-fit test?

A

We find the upper-tail area beyond the observed χ² statistic under the χ² curve with n - 1 degrees of freedom, where n = number of categories
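In R the upper-tail area comes from pchisq() with lower.tail = FALSE. This sketch reuses a hypothetical set of die-roll counts (6 categories, so df = 5) and checks the hand-computed p-value against the one chisq.test() reports.

```r
# Hypothetical die-roll counts (6 categories)
observed <- c(8, 12, 9, 11, 10, 10)
result <- chisq.test(observed)

df <- length(observed) - 1  # n - 1 = 5

# p-value = upper-tail area of the chi-square(df) curve beyond the statistic
p_value <- pchisq(unname(result$statistic), df = df, lower.tail = FALSE)

p_value
result$p.value  # the same value, computed by chisq.test()
```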

12
Q

How would we use a chi-square test to test for independence between two variables?

A

We typically summarise the data for the two qualitative variables in a contingency table (one variable in the rows, the other in the columns), visualise it with a mosaic plot, and then run the chi-square test on the table.
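A sketch of building the contingency table and mosaic plot in base R with table() and mosaicplot(). The 8 parent/child pairs are hypothetical and far too few for a real chi-square test (they would break Cochran's rule); they only show how raw paired observations become a table.

```r
# Hypothetical paired observations: one entry per child (illustration only)
parent <- factor(c("brown", "brown", "blue", "blue", "brown", "blue", "brown", "blue"))
child  <- factor(c("brown", "blue",  "blue", "blue", "brown", "blue", "brown", "brown"))

# Contingency table: parent eye colour in rows, child eye colour in columns
tab <- table(parent, child)
tab

# Mosaic plot of the same table (base graphics)
mosaicplot(tab, main = "Parent vs child eye colour")
```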

13
Q

What is the R code for the chi-square test?

A

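chisq.test(dataset), where dataset is a contingency table (independence or homogeneity) or a vector of observed counts (goodness of fit, with the expected proportions passed through the p argument)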
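For a test of independence, chisq.test() can take either a contingency table or the two factor variables directly; in the second form R builds the table itself. This sketch assumes hypothetical smoking/exercise data for 60 people and checks that both forms agree.

```r
# Hypothetical data: two qualitative variables measured on the same 60 people
smoker   <- factor(rep(c("yes", "no"), times = c(20, 40)))
exercise <- factor(c(rep(c("low", "high"), times = c(14, 6)),
                     rep(c("low", "high"), times = c(16, 24))))

# Form 1: pass a contingency table
r1 <- chisq.test(table(smoker, exercise))

# Form 2: pass the two factors; R tabulates them internally
r2 <- chisq.test(smoker, exercise)

c(r1$p.value, r2$p.value)  # identical
```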

14
Q

What are the hypotheses involved with the chi-square test for independence?

A

H0 - the variables are independent (no association)

H1 - the variables are not independent (there is an association)

15
Q

What are the assumptions involved with the chi-square test for independence?

A

The same assumptions as the chi-square test for goodness of fit: no expected cell count is 0, and no more than 20% of the expected cell counts are less than 5 (Cochran’s rule)

16
Q

How do we calculate the test statistic for the chi-square test for independence?

A

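χ² = sum of (observed frequency - expected frequency)^2 / expected frequency, over all cells of the contingency table, where each expected frequency = (row total × column total) / grand total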
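A sketch of the calculation in R for a hypothetical 2 x 3 table: outer() multiplies every row total by every column total, and dividing by the grand total gives the expected count for each cell.

```r
# Hypothetical 2 x 3 contingency table of observed counts
observed <- matrix(c(20, 30, 10,
                     10, 20, 30), nrow = 2, byrow = TRUE)

# Expected count for each cell = row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected  # both rows: 15, 25, 20

# Test statistic: sum over all cells of (O - E)^2 / E
chi_sq <- sum((observed - expected)^2 / expected)
chi_sq

# chisq.test() applies no continuity correction here (not a 2 x 2 table),
# so it should report the same X-squared
unname(chisq.test(observed)$statistic)
```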

17
Q

How do we calculate the p-value for the chi-square test for independence?

A

We find the upper-tail area beyond the observed χ² statistic under the χ² curve with (m - 1)(n - 1) degrees of freedom, where m = number of categories of variable 1 and n = number of categories of variable 2

18
Q

How do we calculate the degrees of freedom for the chi-square test for independence?

A

df = (m - 1)(n - 1), where m = number of categories of variable 1 (the number of rows in the contingency table) and n = number of categories of variable 2 (the number of columns)
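A sketch of the degrees-of-freedom and p-value steps in R, assuming a hypothetical 3 x 4 table of counts. The df is computed from the table's dimensions and the upper-tail area from pchisq(); both should match what chisq.test() reports.

```r
# Hypothetical 3 x 4 table (variable 1 in rows, variable 2 in columns)
observed <- matrix(c(12, 15,  9, 14,
                     10, 11, 13, 16,
                     18,  9, 12, 11), nrow = 3, byrow = TRUE)

m  <- nrow(observed)      # categories of variable 1
n  <- ncol(observed)      # categories of variable 2
df <- (m - 1) * (n - 1)   # (3 - 1) * (4 - 1) = 6

result <- chisq.test(observed)

# p-value = upper-tail area of the chi-square(df) curve beyond the statistic
p_value <- pchisq(unname(result$statistic), df = df, lower.tail = FALSE)

p_value
result$p.value  # the same value from chisq.test()
```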

19
Q

What happens when df = 1, i.e. when there is a 2 x 2 contingency table?

A

In this case, R's chisq.test() automatically applies the Yates continuity correction to the test statistic, which in turn changes the p-value obtained.

20
Q

What does the Yates continuity correction do?

A

It subtracts 0.5 from each |observed - expected| before squaring, which makes the χ² statistic smaller and the p-value larger. This makes the test more conservative, especially for small sample sizes, where the continuous χ² curve approximates the discrete counts poorly. This ultimately helps reduce the number of Type I errors (false positives).

21
Q

How can we turn off the Yates continuity correction?

A

Set correct = FALSE, e.g. chisq.test(dataset, correct = FALSE)
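A sketch comparing the corrected and uncorrected tests on a hypothetical 2 x 2 table. Here every |O - E| is 3.5, so the correction shrinks each squared difference from 3.5^2 to 3^2, giving a smaller statistic and a larger p-value.

```r
# Hypothetical 2 x 2 table of counts
tab <- matrix(c(15, 5,
                 8, 12), nrow = 2, byrow = TRUE)

with_yates    <- chisq.test(tab)                   # default: correct = TRUE
without_yates <- chisq.test(tab, correct = FALSE)  # correction turned off

# The corrected statistic is smaller, so its p-value is larger
c(with_yates$statistic, without_yates$statistic)
c(with_yates$p.value, without_yates$p.value)
```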

22
Q

So what are the differences in the HATPC process between the chi-square test for independence and the chi-square test for goodness of fit?

A

The only differences are in the hypotheses and in the way the degrees of freedom (and the expected frequencies) are calculated.

23
Q

Is it possible to use a t-test to test the significance of a linear slope?

A

Yes. A t-test is the usual way to test whether the slope of a regression line is significantly different from 0.

24
Q

What are the hypotheses typically involved in testing for the significance of the slope?

A

H0 - slope = 0 (no linear trend)

H1 - slope ≠ 0 (there is a linear trend)

(These are the usual, two-sided hypotheses.)

25
Q

What are the assumptions involved in testing for the significance of the slope?

A

Residuals need to be independent

Residuals should follow a normal distribution

Residuals should have a constant variance (homoscedasticity)

Relationship between dependent and independent variable should look linear

26
Q

How can we check for homoscedasticity?

A

Check the residual plot: the spread of the residuals should stay roughly constant, with no observable pattern (e.g. no fanning out)
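A sketch of a residual plot in base R. It uses R's built-in cars data set (stopping distance vs speed) purely as a stand-in example; it is not from the original notes. Residuals are plotted against fitted values with a dashed reference line at 0.

```r
# Built-in 'cars' data: stopping distance (dist) vs speed, 50 observations
fit <- lm(dist ~ speed, data = cars)

# Residual plot: residuals against fitted values
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)  # reference line at 0
```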

27
Q

How can we check that the residuals follow a normal distribution?

A

Use a QQ plot (the points should lie close to the line) and a Shapiro-Wilk test
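A sketch of both checks in base R, again using the built-in cars data only as an example. qqnorm()/qqline() draw the QQ plot of the residuals, and shapiro.test() tests H0: the residuals come from a normal distribution.

```r
fit <- lm(dist ~ speed, data = cars)
res <- resid(fit)

# QQ plot: points close to the line suggest normally distributed residuals
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(res)
sw$p.value
```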

28
Q

How can we check that the residuals are independent?

A

Check the residual plot: the residuals should look randomly scattered, with no systematic pattern

29
Q

How can we check that there is a linear relationship between the dependent and independent variables?

A

Check the scatter plot for a linear relationship

30
Q

What is the test statistic for testing for a significant linear relationship?

A

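T = (OV - EV) / SE, where OV = observed slope, EV = slope under H0 (usually 0) and SE = standard error of the slope, with n - 2 degrees of freedom (n = number of data points)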
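A sketch of the calculation in R, using the built-in cars data as a stand-in example. The slope and its standard error are read from the coefficient table of summary(); with EV = 0 under H0, the hand-computed T should match the "t value" column.

```r
fit <- lm(dist ~ speed, data = cars)
coefs <- summary(fit)$coefficients

b1    <- coefs["speed", "Estimate"]    # observed slope (OV)
se_b1 <- coefs["speed", "Std. Error"]  # standard error of the slope (SE)

# Under H0 the expected slope (EV) is 0
t_stat <- (b1 - 0) / se_b1
df <- nrow(cars) - 2                   # n - 2 = 48

t_stat
coefs["speed", "t value"]  # the same value reported by summary()
```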

31
Q

How is the p-value obtained for a linear relationship?

A

We compare the test statistic with the t curve with n - 2 degrees of freedom and find the tail areas (both tails, since H1 is two-sided)

We can also read it directly from the summary() output of the fitted lm() model (the Pr(>|t|) column)
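A sketch of the two-sided p-value in R, again on the built-in cars data as an example: the area in both tails of the t curve with n - 2 degrees of freedom, computed with pt(), should equal the Pr(>|t|) entry from summary().

```r
fit <- lm(dist ~ speed, data = cars)
coefs <- summary(fit)$coefficients
t_stat <- coefs["speed", "t value"]
df <- fit$df.residual  # n - 2 for simple linear regression

# Two-sided p-value: area in both tails of the t(df) curve
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

p_value
coefs["speed", "Pr(>|t|)"]  # the value summary() reports
```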