Topic 12 Flashcards by James Makmur

Why might we use a Chi square test instead of a T test

Tests such as t tests apply only to quantitative data, whereas chi square tests work with qualitative (categorical) data. Chi square tests also handle variables with many categories (three or more), not just a comparison of two groups

What are the different functions of the Chi square tests?

(3 different ways)

Goodness of fit

Homogeneity

Independence

How can χ2 tests be used for goodness of fit

A goodness of fit test checks whether the observed frequencies match the expected frequencies.

They test a hypothesis about the distribution (model) of a qualitative variable in a population

E.g. do eye colours follow this distribution: brown 45%, blue 27%, hazel 18%, green 10%? These percentages give the expected frequencies, which are then compared with the observed frequencies
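
A minimal R sketch of the eye colour check. The observed counts are made up for illustration and are not from the source; only the model proportions come from the card.

```r
# Hypothetical observed counts for 200 people (illustrative only)
observed <- c(brown = 95, blue = 50, hazel = 35, green = 20)

# Expected proportions under the model from the card
model <- c(brown = 0.45, blue = 0.27, hazel = 0.18, green = 0.10)

# Goodness of fit: compares the observed counts with 200 * model
gof <- chisq.test(observed, p = model)

gof$expected   # expected counts: 90, 54, 36, 20
gof$statistic  # chi square test statistic
gof$p.value    # upper tail area with 4 - 1 = 3 degrees of freedom
```

A large p-value here would mean the data are consistent with the proposed distribution.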

How can χ2 tests be used to test for homogeneity

Tests a hypothesis about the distribution of a qualitative variable in several populations - i.e. whether the distribution is the same in each population

How can χ2 tests be used to test for independence

Tests a hypothesis about the relationship between two qualitative variables in a population - i.e. whether they are independent or not (is there an association between the two)

E.g. is there an association between parent eye colour (qualitative) and child eye colour (qualitative)?

How is the test statistic for all of the tests above calculated

χ2 (test statistic) = sum of ( [observed frequency - expected frequency] ^2 / expected frequency )
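
The formula can be computed by hand in R, as a rough sketch. The counts are hypothetical, and the expected frequencies assume four equally likely categories.

```r
# Hypothetical observed counts in four categories (illustrative only)
observed <- c(18, 22, 20, 40)

# Assumed model: all four categories equally likely, so 25 expected in each
expected <- rep(sum(observed) / 4, 4)

# Sum over categories of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected)^2 / expected)
chi_sq   # (49 + 9 + 25 + 225) / 25 = 12.32
```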

What are the hypotheses for the chi square test for goodness of fit

H0 - model fits data/expected frequencies.

H1 - Model doesn’t fit data

What are the assumptions involved in chi square tests for goodness of fit

None of the expected frequencies is 0, and no more than 20% of the expected frequencies are less than 5 (Cochran’s rule)

What is Cochran’s rule

No more than 20% of the expected values are less than 5 - in other words, at least 80% of the categories should have an expected value of 5 or more
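
One way to check the rule in R, sketched with hypothetical expected frequencies. The variable names are illustrative.

```r
# Hypothetical expected frequencies for six categories (illustrative only)
expected <- c(12.5, 8.0, 6.2, 4.1, 9.2, 10.0)

none_zero   <- all(expected > 0)    # no expected frequency is 0
share_small <- mean(expected < 5)   # proportion of expected values below 5

# Rule holds if nothing is 0 and at most 20% of expected values are below 5
cochran_ok <- none_zero && share_small <= 0.20
cochran_ok
```

With real data, the expected frequencies can be read from the `expected` component of the object returned by `chisq.test()`.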

How do we calculate the number of degrees of freedom from a chi square test

n-1

(n = number of categories; this applies to the goodness of fit test)

How do we find the p-value from a chi square test

We find the upper tail area under the χ2 curve with n-1 degrees of freedom, where n = number of categories (goodness of fit)
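
In R the upper tail area can be found with `pchisq()`. A small sketch, assuming a hypothetical test statistic of 6 from a test with 3 categories:

```r
chi_sq <- 6   # hypothetical test statistic (illustrative only)
n      <- 3   # number of categories

# Upper tail area of the chi square curve with n - 1 degrees of freedom
p_value <- pchisq(chi_sq, df = n - 1, lower.tail = FALSE)
p_value   # just under 0.05
```

Setting `lower.tail = FALSE` is what gives the upper tail; the default returns the area to the left of the statistic.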

How would we use chi square test to test for independence between two variables

We typically tabulate the counts for the two qualitative variables in a contingency table, then visualise the table with a mosaic plot.

What is the code for the chi square test

chisq.test(dataset)
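
A short sketch of the call in context, assuming `dataset` is a contingency table of counts. The table below is hypothetical, with made-up labels and counts.

```r
# Hypothetical 2 x 3 contingency table of counts (illustrative only)
dataset <- matrix(c(30, 20, 10,
                    20, 30, 40),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(parent = c("brown", "blue"),
                                  child  = c("brown", "blue", "green")))

mosaicplot(dataset)   # visual check of the table before testing

result <- chisq.test(dataset)
result$expected    # expected counts if the variables were independent
result$parameter   # degrees of freedom: (2 - 1) * (3 - 1) = 2
result$p.value
```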

What are the hypotheses involved with the chi square test for independence

H0 - variables are independent

H1 - variables aren’t independent

What are the assumptions involved with the chi square test for independence

Expected frequencies - none are 0, and no more than 20% are less than 5 (Cochran’s rule). These are the same assumptions as for the chi square test for goodness of fit

How do we calculate the test statistic from chi square test for independence

χ2 = sum of ( [observed frequency - expected frequency] ^2 / expected frequency ), summed over every cell of the contingency table

How do we calculate p - values for chi square test for independence

We find the upper tail area under the χ2 curve with (m-1)(n-1) degrees of freedom (df), where m = number of categories from variable 1, and n = number of categories from variable 2

How do we calculate degrees of freedom for chi square test for independence

We can use (m-1)(n-1) = df, where m = number of categories from variable 1 (i.e. a row), and n = number of categories from variable 2 (i.e. a column)
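
The test statistic, df and p-value for independence can be worked through by hand in R. This is a sketch on a hypothetical table; it uses the standard expected count for each cell, row total x column total / grand total.

```r
# Hypothetical 2 x 3 table of counts (illustrative only)
tab <- matrix(c(10, 20, 30,
                20, 20, 20),
              nrow = 2, byrow = TRUE)

# Expected count per cell: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

chi_sq  <- sum((tab - expected)^2 / expected)
df      <- (nrow(tab) - 1) * (ncol(tab) - 1)   # (m - 1)(n - 1) = 2
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
```

These values should agree with what `chisq.test(tab)` reports for the same table.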

What happens when df = 1, or there is a 2 x 2 contingency table

When this occurs, R's chisq.test() applies the Yates continuity correction by default, which adjusts the test statistic and therefore the p-value.

What does Yates continuity correction do

It makes the test more conservative (larger p-values). This matters most for small sample sizes, where approximating discrete counts with the continuous χ2 curve can make the p-value too small. This ultimately helps reduce the number of type 1 errors (false positives)

How can we turn off Yates continuity correction

We can set correct = FALSE inside chisq.test()
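
In R's `chisq.test()` the argument is named `correct`. A brief sketch comparing the two settings on a hypothetical 2 x 2 table:

```r
# Hypothetical 2 x 2 table of counts (illustrative only)
tab <- matrix(c(12,  8,
                 5, 15),
              nrow = 2, byrow = TRUE)

with_yates    <- chisq.test(tab)                   # correct = TRUE is the default
without_yates <- chisq.test(tab, correct = FALSE)  # correction turned off

# The corrected test has the smaller statistic and the larger p-value
with_yates$statistic
without_yates$statistic
with_yates$p.value > without_yates$p.value
```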

So what are the differences in the HATPC process for the use of chi square test for independence vs goodness of fit

The only differences are in the hypotheses and in the way the degrees of freedom are calculated

Is it possible to use T tests to test the significance of a linear slope

Yes, it is possible, and it is often used

What are the hypotheses typically involved in testing for the significance of the slope

H0: slope = 0 (no linear trend)

H1: slope ≠ 0 (there is a linear trend)

(normally)

What are the assumptions involved in testing for the significance of a line

Residuals need to be independent

Residuals should follow a normal distribution

Residuals should have a constant variance

Relationship between dependent and independent variable should look linear

How can we check for homoscedasticity

Check residual plot for no observable pattern

How can we check for residuals following normal distribution

Use a QQ plot of the residuals and also a Shapiro-Wilk test

How can we check that residuals are independent

Check residual plot

How can we check that there is a linear relationship between dependent and independent variables

Check the scatter plot for a linear relationship
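
The checks on the last few cards could be run in R roughly as follows. This is a sketch on simulated data, so the data and variable names are assumptions.

```r
# Simulated data for illustration only
set.seed(1)
x <- 1:30
y <- 3 + 2 * x + rnorm(30)

fit <- lm(y ~ x)
res <- resid(fit)

plot(x, y)                # linearity: scatter plot should look linear
plot(fitted(fit), res)    # constant variance / independence: no visible pattern
abline(h = 0)
qqnorm(res)               # normality: points should follow the line
qqline(res)
sw <- shapiro.test(res)   # Shapiro-Wilk test on the residuals
sw$p.value                # a small p-value suggests non-normal residuals
```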

What is the test statistic for testing for significant linear relationship

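T = (OV - EV) / SE, with n-2 degrees of freedom, where OV = observed value (the estimated slope), EV = expected value under H0 (usually 0) and SE = standard error of the slope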

How is p - value obtained for linear relationship

We compare the test statistic with the t curve with n-2 degrees of freedom and find the tail areas. We can also read the p-value directly from the summary() output in R
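
Both routes can be sketched in R. The data are hypothetical, and the test is assumed to be two-sided with H0: slope = 0.

```r
# Hypothetical data (illustrative only)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)

fit   <- lm(y ~ x)
coefs <- summary(fit)$coefficients   # estimate, SE, t value and p-value per term

# By hand: T = (observed slope - 0) / SE, with n - 2 degrees of freedom
slope   <- coefs["x", "Estimate"]
se      <- coefs["x", "Std. Error"]
t_stat  <- (slope - 0) / se
df      <- length(x) - 2
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)   # both tails

# The same values as reported by summary()
coefs["x", "t value"]
coefs["x", "Pr(>|t|)"]
```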

Topic 12 Flashcards

(31 cards)