L7 chi-squared test Flashcards
Experiment example
A researcher was interested in whether animals could be trained to dance. He took 200 cats and tried to train them to dance by giving them either food or affection as a reward for dance-like behaviour. At the end of the week he counted how many cats could dance and how many could not. There are two categorical variables here: training (the cat was trained using either food or affection, not both) and dance (the cat either learned to dance or it did not). Crossing the two variables gives four categories, so all we need to do is count how many cats fall into each one.
Look at picture 1 to see the contingency table of this data
What is a χ2 test?
A chi-squared test is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true
- The term is often used as shorthand for Pearson’s chi-squared test
What does the χ2 test measure?
The association between two categorical variables
- Both our independent variable (IV) and our dependent variable (DV) are categorical
What is the central idea of Pearson’s χ2 test?
It compares the frequencies we observe in certain categories with the frequencies we would expect to get in those categories by chance
What is the formula we use to calculate χ2? Explain each part of the formula
Picture 2
- We divide by the model scores, which is analogous to dividing a sum of squares by the degrees of freedom to get the mean squares (it standardizes the deviation of each observation)
- i indexes the rows of the contingency table; j indexes the columns
- Observed data: the observed frequencies (cell counts)
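For reference, the standard form of Pearson's statistic, written in the notation of the bullets above (assumed, not confirmed, to match picture 2):

```latex
\chi^2 = \sum_{i}\sum_{j} \frac{\left(\text{observed}_{ij} - \text{model}_{ij}\right)^2}{\text{model}_{ij}}
```

Each term is the squared deviation of one cell's observed frequency from its model (expected) frequency, divided by that model frequency; the terms are then summed over every row i and column j.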
What is the model in the formula of χ2?
We calculate the expected frequencies for each cell in the table using the column and row totals for that cell
- By doing so we factor in the total number of observations (n) that could have contributed to that cell
- picture 3
- For our experiment example we obtained expected frequencies for the four cells (picture 4)
- We plug the observed and expected frequencies into the formula and get χ2 (picture 6)
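A minimal Python sketch of the two steps above: expected frequencies from the row and column totals (model_ij = row total_i × column total_j / n), then Pearson's χ2. The counts are illustrative values that sum to 200 and are assumed for the example; the actual counts are in picture 1 and may differ. The function names are hypothetical.

```python
def expected_frequencies(observed):
    """Expected count for each cell: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi2(observed):
    """Sum over all cells of (observed - model)^2 / model."""
    model = expected_frequencies(observed)
    return sum(
        (o - m) ** 2 / m
        for obs_row, model_row in zip(observed, model)
        for o, m in zip(obs_row, model_row)
    )


# Illustrative counts (assumed, not taken from picture 1).
# Rows: reward (food, affection); columns: danced (yes, no).
cats = [[28, 10],
        [48, 114]]

model = expected_frequencies(cats)
chi2 = pearson_chi2(cats)
print(model)
print(round(chi2, 2))
```

With these assumed counts the expected frequency for the food/danced cell is 38 × 76 / 200 = 14.44, and the statistic comes out at about 25.36.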
What do we use to display data and calculate χ2?
Contingency tables
Picture 5
What is the χ2 distribution?
It describes the distribution of the test statistic χ2 when the null hypothesis is true, and it is used to obtain the p-value corresponding to the observed value of the χ2 statistic
How is its shape determined and how do we obtain a p-value from it?
- Its shape is determined by the degrees of freedom which are (r-1)(c-1), in which r is the number of rows and c is the number of columns
- The test is always one-sided, so to get the p-value we take the probability of all values to the right of the statistic we obtained
- picture 7
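The degrees of freedom and the right-tail p-value can be sketched in plain Python as follows. This is one possible approach, not necessarily how the course computes it: it uses the closed forms for df = 1 and df = 2 and a standard recurrence of the incomplete gamma function for larger integer df. The function names are hypothetical.

```python
import math


def chi2_df(n_rows, n_cols):
    """Degrees of freedom for an r x c contingency table: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def chi2_p_value(stat, df):
    """Right-tail probability P(X >= stat) for a chi-squared variable X.

    Valid for positive integer df only.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if stat <= 0:
        return 1.0
    z = stat / 2.0
    # Base cases: df = 1 (a = 1/2) and df = 2 (a = 1), where a = df / 2.
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    # Recurrence Q(a + 1, z) = Q(a, z) + z^a * e^(-z) / Gamma(a + 1).
    while 2 * a < df:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


df = chi2_df(2, 2)                # a 2 x 2 table has 1 degree of freedom
p_at_critical = chi2_p_value(3.841, df)
print(df, round(p_at_critical, 3))
```

The value 3.841 is the familiar 5% critical value for df = 1, so the printed p-value is about 0.05. If SciPy is available, `scipy.stats.chi2.sf(stat, df)` gives the same right-tail probability.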
What happens to the χ2 statistic’s approximation as the sample size increases? How is that different with small samples?
The chi-square statistic has a sampling distribution that approximates a chi-square distribution, and this approximation improves as the sample size increases
- For large samples, this approximation is accurate enough, but for small samples, it becomes unreliable
What happens if the expected frequencies in a χ2 test are too low (small sample)?
The sampling distribution of the test statistic deviates too much from the chi-square distribution, making the test inaccurate
What is Fisher’s exact test?
It computes the exact probability (p-value) of the χ2 statistic for small samples (at least one cell has an expected frequency below 5)
- Typically used on a 2x2 contingency table (i.e., two categorical variables, each with two categories)
Can Fisher’s test be used for larger samples or tables?
Yes, but it’s unnecessary and can be computationally intensive
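A minimal sketch of a two-sided Fisher's exact test for a 2x2 table, assuming the common convention of summing the probabilities of all tables (with the same row and column totals) that are no more likely than the observed one. The table below is a hypothetical small sample, and the function name is illustrative.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k
        # when the row and column totals are held fixed.
        return Fraction(comb(row1, k) * comb(row2, col1 - k), comb(n, col1))

    p_observed = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Exact fractions avoid floating-point ties in the <= comparison.
    p_value = sum(
        (prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_observed),
        Fraction(0),
    )
    return float(p_value)


# Hypothetical small sample: every expected frequency is 2, well below 5.
small_table = [[3, 1],
               [1, 3]]
p = fisher_exact_two_sided(small_table)
print(round(p, 4))
```

For this table the qualifying probabilities are 1/70, 16/70, 16/70 and 1/70, so the p-value is 34/70, about 0.4857. If SciPy is available, `scipy.stats.fisher_exact` offers a ready-made version for 2x2 tables.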
What is an alternative to Pearson’s χ2?
The likelihood ratio statistic, which is based on maximum likelihood theory
How do we compute the likelihood ratio statistic?
Johnny had this in his slides but skipped it. The book talked very little about it
- Collect data and create a model for which the probability of obtaining the observed set of data is maximized
- Compare the probability of the data under this model with the probability of obtaining those data under the null hypothesis
- The resulting statistic is based on comparing observed frequencies with those predicted by the model
- The statistic has a χ2 distribution with df computed the same way as for Pearson’s
- The rest of the procedure to obtain the p-value is the same
The formula is in picture 8; i and j index the rows and columns of the contingency table, and ln is the natural logarithm
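A short Python sketch of the likelihood ratio statistic, assuming picture 8 shows the standard form Lχ2 = 2 × Σ observed_ij × ln(observed_ij / model_ij), with the model (expected) frequencies computed from the row and column totals as for Pearson's χ2. The counts are illustrative values that sum to 200, not the data from the pictures, and the function name is hypothetical.

```python
import math


def likelihood_ratio_chi2(observed):
    """2 * sum over cells of observed * ln(observed / model)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            model = row_totals[i] * col_totals[j] / n
            # A cell with zero observations contributes nothing,
            # because o * ln(o) tends to 0 as o tends to 0.
            if o > 0:
                total += o * math.log(o / model)
    return 2.0 * total


# Illustrative counts (assumed, not taken from the pictures).
# Rows: reward (food, affection); columns: danced (yes, no).
cats = [[28, 10],
        [48, 114]]

lr = likelihood_ratio_chi2(cats)
print(round(lr, 2))
```

With these assumed counts the statistic is roughly 24.9, close to the Pearson value for the same table, and it is referred to a χ2 distribution with (r-1)(c-1) = 1 degree of freedom.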