Chapter 25: the Chi-square test Flashcards
the problem of multiple comparisons
The problem of how to do many comparisons at once with an overall measure of confidence in all our conclusions
expected count
To test H0, we compare the observed counts in the table with the expected counts, the counts we would expect—except for random variation—if H0 were true.
the chi-square statistic
To test whether the observed differences among the four distributions of living arrangements given age are statistically significant, we compare the observed and expected counts. The chi-square statistic is a measure of how far the observed counts in a two-way table are from the expected counts if H0 were true. The Chi-square statistic is a sum of terms, one for each cell in the table. Think of χ2 as a measure of the distance of the observed counts from the expected counts if H0 were true. Large values of χ2 are evidence against H0 because they say that the observed counts are far from what we would expect if H0 were true.
To find the P-value for the χ2 statistic, we must know…
…the sampling distribution of the statistic when the null hypothesis (no relationship between the two categorical variables) is true.
The Chi-Square Distributions
The Chi-square distributions are a family of distributions that take only positive values and are skewed to the right. A specific Chi-square distribution is specified by giving its degrees of freedom.
You find the degrees of freedom by:
(# of rows-1)(# of columns)
the mean of any Chi-square distribution is equal to…
…its degrees of freedom
Cell Counts Required for the Chi-Square Test
You can safely use the Chi-square test with critical values from the chi-square distribution when no more than 20% of the expected counts are less than 5, and all individual expected counts are 1 or greater. In particular, all four expected counts in a 2 × 2 table should be 5 or greater. Note that these guidelines use EXPECTED counts.
chi-square test for homogeneity
used when we are interested in whether or not the populations from which the samples are selected are homogeneous (the same) with respect to the single classification variable