Reading Quiz 14 Flashcards
three chi-square procedures
- chi-square goodness of fit test
- chi-square test for homogeneity of populations
- chi-square test of association/independence
chi-square test for goodness of fit
tests the null hypothesis that a categorical variable has a specific distribution
aka X^2
chi-square test for homogeneity of populations
tests the null hypothesis that the distribution of a particular categorical variable is the same for all of the populations
chi-square test of association/independence
tests the null hypothesis that there is no relationship between two categorical variables
expected count
the expected count for each category is obtained by multiplying that category's hypothesized proportion by the sample size
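As a minimal sketch of this rule, the snippet below multiplies each hypothesized proportion by the sample size. The function name `expected_counts` is an assumption made for illustration. The proportions (20%, 50%, 30%) and the sample size of 200 come from the quiz questions later in this set.

```python
def expected_counts(proportions, n):
    """Expected count per category: hypothesized proportion times sample size."""
    return [p * n for p in proportions]

hypothesized = [0.20, 0.50, 0.30]   # black, white, green
counts = expected_counts(hypothesized, 200)
print(counts)   # expected counts of 40, 100, and 60
```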
chi-square statistic
X^2 = ∑ ((observed count - expected count)^2) / expected count aka ∑((O-E)^2)/E
where sum is over k variable categories
chi-square test compares the value of the statistic
X^2 with critical values from the chi-square distribution with k-1 degrees of freedom, where k = the number of categories
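The statistic and its degrees of freedom can be sketched in a few lines of Python. The function name `gof_chi_square` is hypothetical. The observed counts (45, 90, 65) and expected counts (40, 100, 60) are taken from the quiz questions later in this set.

```python
def gof_chi_square(observed, expected):
    """Return (X^2, degrees of freedom) for a goodness of fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Sum of (O - E)^2 / E over the k categories.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1   # k - 1
    return statistic, df

statistic, df = gof_chi_square([45, 90, 65], [40, 100, 60])
print(round(statistic, 4), df)   # 2.0417 2
```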
Ho and Ha for chi-square GOF
Ho: the population proportions equal the hypothesized values (provide them)
Ha: at least one of the population proportions differs from its hypothesized value
p-value is the
area under the density curve to the right of X^2
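To make the right-tail area concrete, here is a sketch that uses only the Python standard library. It relies on two standard closed forms (1 degree of freedom: erfc(sqrt(x/2)); 2 degrees of freedom: exp(-x/2)) and steps up two degrees of freedom at a time. The function name `chi_square_right_tail` is an assumption for illustration. In practice a library routine such as `scipy.stats.chi2.sf` would normally be used instead. The example value 2.0417 with 2 degrees of freedom corresponds to the goodness of fit quiz question later in this set.

```python
import math

def chi_square_right_tail(x, df):
    """Area to the right of x under the chi-square density with df degrees of freedom."""
    if x < 0 or df < 1:
        raise ValueError("need x >= 0 and df >= 1")
    half = x / 2.0
    if df % 2 == 1:
        k = 1
        tail = math.erfc(math.sqrt(half))   # closed form for df = 1
    else:
        k = 2
        tail = math.exp(-half)              # closed form for df = 2
    # Each pass adds the term that moves from k to k + 2 degrees of freedom.
    while k < df:
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail

p_value = chi_square_right_tail(2.0417, 2)
print(round(p_value, 2))   # 0.36
```

A p-value this large would not count as evidence against Ho at the usual significance levels.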
large values of X^2 are evidence
against Ho
the chi-square distribution is an approximation to the distribution of
the statistic X^2
can safely use the approximation (aka the conditions) when the sample is
an SRS from the population and when all expected counts are at least 1 and no more than 20% of all expected counts are less than 5 (state the expected counts!)
if the chi-square test finds a statistically significant p-value, you are technically supposed to do a
follow-up analysis that compares the observed counts with the expected counts and that looks for the largest components of the chi-square statistic
two-way tables
first compute percents or proportions that describe the relationship of interest
then turn to formal inference
two different methods of generating data for two-way tables lead to the
chi-square test for homogeneity of populations and the chi-square test of association/independence
chi-square test for homogeneity of populations
independent SRSs are drawn from each of several populations
each observation is classified according to a categorical variable of interest
null hypothesis is that distribution of categorical variable is same for all of the populations
one common use of the chi-square test for homogeneity of populations is to compare several
population proportions
the null hypothesis is that all of the population proportions are equal
the alternative hypothesis is that they are not all equal but allows any other relationship among the population proportions
chi-square test of association/independence
a single SRS is drawn from a single population
observations are classified according to two categorical variables
null hypothesis is that there is no relationship between the row variable and the column variable
expected count
the expected count in any cell of a two-way table when Ho is true is
expected count = (row total * column total) / n
where n = sample size
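The formula can be applied to every cell at once. The sketch below is illustrative only: the function name `two_way_expected` is hypothetical, and the 3 x 2 table of counts is made up rather than taken from the source.

```python
def two_way_expected(observed):
    """Expected counts under Ho: (row total * column total) / n for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Made-up 3 x 2 table of observed counts, for illustration only.
table = [[20, 30],
         [30, 20],
         [10, 40]]
expected = two_way_expected(table)
print(expected)   # [[20.0, 30.0], [20.0, 30.0], [20.0, 30.0]]
```

Every row has the same expected counts here because the three made-up row totals are equal.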
chi-square statistic
X^2 = ∑(O-E)^2 /E
where sum is over all r*c cells
the chi-square test compares the value of the statistic X^2
with critical values from the chi-square distribution with (r-1)(c-1) degrees of freedom
r = the number of rows
c = the number of columns
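A sketch of the two-way computation, returning both the statistic and its (r-1)(c-1) degrees of freedom. The function name `two_way_chi_square` and the table of counts are assumptions made for illustration, not data from the source.

```python
def two_way_chi_square(observed):
    """Return (X^2, degrees of freedom) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count under Ho
            statistic += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

# Made-up 3 x 2 table of observed counts, for illustration only.
table = [[20, 30], [30, 20], [10, 40]]
statistic, df = two_way_chi_square(table)
print(round(statistic, 2), df)   # 16.67 2
```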
p-value is the
area under the density curve to the right of X^2
larger values of X^2 are evidence against Ho
chi-square distribution approximation to the distribution of
the statistic X^2
can safely use this approximation aka the conditions when all expected cell counts
are at least 1 and no more than 20% of all expected cell counts are less than 5
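The count conditions can be checked mechanically, as in the sketch below. The function name `counts_condition_ok` is hypothetical. It takes a flat list of expected counts, so a two-way table would be flattened first. The sampling condition (SRS) concerns how the data were gathered and cannot be checked by code.

```python
def counts_condition_ok(expected):
    """True if all expected counts are >= 1 and no more than 20% are below 5."""
    if not expected:
        raise ValueError("need at least one expected count")
    if min(expected) < 1:
        return False
    small = sum(1 for e in expected if e < 5)
    # small / len(expected) <= 0.20, written with integers to avoid rounding.
    return small * 5 <= len(expected)

print(counts_condition_ok([40, 100, 60]))    # True
print(counts_condition_ok([0.5, 10, 10]))    # False: one count is below 1
print(counts_condition_ok([4, 6, 10, 10]))   # False: 1 of 4 cells (25%) is below 5
```

The last call shows why a two-by-two table needs all four expected counts to be 5 or greater: a single small cell is already 25% of the table.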
for an independence/association test the sample must be gathered
by an SRS from the population
for homogeneity all of the samples
must be independent SRSs from their respective populations
Suppose that you are dealing with a situation where there are several possible outcomes, not just 2 (success and failure). You are interested in seeing whether the proportion of outcomes falling into each of a certain set of categories is consistent with a certain hypothesized population distribution. What is the name of the test you use?
chi-square test for goodness of fit
Suppose that your hypothesized population distribution for the percent of objects that are certain colors is 20% black, 50% white, and 30% green. Suppose you draw a sample of 200, to test this hypothesis. What are the “expected” values that you use when you do the chi-square goodness of fit test?
40, 100, 60
In testing the hypothesis mentioned in Q2, suppose your observed counts are 45, 90, and 65. What does chi-square equal for this goodness of fit test? Please write a numerical expression without calculating the result.
A. chi-square = (45-40)^2/40 + (90-100)^2/100 + (65-60)^2/60
Is there just one chi-square distribution, or a family of distributions, with one distribution for each number of degrees of freedom?
A. A family, with one distribution for each number of degrees of freedom.
How do you find the number of degrees of freedom for a chi-square goodness of fit test? For example, how many degrees of freedom would there be if you were looking at the proportion of blacks, whites, and greens as in Q2?
A. The degrees of freedom is one less than the number of categories in the distribution; for example, when there are blacks, whites, and greens, the number of degrees of freedom is 3-1=2.
When you look up in a table or a calculator the P-value associated with a certain chi-square, what is that the probability of?
A. The probability of obtaining results as extreme as, or more extreme than, the ones you obtained, if the hypothesized distribution is true. (Extreme means deviant from what is expected.)
Is the chi-square distribution symmetrical? If not, in which direction is it skewed?
skewed to the right
When you are doing a chi-square test for goodness of fit, what are the null hypothesis H0 and the alternative hypothesis Ha?
A. The H0 is that the population percents are equal to the set of hypothesized percents. The Ha is that at least one of the population percents differs from its hypothesized value.
What are the rule of thumb conditions for the use of the chi-square goodness of fit test?
A. All individual expected counts are at least 1 and no more than 20% of the expected counts are less than 5.
If a chi-square goodness of fit test yields a significant result, what should you inspect before you interpret the results?
A. You see which observed counts deviated the most from the expected ones – in other words, you see which cells contributed the most to the chi-square that was calculated. You take these observations into account when interpreting your results.
Do two-way tables describe relationships between two categorical variables or between two continuous variables?
categorical
When there are multiple comparisons that can be made, what two steps are often carried out?
A. First an overall test for evidence of any differences among the parameters being compared, and then a follow-up analysis to decide which parameters differ and to estimate how large the differences are.
When doing a chi-square test to compare several proportions, the first step is to set up the table with the numbers in it being (proportions of success and number of trials, or counts of the number of cases falling into each category).
A. Counts of the number of cases falling into each category.
When there are two categorical variables being displayed in an r by c table (with r rows and c columns), each of the r x c possible categories into which the observations may fall is called a _____ of the table.
cell
When we are comparing the proportion of successes for three treatment conditions, what null hypothesis would we use?
A. That the proportion of successes is the same among all three conditions, i.e. that p1 = p2 = p3.
When comparing the proportion of successes for three treatment conditions, what would be the alternative hypothesis?
A. That not all the proportions are equal.
In testing Ho via chi-square with a two-way table, we compare the observed counts with the expected counts. Evidence against Ho consists of observed and expected counts that are far from each other or close to each other?
far from each other
How do you compute the expected count in a certain cell of a two-way table?
A. The expected count is the (row total * column total)/table total.
The calculation of the expected count for a cell of a two-way table assumes what relationship between the row and column variables (disjoint or independent)?
independent
When you want to test the statistical significance of the deviation of observed from expected counts, in a two-way table, using chi-square, how do you compute the chi-square statistic?
A. chi-square is the summation of the (observed count - expected count)^2/expected count. The summation is over all r * c cells of the table.
Large values of chi-square are evidence for, or against Ho? Why?
A. Against. This is because chi-square gets larger as the observed counts deviate more from the counts that would be expected under Ho.
How many degrees of freedom do you have in a chi-square test with an r * c two-way table?
A. (r-1)(c-1)
True or False: when doing chi-square tests, the p-value is always the area under the distribution curve that is to the right of the observed chi-square, and never the area to the left.
A. True. For the chi-square distribution, the farther you go to the right, the more the observed counts have deviated from what the null hypothesis predicts. The value most consistent with the null hypothesis is 0, which is the left end of the domain of the distribution. To get the probability of results as deviant as, or more deviant than, the obtained results, you look at the area under the curve to the right of the obtained chi-square. (This includes the probability exactly at the obtained value, but since chi-square is a continuous distribution, the distinction between “above” and “at or above” is not meaningful.)
What cell counts are required for doing a chi-square test for homogeneity of populations?
A. The same as for tests of goodness of fit: all expected counts are 1 or greater, and no more than 20% of the expected counts are less than 5.
In the special case of a two-by-two table (r=2 and c=2), how many expected cell counts need to be 5 or greater in order to do a chi-square test?
all four of them
How many degrees of freedom would be used for a 3 by 2 table?
A. (3-1)*(2-1) = 2
After having done an overall test rejecting the hypothesis that all the proportions are equal, what should be done?
A. A follow-up analysis that asks which cells most contribute to the deviation from expectations under the null hypothesis. You can do this informally by observation; there are more formal methods that do significance tests and confidence intervals for the individual proportions.
True or False: the chi-square tests the hypothesis that “the row and column variables are not related to each other,” even when it is difficult to conceive of the groups defined by the rows and columns as different populations, i.e. when you are dealing with the relation of some variables in one population.
true
True or False: for a chi-square test of association/independence of variables, you compute the expected counts just as in the other situations: the row total * column total/ table total.
true
True or False: converting table entries to percents is not necessary for the computation of chi-square, but it does help to shed light on the association among the variables.
true
For a chi-square test of association/independence of variables, what is the null hypothesis?
A. That the variables are independent, or that there is no association between them.
True or False: the distinction between tests of homogeneity of populations and tests of association/independence is that in the first, there is a sample from each of two or more populations, and in the second, there is a single sample from a single population.
A. True. (However, distinguishing whether there is one or more than one population involved in a study can be a debatable procedure. If you collect a sample of people, some of whom are wealthy and some of whom are poor, can you argue that you have sampled some individuals from the population of poor people and some from the population of rich people? Or have you drawn from one population of people, who simply differ in one variable? Fortunately, the chi-square test is done in the same way regardless of the outcome of such a debate.)
When there is a two-by-two table, and you wish to compare two proportions, how will a two-sided z test for equality of proportions and a chi-square test compare with respect to the p values that result?
A. The same p values will result.
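This equivalence can be checked numerically. The sketch below assumes the z test uses the pooled standard error (the usual choice when testing equality of two proportions). The function names and the counts are made up for illustration. The p-values use only the standard library: the two-sided normal tail is erfc(|z|/sqrt(2)), and the chi-square right tail with 1 degree of freedom is erfc(sqrt(X^2/2)).

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided pooled z test of p1 = p2; returns (z, p-value)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))

def two_by_two_chi_square(x1, n1, x2, n2):
    """Chi-square test on the 2 x 2 table of successes and failures; returns (X^2, p-value)."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    n = n1 + n2
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n   # expected count under Ho
            statistic += (observed[i][j] - e) ** 2 / e
    return statistic, math.erfc(math.sqrt(statistic / 2))

# Made-up counts: 30 successes out of 50 versus 20 successes out of 50.
z, p_z = two_proportion_z(30, 50, 20, 50)
x2, p_chi = two_by_two_chi_square(30, 50, 20, 50)
print(round(z ** 2, 4), round(x2, 4))     # 4.0 4.0
print(round(p_z, 4), round(p_chi, 4))     # 0.0455 0.0455
```

The chi-square statistic equals the square of z, which is why the two p-values agree.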
If there is a two-by-two table and you wish to compare two proportions, which test is usually recommended, between a z test and a chi-square, and why?
A. The z test. It has the advantages that it is related to a confidence interval for the difference in proportions, and that you can do a one-sided test if desired.