Chapter 11 - Chi-Square Tests Flashcards
What test can be used to check a claim that a die is unfair?
Chi-square goodness-of-fit test
What is the chi-square distribution?
Special type of right-skewed curve which depends on its degrees of freedom
Starts at 0 on the horizontal axis and extends indefinitely to the right, approaching, but never touching, the horizontal axis
What is the total area under the chi squared curve?
Equal to 1
What are the effects of degrees of freedom on a chi-squared curve?
The larger the degrees of freedom, the more the X2 curve looks like a normal curve
What is the value of X2α?
The X2 value that has area α to its right under the chi-square curve
Which table do we use to find the X2α value?
Table VII
What is the formula to determine the expected frequency for a chi-squared test?
E = np
E is the expected frequency
n is the sample size
p is the probability specified by Ho
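A minimal Python sketch of E = np, using the fair-die setup from the worked example in this set (n = 1200, p = 1/6 for each face). The function name `expected_frequencies` is illustrative and not part of the course material.

```python
from fractions import Fraction

def expected_frequencies(n, probabilities):
    """Return E = n * p for each category under Ho."""
    return [n * p for p in probabilities]

# Fair die: six faces, each with probability 1/6 under Ho.
# Fraction keeps the arithmetic exact (1200 * 1/6 is exactly 200).
die_probs = [Fraction(1, 6)] * 6
expected = expected_frequencies(1200, die_probs)
print([float(e) for e in expected])  # [200.0, 200.0, 200.0, 200.0, 200.0, 200.0]
```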
What is the test statistic for a chi-squared test?
X2 = Σ (O - E)^2 / E
O is observed frequency
E is expected frequency
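The formula can be sketched as a short Python function. The counts below are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from the cards.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for three equally likely categories (illustration only).
observed = [18, 22, 20]
expected = [20, 20, 20]
stat = chi_square_statistic(observed, expected)
print(round(stat, 4))  # 0.4
```

Each category contributes (O - E)^2 / E, here 4/20 + 4/20 + 0/20, so a category that matches its expected count adds nothing to the statistic.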
What are the steps to a goodness-of-fit test?
- Set up the hypotheses
- Check the assumptions
- Decide significance level and find critical value
- Calculate the test statistic
- Compare the test statistic with critical value
- Interpret the result in the context of the question
What are the assumptions that must be checked for a chi-square goodness-of-fit test?
All expected frequencies are at least 1
At most 20% of the expected frequencies are less than 5
Simple random sample
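The two conditions on expected frequencies can be checked mechanically. The sketch below assumes the expected frequencies are already computed. The simple random sample condition depends on how the data were collected and cannot be checked from the counts. The function name `expected_counts_ok` is illustrative.

```python
def expected_counts_ok(expected):
    """Check the two expected-frequency conditions (not the sampling condition)."""
    all_at_least_one = all(e >= 1 for e in expected)
    share_below_five = sum(1 for e in expected if e < 5) / len(expected)
    return all_at_least_one and share_below_five <= 0.20

print(expected_counts_ok([200] * 6))          # True
print(expected_counts_ok([0.5, 10, 10, 10]))  # False: one count is below 1
print(expected_counts_ok([3, 4, 10, 10]))     # False: 50% of counts are below 5
```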
What are the basics for the hypotheses for a chi-square goodness-of-fit test?
Ho: The variable has the specified distribution
Ha: The variable does not have the specified distribution
How do we determine degrees of freedom for a chi-squared goodness-of-fit test?
c-1
the number of categories - 1
If we are using the P-value to solve a chi-squared goodness-of-fit test, how do we determine if we reject the Ho?
If:
p-value ≤ α -> we reject Ho
p-value > α -> we DO NOT reject Ho
What is the rejection region of a chi-squared test and how do we determine if we reject Ho?
The critical value is X2α with df = c-1
The rejection region is the area to the right of the critical value
If the test statistic is larger than the X2α value, we reject Ho
If the test statistic is smaller than the X2α value, we DO NOT reject Ho
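Both decision rules (critical value and p-value) can be written as one-line comparisons. This is a sketch with illustrative function names; the numbers passed in are the test statistics, critical values and p-values quoted in the worked examples of this set.

```python
def reject_by_critical_value(test_statistic, critical_value):
    """Right-tailed rule: reject Ho when the statistic exceeds the critical value."""
    return test_statistic > critical_value

def reject_by_p_value(p_value, alpha):
    """Reject Ho when the p-value is at most alpha."""
    return p_value <= alpha

print(reject_by_critical_value(11.38, 11.070))   # True
print(reject_by_critical_value(3.7559, 11.345))  # False
print(reject_by_p_value(0.0146, 0.05))           # True
print(reject_by_p_value(0.2891, 0.01))           # False
```

For the same data and the same α, the two rules give the same decision, because the p-value is below α exactly when the statistic lies to the right of the critical value.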
A six sided die is claimed to be unfair so we rolled the die 1200 times and observed the results. We are using a significance level of 5%.
Set up the hypotheses
Ho: The distribution of the outcome of rolling this die is P(X=x) = 1/6, x = 1, 2, 3, 4, 5, 6
Ha: the distribution is not the one as shown above
A six sided die is claimed to be unfair so we rolled the die 1200 times and observed the results. We are using a significance level of 5%.
Determine the significance level and critical value
α = 5% = 0.05
df = 6 categories (one for each face of the die) - 1 = 5
X2α = 11.070
A six sided die is claimed to be unfair so we rolled the die 1200 times and observed the results. We are using a significance level of 5%.
What are the expected outcomes?
E = np = 1200 x 1/6 = 200
Every face (1, 2, 3, 4, 5, 6) has the same expected frequency, because expected frequencies are calculated assuming Ho is true, which gives each face probability 1/6
A six sided die is claimed to be unfair so we rolled the die 1200 times and observed the results. We are using a significance level of 5%.
Check the assumptions
simple random sample ✓
all expected frequencies are at least 1 ✓
at most 20% of the expected frequencies are less than 5 ✓
(all expected outcomes should be 200)
A six sided die is claimed to be unfair so we rolled the die 1200 times and observed the results. We are using a significance level of 5%.
If the observed frequency of 3 was 183, what is the statistic for this line?
How do we determine the test statistic?
(O - E)^2 / E
(183 - 200)^2 / 200 = 1.445
The test statistic is the sum of this value for each category (so the dice sides 1-6)
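The single-category arithmetic above can be reproduced in Python. Only the count for face 3 is given on the card, so this sketch covers that one term. The other five observed counts are not listed, so the full statistic cannot be recomputed here.

```python
observed_threes = 183   # observed frequency of face 3 (from the card)
expected_threes = 200   # E = np = 1200 * 1/6

# Contribution of this one category to the chi-square statistic.
contribution = (observed_threes - expected_threes) ** 2 / expected_threes
print(contribution)  # 1.445
```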
A six sided die is claimed to be unfair so we rolled the die 1200 times and observed the results. We are using a significance level of 5%.
Compare:
Test statistic 11.38
Critical value 11.070
The test statistic value is greater than the critical value, so we reject Ho
A six sided die is claimed to be unfair so we rolled the die 1200 times and observed the results. We are using a significance level of 5%.
Interpret
The test statistic value is greater than the critical value, so we reject Ho
At the 5% significance level, the data provides sufficient evidence that the die is unfair
A six sided die is claimed to be unfair so we rolled the die 1200 times and observed the results. We are using a significance level of 5%.
Compare:
Software gives P(X2 < 11.38) = 0.9557
This is the area to the left. For a chi-square goodness-of-fit test, the p-value is the area to the right, so:
p-value = 1 - 0.9557 = 0.0443
α = 0.05
α > p value so we reject Ho
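The right-tail area can also be computed directly instead of subtracting from 1. Statistical libraries provide this (for example `scipy.stats.chi2.sf` in SciPy). The sketch below uses only the Python standard library and relies on the closed-form expressions that exist when df is a positive integer. The function name `chi_square_right_tail` is illustrative.

```python
import math

def chi_square_right_tail(x, df):
    """P(X2 > x) for a chi-square variable with a positive integer df."""
    if x < 0 or df < 1 or int(df) != df:
        raise ValueError("need x >= 0 and a positive integer df")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^i / i!
        terms = (half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * sum(terms)
    # Odd df: an erfc term plus a finite sum over half-integer powers.
    tail = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return tail

p_value = chi_square_right_tail(11.38, 5)
print(round(p_value, 3))  # 0.044
```

Because the function returns the area to the right, its result can be compared with α directly.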
The proportions of blood types O, A, B, and AB in the general population are known to be 46%, 42%, 9%, 3% correspondingly. A research team, investigating a small isolated community in Canada of 200, obtained the following frequencies of blood type. Test that the proportions in this community differ significantly from those in the general population at 1% significance level.
Set up the hypotheses
Ho: the distribution of blood type is
P(O) = 0.46
P(A) = 0.42
P(B) = 0.09
P(AB) = 0.03
Ha: the distribution is not the one as shown above
The proportions of blood types O, A, B, and AB in the general population are known to be 46%, 42%, 9%, 3% correspondingly. A research team, investigating a small isolated community in Canada of 200, obtained the following frequencies of blood type. Test that the proportions in this community differ significantly from those in the general population at 1% significance level.
Check assumptions
simple random sample ✓
all expected frequencies are at least 1 ✓
at most 20% of the expected frequencies are less than 5 ✓
(the smallest expected outcome is 3% of 200=6)
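A short sketch of the expected frequencies behind this check, assuming the sample size of 200 and the general-population proportions stated in the question. The variable names are illustrative.

```python
n = 200
null_proportions = {"O": 0.46, "A": 0.42, "B": 0.09, "AB": 0.03}

# E = np for each blood type, rounded to suppress floating-point noise.
expected = {blood: round(n * p, 6) for blood, p in null_proportions.items()}
print(expected)                # {'O': 92.0, 'A': 84.0, 'B': 18.0, 'AB': 6.0}
print(min(expected.values()))  # 6.0
```

The smallest expected frequency is 6, so no category falls below 5 and both expected-frequency conditions hold.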
The proportions of blood types O, A, B, and AB in the general population are known to be 46%, 42%, 9%, 3% correspondingly. A research team, investigating a small isolated community in Canada of 200, obtained the following frequencies of blood type. Test that the proportions in this community differ significantly from those in the general population at 1% significance level.
Find critical value
α = 1% = 0.01
degrees of freedom = number of blood types - 1 = 4-1 = 3
Critical value is 11.345
The proportions of blood types O, A, B, and AB in the general population are known to be 46%, 42%, 9%, 3% correspondingly. A research team, investigating a small isolated community in Canada of 200, obtained the following frequencies of blood type. Test that the proportions in this community differ significantly from those in the general population at 1% significance level.
If the number of people with A blood was actually 76 in the community, what is the calculation for this line?
How do we use this information to determine the test statistic?
Expected: np = 200*0.42 = 84
(O - E)^2 / E
= (76 - 84)^2 / 84
= 0.7619
The test statistic is the sum of this quantity over all categories
The proportions of blood types O, A, B, and AB in the general population are known to be 46%, 42%, 9%, 3% correspondingly. A research team, investigating a small isolated community in Canada of 200, obtained the following frequencies of blood type. Test that the proportions in this community differ significantly from those in the general population at 1% significance level.
Compare:
Test statistic 3.7559
critical value 11.345
The test statistic is less than the critical value, we DO NOT reject Ho
The proportions of blood types O, A, B, and AB in the general population are known to be 46%, 42%, 9%, 3% correspondingly. A research team, investigating a small isolated community in Canada of 200, obtained the following frequencies of blood type. Test that the proportions in this community differ significantly from those in the general population at 1% significance level.
Interpret:
The test statistic is less than the critical value, we DO NOT reject Ho
At the 1% significance level, the data does not provide sufficient evidence that the proportions in this community differ from those in the general population
The proportions of blood types O, A, B, and AB in the general population are known to be 46%, 42%, 9%, 3% correspondingly. A research team, investigating a small isolated community in Canada of 200, obtained the following frequencies of blood type. Test that the proportions in this community differ significantly from those in the general population at 1% significance level.
Compare:
Software gives P(X2 < 3.7559) = 0.7109
This is the area to the left and the p-value is the area to the right, so:
1-0.7109 = 0.2891
α = 0.01
α < p value so we DO NOT reject Ho
What are the 6 steps of a chi-square independence test?
- Set up the hypotheses
- Check the assumptions
- Decide significance level and find critical value
- Calculate the test statistic
- Compare the test statistic with critical value
- Interpret the result in the context of the question
What are the assumptions for a chi-square independence test?
All expected frequencies are at least 1
At most 20% of the expected frequencies are less than 5
Simple random sample
What are the basics for the hypotheses for a chi-square independence test?
Ho: the two variables are not associated (independent)
Ha: the two variables are associated (not independent)
How are expected frequencies determined for chi-square independence tests?
E = RC/n
E - expected frequency of a cell
R - row total for that cell
C - column total for that cell
n - sample size
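A minimal sketch of E = RC/n applied to every cell of a two-way table. The 2x2 table is hypothetical and used only for illustration, and the function name `expected_table` is not from the course material.

```python
def expected_table(observed):
    """Return the table of E = (row total * column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical 2x2 table of observed counts (illustration only).
observed = [[10, 20],
            [30, 40]]
print(expected_table(observed))  # [[12.0, 18.0], [28.0, 42.0]]
```

The expected table keeps the same row and column totals as the observed table (30 and 70 for rows, 40 and 60 for columns).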
How is a test statistic calculated for a chi-square independence test?
X2 = Σ (O - E)^2 / E
O is observed frequency
E is expected frequency
Degrees of freedom = (r-1)(c-1)
r - number of rows (categories of the row variable)
c - number of columns (categories of the column variable)
As a reminder:
E = RC/n
E - expected frequency of a cell
R - row total for that cell
C - column total for that cell
n - sample size
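Putting the pieces on this card together, the sketch below returns both the statistic and the degrees of freedom for a two-way table. The 2x3 table is hypothetical, and `independence_statistic` is an illustrative name.

```python
def independence_statistic(observed):
    """Return (X2, df) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # E = RC/n
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical 2x3 table (illustration only).
observed = [[10, 20, 30],
            [20, 20, 20]]
stat, df = independence_statistic(observed)
print(round(stat, 4), df)  # 5.3333 2
```

Note that r and c count the categories of each variable and do not include any "Total" row or column.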
We are interested in a population with a regular doctor by age group and gender in Canada. We randomly selected 17 890 Canadians. At the 5% significance level, we want to test the claim that gender and age are associated.
Set up the hypotheses
Ho: gender and age are not associated (independent)
Ha: gender and age are associated (not independent)
We are interested in a population with a regular doctor by age group and gender in Canada. We randomly selected 17 890 Canadians. At the 5% significance level, we want to test the claim that gender and age are associated.
Check the assumptions
To check these, first build the table of expected frequencies. Here every expected frequency is greater than 5, so:
simple random sample ✓
all expected frequencies are at least 1 ✓
at most 20% of the expected frequencies are less than 5 ✓
We are interested in a population with a regular doctor by age group and gender in Canada. We randomly selected 17 890 Canadians. At the 5% significance level, we want to test the claim that gender and age are associated. We are going to group the ages into 20-34, 35-44, and 45-64
Determine the critical value
α = 5% = 0.05
df = (r-1)(c-1) = (2-1)(3-1) = 1*2 = 2
Critical value is 5.991
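For df = 2 specifically, the table value can be verified by hand, because the right-tail area of the chi-square curve has the closed form exp(-x/2). Setting that equal to α gives a critical value of -2 ln(α). This shortcut does not apply to other degrees of freedom.

```python
import math

rows, columns = 2, 3              # 2 genders, 3 age groups
df = (rows - 1) * (columns - 1)   # (r-1)(c-1) = 2

alpha = 0.05
# Valid for df = 2 only: P(X2 > x) = exp(-x / 2), so x = -2 * ln(alpha).
critical_value = -2 * math.log(alpha)
print(df, round(critical_value, 3))  # 2 5.991
```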
We are interested in a population with a regular doctor by age group and gender in Canada. We randomly selected 17 890 Canadians. At the 5% significance level, we want to test the claim that gender and age are associated. We are going to group the ages into 20-34, 35-44, and 45-64.
Determine the statistic for this line:
         20-34    Total
Women     2768     9445
Total     5166    17890
How do we use this to determine the test statistic for the chi-square independence test?
Expected
= RC/n
= (9445*5166)/17890
= 2727.382
(O - E)^2 / E
= (2768 - 2727.382)^2 / 2727.382
= 0.6049
For the test statistic, we take the sum of these values over all cells of the table
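The arithmetic for this one cell can be reproduced with the totals shown on the card. The remaining cells of the table are not listed, so this sketch covers only the women aged 20-34 cell.

```python
row_total = 9445     # women
column_total = 5166  # ages 20-34
n = 17890
observed = 2768

expected = row_total * column_total / n                  # E = RC/n
contribution = (observed - expected) ** 2 / expected     # (O - E)^2 / E
print(round(expected, 3), round(contribution, 4))  # 2727.382 0.6049
```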
We are interested in a population with a regular doctor by age group and gender in Canada. We randomly selected 17 890 Canadians. At the 5% significance level, we want to test the claim that gender and age are associated. We are going to group the ages into 20-34, 35-44, and 45-64.
Compare
Test statistic is 8.4567
critical value is 5.991
Test statistic is larger than the critical value, so we reject Ho
We are interested in a population with a regular doctor by age group and gender in Canada. We randomly selected 17 890 Canadians. At the 5% significance level, we want to test the claim that gender and age are associated. We are going to group the ages into 20-34, 35-44, and 45-64.
Interpret:
Test statistic is larger than the critical value, so we reject Ho
At the 5% significance level, the data provides sufficient evidence that the two variables of gender and age are associated
We are interested in a population with a regular doctor by age group and gender in Canada. We randomly selected 17 890 Canadians. At the 5% significance level, we want to test the claim that gender and age are associated. We are going to group the ages into 20-34, 35-44, and 45-64.
Compare
Software gives P(X2 < 8.4567) = 0.9854 with df = 2
This is the area to the left and the p-value is the area to the right, so
1-0.9854 = 0.0146
α = 0.05
α > p value so we reject Ho
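With df = 2, the right-tail area has the closed form exp(-x/2), so this p-value can be checked without a table or statistics package. The shortcut is specific to df = 2.

```python
import math

test_statistic = 8.4567
alpha = 0.05

# With df = 2 the right-tail area is exp(-x / 2).
p_value = math.exp(-test_statistic / 2)
print(round(p_value, 4))   # 0.0146
print(p_value <= alpha)    # True, so we reject Ho
```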