chi - squared Flashcards
What is a contingency table?
A table that summarizes data by counting occurrences in different categories of one or more discrete variables.
What is a chi-square (𝜒²) test used for?
It is used to compare observed counts to expected counts to determine whether a categorical variable follows a certain distribution.
What are the two types of chi-square tests?
Chi-square goodness-of-fit test (for one-way tables)
Chi-square test of independence (for two-way tables)
What is the null hypothesis (𝐻0) for a chi-square goodness-of-fit test?
The observed distribution of a categorical variable matches the expected distribution.
What is the formula for calculating expected frequencies in a chi-square test?
Expectedvalue = Totalsamplesize × Expectedproportion
How do you calculate the chi-square (𝜒²) test statistic?
𝜒²stat = sum of [(O - E)² / E]
where 𝑂 is the observed frequency and 𝐸 is the expected frequency.
What does a large chi-square test statistic indicate?
Stronger evidence against the null hypothesis, suggesting that the observed and expected distributions differ significantly.
How do you determine the degrees of freedom (𝑑𝑓) for a chi-square goodness-of-fit test?
df=numberofcategories−1
What is the interpretation of a chi-square test result?
If the test statistic is large and falls in the critical region, we reject 𝐻0 and conclude the distributions are significantly different.
What does the reference chi-square distribution depend on?
The degrees of freedom (df), which affects the shape of the distribution.
What does a very large chi-square test statistic suggest?
It suggests a significant difference between observed and expected values, leading to rejection of the null hypothesis.
What command in R is used to perform a chi-square goodness-of-fit test?
chisq.test(x=Frequency, p=CensusProp)
How do you find the expected values in R after performing a chi-square test?
salsusTest$expected
What R function provides the critical value for a chi-square test at a 5% significance level?
qchisq(p=0.05, df=4, lower.tail=FALSE)
What is the purpose of a chi-square test of independence?
To determine whether two categorical variables (e.g., gender and political support) are independent.
What are the null and alternative hypotheses in a chi-square test of independence?
H0: The two categorical variables are independent.
𝐻1 : The two categorical variables are not independent (i.e., there is an association).
How do you calculate expected values in a chi-square test of independence?
Expectedvalue = (Rowtotal×Columntotal)/ grand total
How do you calculate degrees of freedom for a chi-square test of independence?
df=(numberofrows−1)×(numberofcolumns−1)
How do you structure data for a chi-square test of independence in R?
Convert the data into a matrix using:
voters.mat <- matrix(voters, nrow=2, ncol=3, byrow=TRUE)
What R function is used to perform a chi-square test of independence?
chisq.test(x=voters.mat)
Why is a correction applied by default for 2 × 2 tables in the R chisq.test() function?
To account for the approximation of a discrete distribution by a continuous 𝜒2 distribution, making the test statistic smaller and the p-value larger.
What are the key assumptions for a valid chi-square test?
Data must be from a random sample.
Each expected cell count should be greater than 1.
At least 80% of expected counts should be 5 or more.
What can be done if the chi-square test assumptions are violated due to small sample sizes?
Combine categories in a sensible way to increase expected cell counts.
What are the two main types of chi-square tests?
Chi-square goodness-of-fit test – Checks if observed data fit a known distribution.
Chi-square test of independence – Checks if two categorical variables are associated.
What is the general approach to hypothesis testing, including chi-square tests?
State the null and alternative hypotheses.
Calculate the test statistic.
Compare the test statistic to a reference distribution.
Draw conclusions based on the p-value.