Chi-square tests Flashcards
Chi-square test of association and goodness of fit, binomial and sign tests
What are hypotheses?
Testable statements (not questions) that predict a relationship between variables. Variables need to be named precisely and consistently.
What are two assumptions made by parametric tests, and what happens when these are violated?
Assume normally distributed data measured at interval level or above
Violating these assumptions can lead to erroneous interpretation of the data
When would you use a non-parametric test?
When the data are not normally distributed, or when the data are categorical (nominal/ordinal)
Why should non-parametric tests be used with care?
Not as powerful as parametric tests, so they can fail to detect some differences, especially when the sample size is low. A large sample is needed to detect smaller effects and demonstrate significance
What type of test is chi-square? What assumptions are made when using it?
Non-parametric
Data are categorical (coded as numbers) and analysed as frequencies in each category
Assumes categories are mutually exclusive and observations are independent, i.e. each participant can only be in one category
Expected frequencies must be >5 in each cell of contingency table
How can the chi-square test of association/independence be used?
To investigate whether two variables are associated e.g. does gender influence smoking frequency?
Compare observed frequencies to expected frequencies predicted from the null hypothesis
What is the calculation used to find expected frequencies?
Expected frequency = (column frequency total x row frequency total)/total sample size
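As a rough illustration of the expected-frequency calculation, here is a minimal Python sketch. The function name `expected_frequencies` and the counts in the table are made up for the example (rows standing in for gender, columns for smoker / non-smoker).

```python
def expected_frequencies(observed):
    """Expected count per cell: (row total * column total) / sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical counts: rows = gender, columns = smoker / non-smoker.
observed = [[30, 10],
            [20, 40]]
expected = expected_frequencies(observed)
print(expected)  # [[20.0, 20.0], [30.0, 30.0]]
```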
What is the formula for the chi-square calculation?
chi-square = SUM((observed frequency - expected frequency)^2 / expected frequency), summed over every cell
A bigger chi-square value represents greater divergence of the observed frequencies from those expected under the null, i.e. stronger evidence of an association
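The formula can be sketched in a few lines of Python. This is only an illustration: `chi_square` is a hypothetical helper, and the observed and expected counts are invented values for a 2x2 table listed cell by cell.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over paired observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical 2x2 table, flattened cell by cell.
observed = [30, 10, 20, 40]
expected = [20.0, 20.0, 30.0, 30.0]
statistic = chi_square(observed, expected)
print(round(statistic, 2))  # 16.67
```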
In order to assess significance of our chi-square value we need two things: our degrees of freedom and our alpha value, and we can use these to find the CRITICAL VALUE which our obtained value needs to exceed to be significant. How do you calculate degrees of freedom for association tests?
df = (R - 1) x (C - 1) where R = number of rows and C = number of columns
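A small sketch of the significance check, assuming an alpha of 0.05. The critical values are copied from a standard chi-square table for df 1 to 6; the helper names and the obtained value of 16.67 are hypothetical.

```python
# Critical values of the chi-square distribution at alpha = 0.05,
# copied from a standard table for df = 1..6.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def association_df(n_rows, n_cols):
    """Degrees of freedom for a test of association: (R - 1) x (C - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def is_significant(statistic, df):
    """True when the obtained value exceeds the critical value."""
    return statistic > CRITICAL_05[df]


df = association_df(2, 2)  # a 2x2 contingency table
print(df, is_significant(16.67, df))  # 1 True
```

A table lookup is used here only to keep the sketch free of dependencies; a statistics library would normally supply the critical value or an exact p-value.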
When would we use the Chi-Square Goodness of Fit test?
On UNRELATED data i.e. where every participant yields data for one single category and we are comparing different levels of ONE VARIABLE
Can be used to compare observed PROPORTIONS with a population distribution, e.g. to test whether there is a gender bias in the computing department
We essentially want to see whether the data depart significantly from a theoretical distribution, often one with more than two categories or with expected proportions other than 50:50
How do we calculate the goodness of fit?
We again compare observed and expected frequencies. Observed frequencies are the number of participants counted in each category, e.g. each gender category. Expected frequencies are those predicted by the null hypothesis, so they depend on the specific null:
If the null is that the genders are equally represented, the expected split is 50:50
If the null is that the ratio in the department is representative of the population ratio, we need to know the population proportions and multiply the sample size by each proportion
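To make the two kinds of null concrete, here is a minimal sketch. The department size of 60 and the 70:30 population split are invented for illustration, and `gof_expected` is a hypothetical helper name.

```python
def gof_expected(sample_size, null_proportions):
    """Expected counts: sample size multiplied by each null proportion."""
    if abs(sum(null_proportions) - 1.0) > 1e-9:
        raise ValueError("null proportions must sum to 1")
    return [sample_size * p for p in null_proportions]


# Null 1: genders equally represented in a department of 60 people.
equal_split = gof_expected(60, [0.5, 0.5])

# Null 2: department mirrors an assumed 70:30 population split.
population_split = gof_expected(60, [0.7, 0.3])

print(equal_split, population_split)
```

The first call gives 30 expected in each category; the second gives roughly 42 and 18.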
What is the Goodness of Fit formula?
Same as for association
Why do we divide by expected frequency?
So that each squared difference is scaled relative to the size of its expected frequency; the chi-square value is then not influenced by variation in the size of the expected frequencies
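A quick arithmetic sketch of why the scaling matters, using invented counts: the same raw difference of 10 contributes far more to chi-square when the expected frequency is small than when it is large.

```python
def contribution(observed, expected):
    """One cell's contribution to chi-square: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected


small_cell = contribution(30, 20)    # difference of 10 from an expected 20
large_cell = contribution(210, 200)  # difference of 10 from an expected 200
print(small_cell, large_cell)  # 5.0 0.5
```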
How do we calculate the degrees of freedom for goodness of fit tests?
df = C - 1 where C is the number of categories
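Putting the goodness-of-fit pieces together for three categories, as a sketch only. The observed counts, the equal-proportions null, and the helper name `goodness_of_fit` are assumptions for the example; the critical values are standard table values at alpha = 0.05.

```python
# Critical values at alpha = 0.05 from a standard chi-square table.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def goodness_of_fit(observed, null_proportions):
    """Return (chi-square statistic, df) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in null_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # df = C - 1
    return statistic, df


# Hypothetical counts in three categories, tested against equal proportions.
observed = [50, 30, 20]
statistic, df = goodness_of_fit(observed, [1 / 3, 1 / 3, 1 / 3])
significant = statistic > CRITICAL_05[df]
print(round(statistic, 2), df, significant)  # 14.0 2 True
```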
In a goodness of fit test, what happens if we have more than 2 categories?
A significant result does not tell us exactly which categories were responsible, so we can only say that the obtained frequencies differed from those expected under the null hypothesis