RMB: CHI-SQUARE WEEK 4 Flashcards
When to use chi-square
- You want to know about differences between groups
a. Are there more females than males studying Psychology?
- You want to know whether there is an association between two categorical variables
a. Is there an association between gender and smoking?
- Your groups are independent – each observation contributes to only one cell of the analysis (e.g. gender).
- You have categorical data, aka nominal level of measurement
a. For example, frequency or number of observations in a number of categories – e.g. 10 males versus 50 females enrolled in Psychology.
- Your data violates parametric assumptions – i.e. your data is not normally distributed
Chi-Square: Test for goodness of fit x²
- Used on unrelated data > each ppt has data for only one category
- Can take a variable which is not categorical and make it categorical > e.g. using a cut-off on an anxiety scale to class people as anxious or not anxious
- This test is used to answer questions about the proportions of a population distribution (e.g. gender bias in the psychology department)
- Used to compare different levels of one variable
- Compares sample proportions to the population proportions specified by the null hyp
Example of Chi-Square: test for GF
- E.G. if we investigate gender bias in the psych department, we can ask everyone if they are male or female (nominal/categorical data)
- To find X², we need to find the observed frequency and the expected frequency > The observed frequencies are the actual numbers of participants measured in individual categories
- The observed frequency will then be compared to the expected/predicted frequency which was made in the null hyp
- How to work out expected frequency? The exact form of the Fe changes depending on the null hyp
- 2 kinds of null hyp: (1) there will be no difference between specified categories (no difference in the number of men and women) or (2) no difference between the frequency distribution for the observed categories and an existing population (proportion of men to women in computing department reflects the gender balance in the whole university)
- For the no difference null hyp, the no. of people would be predicted to be equal for both categories (e.g. n=50, 25 men and 25 women)
- For the comparison to population hyp, the no. of people would be predicted by the distribution of frequency for the whole university > In the example, the population distribution shows 45% of students were men while 55% were women > using this, let's say we are testing 50 ppt, we would work out 45% of 50 = 22.5 expected freq of men and 55% of 50 = 27.5 expected freq of women
- Need to compare observed and expected frequencies to establish whether what is observed differs from what the null hyp predicts
- Chi-square looks at the difference between expected values and observed values > the bigger the difference, the bigger X², and the more likely the result is to be significant
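A minimal Python sketch of the goodness-of-fit calculation described above. It assumes the standard formula X² = sum of (Fo - Fe)² / Fe, which these notes do not write out. The observed counts (30 men, 20 women) are hypothetical and chosen only for illustration, and the function name is an assumption too.

```python
def chi_square_gof(observed, expected):
    """Chi-square statistic: sum of (Fo - Fe)^2 / Fe over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Hypothetical observed frequencies (not from the notes): 30 men, 20 women
observed = [30, 20]
n = sum(observed)

# Null hyp 1: no difference between categories -> equal split of n
expected_equal = [n / len(observed)] * len(observed)

# Null hyp 2: sample mirrors the population (45% men, 55% women)
population_props = [0.45, 0.55]
expected_population = [p * n for p in population_props]

chi_equal = chi_square_gof(observed, expected_equal)
chi_population = chi_square_gof(observed, expected_population)

print("equal split:", expected_equal, round(chi_equal, 2))
print("population: ", [round(fe, 1) for fe in expected_population], round(chi_population, 2))
```

With these hypothetical counts, the expected frequencies are 25 and 25 under the equal-split null hyp (X² of about 2.0) and 22.5 and 27.5 under the population null hyp (X² of about 4.55). The same data can give a different X² depending on which null hyp is used.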
How to see if GF x² statistic is significant
- To see if the result you found is actually significant, you need to find the degrees of freedom (df) > to find df you need to do C-1 (C refers to the number of categories you have; men and women are 2 categories, so 2-1=1 and df=1)
- We then need to use our df=1 to look up on the chi-square table if our figure is significant
- The table shows that for a X² test with df=1, the critical value at the p < 0.05 (5%) level is 3.84 (X² has to be greater than 3.84 to be significant at the 0.05 level)
- Our value is 4.55 which is greater than 3.84 so it is significantly different + can reject null hyp
- Concluding statement: There are significantly more men in the computing department than would be expected by chance, X² (1) = 4.55, p < 0.05 + reject the null hyp
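The df and critical-value check above can be sketched in Python as a table lookup. The function names are assumptions made for illustration, and the small dictionary only covers df = 1 and df = 2 (standard 0.05-level table values, rounded to 2 d.p.).

```python
# Critical values of chi-square at the p < 0.05 level, keyed by df
# (standard table values, rounded to 2 decimal places)
CRITICAL_05 = {1: 3.84, 2: 5.99}

def degrees_of_freedom(num_categories):
    """df for a goodness-of-fit test: C - 1."""
    return num_categories - 1

def is_significant(chi_square, num_categories):
    """True if the statistic exceeds the 0.05 critical value for its df."""
    df = degrees_of_freedom(num_categories)
    return chi_square > CRITICAL_05[df]

df = degrees_of_freedom(2)             # men and women -> 2 categories
significant = is_significant(4.55, 2)  # X² value from the example
print(f"df = {df}, significant at 0.05: {significant}")
# prints: df = 1, significant at 0.05: True
```

The decision is made by comparing X² to the critical value, so the null hyp is rejected only when X² is the larger number.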
Chi-Square: Test of Independence x²
- Are two separate variables independent or associated? E.g. does gender influence smoking?
- TI uses data in the form of frequencies from different categories, which are compared to expected frequencies predicted from the null hyp
- E.g. each ppt is a male smoker, male non-smoker, female smoker or female non-smoker
- Null hyp for TI suggests that the 2 variables being measured are completely independent of each other + unrelated > e.g. no relationship between gender and smoking behaviour
- Alternative hyp would be there is an association > e.g. there is a relationship between gender + smoking
- Data is presented in a table showing all separate categories which are the observed frequencies
- Similarly to the GF test, it compares observed frequency to expected frequency
- Fe is obtained by using total no. of observations for each variable (Ho predicts smoking behaviour is the same for both genders)
Calculating expected frequency:
- Fe = (Fc × Fr) / n > Fe (expected frequency) = Fc (column total) × Fr (row total) / n (total sample size)
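A minimal Python sketch of the expected-frequency formula for the test of independence. The 2 x 2 table of counts is hypothetical and the function names are assumptions. The df rule used here, (rows - 1) x (columns - 1), is the standard one for this test but is not stated in these notes.

```python
def expected_frequencies(table):
    """Fe for each cell = (column total * row total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[fc * fr / n for fc in col_totals] for fr in row_totals]

def chi_square_independence(table):
    """Sum of (Fo - Fe)^2 / Fe over every cell of the table."""
    expected = expected_frequencies(table)
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )

# Hypothetical observed frequencies: rows = gender, columns = smoker / non-smoker
observed = [
    [15, 35],  # male
    [10, 40],  # female
]

expected = expected_frequencies(observed)
chi_sq = chi_square_independence(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # assumed standard rule: (R-1)(C-1)

print(expected)
print(round(chi_sq, 2), df)
```

For this hypothetical table the row totals are 50 and 50 and the column totals are 25 and 75, so each gender is expected to have 12.5 smokers and 37.5 non-smokers if the two variables are independent.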
Reporting chi-square in APA
- For the results to be significant, the p value produced by SPSS should be less than 0.05 > p value = probability of getting results at least as extreme as the observed result if the null hyp is true > smaller p value = stronger evidence against the null hyp.
- e.g. Therefore, there is no significant difference between males' and females' smoking behaviour, X² (1) = 0.742, p = .389 (SPSS output > p value has to be 0.05 or less to be significant)
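As a rough check on the reported figures, the p value for a df = 1 chi-square can be computed with Python's standard library. This sketch only works for df = 1, where the upper-tail probability equals erfc(sqrt(X² / 2)). SPSS is not involved, and the function names and formatting helper are assumptions for illustration.

```python
import math

def p_value_df1(chi_square):
    """p value for a chi-square statistic with df = 1 only.

    With 1 degree of freedom the upper-tail probability
    equals erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2))

def apa_report(chi_square, df, p, decimals=2):
    """Format the result APA-style, with no leading zero on p."""
    if p < 0.001:
        p_text = "p < .001"
    else:
        p_text = "p = " + f"{p:.3f}".lstrip("0")
    return f"X²({df}) = {chi_square:.{decimals}f}, {p_text}"

p = p_value_df1(0.742)
report = apa_report(0.742, 1, p)
verdict = "significant" if p < 0.05 else "not significant"
print(report, "->", verdict)
```

For X² = 0.742 the computed p is about .389, which is above 0.05, so the result is not significant and the null hyp is kept.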