W3: Tests of Association Flashcards
Another possibility is that the response we are interested in is in the form of a count. Examples of count variables are: (2)
- number of students with graduate jobs
- number of individuals passing a test
One question of interest is whether the counts (Y) change based on
one or more binary covariates (X)
We are interested in whether there is an association between
a binary factor and the response (this is not regression)
Suppose we are interested in the association between a binary factor and a response, summarised in the table below:
If there is no association between the binary factor (e.g., degree = psychology or engineering) and the response (yes/no to having a graduate job), then we expect
the proportion a/g to be similar to the proportion b/h
We formally test whether the proportions in the contingency table are similar or not using the
Chi-squared test for independence
The null hypothesis for this test is that
There is no association between the factor and response
Alternative hypothesis is that
There is an association between factor and response
Example of null and alternative hypotheses (2) using this table:
H0: No association between region and access to services
H1: Association between region and access to services
Calculate the expected frequency for each cell of the contingency table, assuming no association (H0 is true), using the formula:
E = (row total × column total) / overall total
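What's a column and what's a row?
A row runs across the table and a column runs down it; the row total is the sum of the cells across a cell's row, and the column total is the sum of the cells down its column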
How do we calculate the overall total of a table, for example in this table?
10 + 5 + 2 + 27 + 25 + 8 = 77
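As a rough sketch of the expected-frequency step: each cell's expected count is its row total times its column total, divided by the overall total. The six counts below are the ones summed in the card above, but their arrangement into 3 rows and 2 columns is an assumption made only for illustration, since the original table is not shown here.

```python
# Observed counts. The six values come from the card above; the 3 x 2
# layout (rows x columns) is an ASSUMPTION for illustration only.
observed = [
    [10, 27],
    [5, 25],
    [2, 8],
]

row_totals = [sum(row) for row in observed]        # one total per row
col_totals = [sum(col) for col in zip(*observed)]  # one total per column
overall_total = sum(row_totals)                    # every cell added together

# E = (row total x column total) / overall total, for every cell
expected = [
    [r * c / overall_total for c in col_totals]
    for r in row_totals
]

print(overall_total)  # 77
for row in expected:
    print([round(e, 2) for e in row])
```

A quick sanity check is that the expected frequencies in each row add up to the observed row total.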
How do we label E when calculating it for each cell?
ETL
EBL
EBR
ETR
EMR
T = top
B = Bottom
L = Left
R = Right
M = Middle
After the hypotheses and expected frequencies, we calculate the test statistic
The formula is:
χ² = Σ (O − E)² / E
For each cell in the table, we take the observed frequency (O) and the expected frequency (E) we just calculated, compute (O − E)² / E, and add the results together
The chi-squared test formula stays the same when the response and/or factors have
more than two levels
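A minimal sketch of the test statistic, written so that the same function works for a 2 x 2 table and for larger tables. It sums (O − E)² / E over every cell, with E computed as row total times column total over the overall total. The function name and both example tables are hypothetical, invented purely for illustration.

```python
def chi_squared_statistic(observed):
    """Return the sum of (O - E)^2 / E over all cells of an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    overall_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count for this cell if there is no association
            e = row_totals[i] * col_totals[j] / overall_total
            statistic += (o - e) ** 2 / e
    return statistic


# Hypothetical counts, not taken from the flashcards
two_by_two = [[20, 30], [30, 20]]
three_by_two = [[10, 20], [20, 20], [30, 10]]

print(chi_squared_statistic(two_by_two))  # 4.0
print(round(chi_squared_statistic(three_by_two), 3))
```

Nothing in the loop depends on the table being 2 x 2, which is the point of the card above.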
After calculating the χ² statistic, we calculate the degrees of freedom by:
DF = (R − 1) × (C − 1)
R = no. of rows in the actual contingency table
C = no. of columns in the actual contingency table
Calculate DF for this table
r = 3
c = 2
(3 - 1) * (2 - 1) = 2 * 1 = 2
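The same arithmetic as a small helper, as a sketch only. The function name is hypothetical; the rule it applies, (rows − 1) × (columns − 1), is the one worked through in the card above.

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a chi-squared test on an r x c table."""
    return (n_rows - 1) * (n_cols - 1)


print(degrees_of_freedom(3, 2))  # 2, the worked example above
print(degrees_of_freedom(2, 2))  # 1, the 2 x 2 case
```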
For tables with 2 rows and 2 columns we use the
χ²₁ distribution (chi-squared with 1 degree of freedom)
The DF calculated determines which critical values (p-values) you look at:
If it is the χ²₁ distribution, then look at
the row for DF = 1
What do we say before calculating expected frequencies?
We calculate the expected frequencies
What do we say before stating the hypotheses?
We test:
List H0/H1
What do we say before calculating the test statistic?
We calculate the test statistic as
After calculating the test statistic and then the DF, we say:
We compare this χ² with the χ²₁ distribution
Which critical significance levels do we list when comparing the χ² value?
10%
5%
1%
0.1%
If χ² is larger than the 0.1% critical value, then
p < 0.001
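A sketch of the table lookup for the 2 x 2 case. The critical values below are the standard upper-tail values of the chi-squared distribution with 1 degree of freedom at the 10%, 5%, 1% and 0.1% levels. The function name and the wording of the returned brackets are assumptions for illustration.

```python
# Upper-tail critical values for chi-squared with 1 degree of freedom,
# listed from the strictest significance level to the most lenient.
CRITICAL_VALUES_DF1 = [
    (0.001, 10.828),
    (0.01, 6.635),
    (0.05, 3.841),
    (0.10, 2.706),
]


def p_value_bracket(statistic):
    """Bracket the p-value by comparing the statistic with each critical value."""
    stricter = None  # the next stricter level, which the statistic did not reach
    for level, critical in CRITICAL_VALUES_DF1:
        if statistic > critical:
            if stricter is None:
                return f"p < {level}"
            return f"{stricter} < p < {level}"
        stricter = level
    return f"p > {stricter}"


print(p_value_bracket(12.0))   # p < 0.001
print(p_value_bracket(4.0))    # 0.01 < p < 0.05
print(p_value_bracket(0.024))  # p > 0.1
```

For tables with more degrees of freedom, a different row of critical values would be needed.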
Example of a conclusion for a chi-squared test that is significant at the 0.1% level
We see p < 0.001 and so we reject the H0 at the 0.1% level. We have very strong evidence of an association between personality and degree.
List of concluding statements - p-value greater than 10%
The p-value is greater than 10% so there is no evidence against the null hypothesis H0 and we do not reject it.
List of concluding statements - p-value less than 10%
The p-value is less than 10% but greater than 5% so there may be slight evidence against the null hypothesis H0 but we do not reject it.
List of concluding statements - p-value less than 5%
The p-value is less than 5% but greater than 1% so there is moderate evidence against the null hypothesis H0 and we reject it and accept the alternative hypothesis H1.
List of concluding statements - p-value less than 1%
The p-value is less than 1% but greater than 0.1% so there is strong evidence against the null hypothesis H0 and we reject it and accept the alternative hypothesis H1.
List of concluding statements - p-value less than 0.1%
The p-value is less than 0.1% so there is very strong evidence against the null hypothesis H0 and we reject it and accept the alternative hypothesis H1.
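The five concluding statements can be summarised as a simple ladder of thresholds. This is only a sketch: the function name and the shortened wording are hypothetical, and how a p-value exactly equal to a threshold is treated is an assumption, since the cards do not say.

```python
def conclusion(p_value):
    """Map a p-value to the strength-of-evidence wording used in the cards."""
    # ASSUMPTION: a p-value exactly on a threshold falls into the weaker band.
    if p_value < 0.001:
        return "very strong evidence against H0: reject H0"
    if p_value < 0.01:
        return "strong evidence against H0: reject H0"
    if p_value < 0.05:
        return "moderate evidence against H0: reject H0"
    if p_value < 0.10:
        return "slight evidence against H0: do not reject H0"
    return "no evidence against H0: do not reject H0"


print(conclusion(0.0004))  # very strong evidence against H0: reject H0
print(conclusion(0.07))    # slight evidence against H0: do not reject H0
```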
Assumptions of Chi-squared test - (2)
- We require independent observations
- For a 2x2 table, all expected frequencies must be larger than 5.
For a larger table, no more than 20% of all cells may have an expected frequency less than 5, and all expected frequencies must be larger than 1.
Among the assumptions of the chi-squared test, we should always check the
second assumption
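A sketch of how the second assumption could be checked once the expected frequencies are known. It applies the 2 x 2 rule (all expected frequencies larger than 5) or the larger-table rule (no more than 20% of cells below 5 and all larger than 1). The function name and the example values are hypothetical.

```python
def expected_counts_ok(expected):
    """Check the expected-frequency assumption of the chi-squared test."""
    cells = [e for row in expected for e in row]
    is_two_by_two = len(expected) == 2 and all(len(row) == 2 for row in expected)

    if is_two_by_two:
        # 2 x 2 table: every expected frequency must be larger than 5
        return all(e > 5 for e in cells)

    # Larger table: at most 20% of cells below 5, and all cells larger than 1
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return share_below_5 <= 0.20 and all(e > 1 for e in cells)


# Hypothetical expected frequencies, for illustration only
print(expected_counts_ok([[25.0, 25.0], [25.0, 25.0]]))             # True
print(expected_counts_ok([[4.0, 6.0], [16.0, 24.0]]))               # False
print(expected_counts_ok([[8.2, 28.8], [6.6, 23.4], [2.2, 7.8]]))   # True
```

The first assumption, independent observations, comes from the study design and cannot be checked from the table itself.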
An alternative test to Chi-squared test is
Fisher’s exact test
Why is Fisher’s exact test a suitable alternative to the chi-squared test?
It is based on an exact p-value rather than asymptotic properties
of the test statistic, and so is appropriate even when there are small
counts in some cells of the contingency table.
When is Fisher’s exact test utilised? - (2)
The chi-squared test is only exact asymptotically, which means that it becomes more
and more accurate as the sample size gets bigger.
This is why, if any of the expected frequencies are less than 5, we should
not use the chi-squared test, as the sample size is not large enough for it to be
accurate; we use Fisher’s exact test instead.
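To make "exact p-value" concrete, here is a minimal sketch of Fisher's exact test for a 2 x 2 table using only the standard library. With the row and column totals held fixed, the top-left cell follows a hypergeometric distribution. The two-sided p-value here adds up the probabilities of every possible table that is no more likely than the observed one, which is one common definition. The function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)   # smallest possible top-left count
    highest = min(row1, col1)      # largest possible top-left count

    # Add up all tables at least as unlikely as the observed one
    # (small tolerance guards against floating-point ties).
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Hypothetical small-count table, for illustration only
print(round(fisher_exact_two_sided([[3, 1], [1, 3]]), 4))  # 0.4857
```

No large-sample approximation is involved, which is why small cell counts are not a problem for this test.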
Chi-squared 2x2 contingency table in SPSS
Interpret it the same way as before: look at the difference in proportions between the rows
Chi-squared test statistics, DF and p-value
Pearson Chi-squared = 0.024
DF = 1
P-value = 0.877
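As a cross-check on the reported figures, the p-value for a chi-squared statistic with 1 degree of freedom can be reproduced with the standard library. A chi-squared variable with 1 DF is the square of a standard normal variable, so its upper-tail probability equals erfc(sqrt(χ² / 2)). This shortcut is valid only for DF = 1, and the function name is hypothetical.

```python
from math import erfc, sqrt


def p_value_df1(statistic):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    # Valid for DF = 1 only: chi-squared(1) is a squared standard normal.
    return erfc(sqrt(statistic / 2))


# Pearson chi-squared value from the card above
print(round(p_value_df1(0.024), 3))  # 0.877
```

This agrees with the p-value shown on the card, up to rounding of the reported statistic.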
Checking whether all expected frequencies are above 5 for the chi-squared test in SPSS
A small note in the output states that 0 cells have expected counts less than 5
So the assumptions are satisfied and we can use the results of the chi-squared test
Fisher’s exact test gives the same 2x2 contingency table, as well as information on the DF and p-value:
DF 1 since exact-sig (2-sided)
P-value is 0.534
2x2 contingency table in R
Chi-squared test statistic, DF, p-value in R
Chi-squared test expected frequencies output
Fisher’s exact test results in R
P-value = 1 (not significant)