3.1 Investigating categorical Data Flashcards
Dealing with categorical data
Many characteristics of organisms cannot be measured to give a number. Instead, they can only be put into categories.
These can be quantified by counting the frequency with which each character state turns up.
1.) Are the frequencies of particular characteristics different from what we might expect?
2.) Are certain characteristics associated with one another?
The problem of sampling
When a sample is taken from a population, it is rare that the different characteristics are picked out in exactly the same proportions as they occur in the population.
Example:
If samples are taken from a population consisting of half females and half males, the number of females picked will vary from sample to sample.
This is why statistical tests need to be carried out in order to determine if differences or associations are real or could easily have happened by chance.
The test used is the χ² (chi-squared) test.
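The sampling problem can be illustrated with a small simulation. This is only a sketch: the sample size of 20, the ten repeats and the helper name `count_females` are assumptions made for illustration and are not part of the original notes.

```python
import random


def count_females(sample_size, rng):
    """Draw sample_size individuals from a 50:50 population and count the females."""
    return sum(1 for _ in range(sample_size) if rng.random() < 0.5)


rng = random.Random(1)  # fixed seed so the run is repeatable
counts = [count_females(20, rng) for _ in range(10)]

# Each sample of 20 "should" contain 10 females, but the counts scatter around 10.
print(counts)
```

Because a count that differs from 10 can arise by chance alone, a statistical test is needed before concluding that a difference is real.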
The Chi Squared Test for differences
This is to test whether character frequencies are different from expected values.
Example:
Do rats turn more towards or away from a stimulus in a maze? (Expected value towards: away = 50:50)
Rationale:
The chi-squared statistic needs to be worked out:
χ² = Σ (O - E)² / E (summed over all groups)
O = observed value
E = expected value
The further the observed values are from the expected ones, the bigger χ² is and the less likely the results are to have happened by chance.
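The statistic can be sketched in a few lines of Python. The function name `chi_squared` and the two sets of counts are hypothetical, chosen only to show that χ² grows as the observed values move away from the expected ones.

```python
def chi_squared(observed, expected):
    """Return the sum of (O - E)^2 / E over all groups."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts out of 100, both compared with a 50:50 expectation.
close = chi_squared([55, 45], [50, 50])  # small deviation from expected
far = chi_squared([70, 30], [50, 50])    # large deviation from expected

print(close, far)  # 1.0 16.0
```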
Carrying out a test
1.) Construct a null hypothesis
2.) Calculate the value of χ²
-> χ² = Σ (O - E)² / E
3.) Compare χ² with the critical value for 5 % significance with N - 1 degrees of freedom
N = number of groups
4.) If χ² < critical value, there is a greater than 5 % probability of this happening by chance
=> no evidence to reject the null hypothesis.
If χ² > critical value, there is a less than 5 % probability of this happening by chance
=> evidence to reject the null hypothesis
=> frequencies are significantly different from expected values
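The four steps above can be sketched as a single function. This is a minimal illustration, not a full statistics routine: the name `chi_squared_test`, the small lookup table of 5 % critical values (standard table values rounded to two decimals) and the three-group counts are assumptions added here.

```python
# Critical values of chi-squared at the 5 % level, keyed by degrees of freedom.
# Standard table values rounded to two decimals; only 1 to 3 df are included.
CRITICAL_5_PERCENT = {1: 3.84, 2: 5.99, 3: 7.81}


def chi_squared_test(observed, expected):
    """Return (chi2, degrees_of_freedom, reject_null) for a test for differences."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                   # N - 1, where N = number of groups
    reject = chi2 > CRITICAL_5_PERCENT[df]   # step 4: compare with the critical value
    return chi2, df, reject


# Hypothetical counts for three groups, each expected to hold 40 individuals.
chi2, df, reject = chi_squared_test([30, 50, 40], [40, 40, 40])
print(chi2, df, reject)  # 5.0 2 False
```

Here χ² = 5.0 is below the critical value of 5.99 for 2 degrees of freedom, so there is no evidence to reject the null hypothesis.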
χ² for differences: an example
1.) Null hypothesis: the rats show no preference (50:50)
2.) Observed and expected values
         Observed   Expected   O - E
Left        69         50        19
Right       31         50       -19
χ² = 19² / 50 + (-19)² / 50 = 7.22 + 7.22 = 14.44
3.) Two groups, so degrees of freedom = N - 1 = 2 - 1 = 1
According to a statistical table, χ² must be greater than 3.84 (the critical value for 5 % significance with 1 degree of freedom) for the difference to be significant.
4.) 14.44 > 3.84, so the preference was significant; rats moved to the left more often than expected.
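The arithmetic of this example can be checked in Python. As an extra that is not in the original notes, the sketch also estimates the probability itself: for 1 degree of freedom only, the probability of a value at least as large as χ² equals erfc(√(χ²/2)), which `math.erfc` from the standard library can evaluate.

```python
import math

observed = [69, 31]  # left, right
expected = [50, 50]  # no preference

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Valid for 1 degree of freedom only: P(chi-squared >= chi2) = erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(round(chi2, 2))   # 14.44
print(p_value < 0.05)   # True
```

The same formula applied to the critical value 3.84 gives a probability very close to 0.05, which is where the table entry comes from.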
Warnings about using χ² Tests
1.) Never use percentages; always use the actual frequencies.
This is because a bigger sample is less likely to differ from the expected values by the same proportion purely by chance.
2.) Bigger sample sizes allow smaller differences to be detected. This is due to the squared term in the numerator of χ².
3.) There is another test, the χ² test for associations.
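The first two warnings can be seen by keeping the proportions fixed at 60:40 and changing only the sample size. The sample sizes and the helper name `chi_squared` are hypothetical; the critical value 3.84 is the one for 1 degree of freedom at 5 % significance.

```python
def chi_squared(observed, expected):
    """Return the sum of (O - E)^2 / E over all groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


results = {}
for n in (10, 100, 1000):
    observed = [6 * n // 10, 4 * n // 10]  # always a 60:40 split
    expected = [n / 2, n / 2]              # 50:50 expectation
    results[n] = chi_squared(observed, expected)

for n, chi2 in results.items():
    print(n, chi2, chi2 > 3.84)
# 10 0.4 False
# 100 4.0 True
# 1000 40.0 True
```

Entering the percentages 60 and 40 instead of the real counts would give 4.0 every time, whatever the true sample size, so the test would be wrong for both the sample of 10 and the sample of 1000.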
The χ² test for associations
This is to test whether two sets of character states are associated.
The test determines whether the distribution of character states is different from what it would be if they were randomly distributed around the population.
Carrying out a test
1.) Construct a null hypothesis
2.) Calculate the value of χ²
-> this is a more complex process than in the test for differences
=> first arrange the observed data in a contingency table
=> calculate the expected frequency for each cell of the table:
E = (Column total x Row total) / Grand total
=> put the results into the table
=> finally, calculate χ² = Σ (O - E)² / E, summed over all cells of the table
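The calculation of the expected frequencies can be sketched as follows. The 2 x 2 table of counts is invented purely for illustration, and the function names are hypothetical.

```python
def expected_frequencies(table):
    """Return E = (column total x row total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_association(table):
    """Return the sum of (O - E)^2 / E over all cells of the contingency table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical contingency table: rows are the states of one character,
# columns are the states of the other.
table = [[30, 10],
         [20, 40]]

expected = expected_frequencies(table)
chi2 = chi_squared_association(table)
print(expected)        # [[20.0, 20.0], [30.0, 30.0]]
print(round(chi2, 2))  # 16.67
```

For a contingency table the degrees of freedom are conventionally (rows - 1) x (columns - 1), which is 1 for this 2 x 2 example, so the value would be compared with 3.84.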