Simple tests for analysing and comparing categorical data Flashcards
what are the types of data
2 types:
Qualitative (categorical)
Nominal (no natural ordering): haemoglobin types, sex
Ordered categorical: anaemic / borderline / not anaemic, grades of breast cancer
Quantitative (numerical)
Discrete (can only take certain values): number of positive tests for anaemia, number of children in a family
Continuous (limited only by accuracy of instrument): haemoglobin concentration (g/dl), height
what are the steps of the difference in proportions hypothesis test
The hypothesis test assumes that there is a common proportion, π, estimated by p:
p = (n1p1 + n2p2) / (n1 + n2)
And the standard error for the difference in proportions is estimated by:
SE(p1 - p2) = √( p(1 - p) [1/n1 + 1/n2] )
From this we can compute the test statistic z:
z = (p1 - p2) / SE(p1-p2)
We can then compare this value to what would be expected under the null hypothesis of no difference, in order to get a P-value
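A minimal Python sketch of these steps, using only the standard library. The function name and the counts (30 of 100 versus 20 of 100) are hypothetical and chosen only for illustration; the P-value assumes the usual two-sided comparison against the standard normal distribution.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided P-value) for H0: the two proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    # Common (pooled) proportion, estimated under the null hypothesis
    p = (n1 * p1 + n2 * p2) / (n1 + n2)
    # Standard error of the difference in proportions
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided P-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 30/100 in group 1, 20/100 in group 2
z, p_value = two_proportion_z_test(30, 100, 20, 100)
print(round(z, 3), round(p_value, 3))
```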
Difference in proportions: Confidence interval
Find the SE first:
SE(p1 - p2) = √( p1(1 - p1)/n1 + p2(1 - p2)/n2 )
Then find the difference in proportions, (p1 - p2).
The 95% confidence interval is (p1 - p2) ± [1.96 x SE(p1 - p2)]
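The confidence interval calculation could be sketched in Python as follows. The function name and the example counts are hypothetical; note that the standard error here uses the two separate proportions, not a pooled one.

```python
import math

def diff_proportions_ci(x1, n1, x2, n2, z_crit=1.96):
    """Return (lower, upper) limits of the CI for p1 - p2 (95% when z_crit=1.96)."""
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled standard error of the difference in proportions
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se

# Hypothetical counts: 30/100 in group 1, 20/100 in group 2
lower, upper = diff_proportions_ci(30, 100, 20, 100)
print(round(lower, 4), round(upper, 4))
```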
when do you use the chi-squared test
Two unordered categorical variables that form an r x c contingency table.
At least 80% of expected cell counts >5.
All expected cell counts ≥1.
what is the equation in the chi-squared test that helps calculate the expected frequency:
EF = (row total x column total) / N
E.g. in a 2 x 2 table:
EF = (row total healed, 39) x (column total, 120) / overall total
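As a rough illustration, the expected frequency rule (row total x column total / N) can be applied to every cell of a table at once. The function name and the 2 x 2 counts below are hypothetical, not taken from the notes.

```python
def expected_counts(observed):
    """Expected frequency for each cell: row total x column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical 2 x 2 table of observed counts
observed = [[20, 30],
            [10, 40]]
expected = expected_counts(observed)
print(expected)
```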
what are the steps for the chi-squared test
For each cell in the table calculate the difference between the observed value and the expected value.
Square each difference and divide the resultant quantity by the expected value.
Sum all of these to get a single number, the χ2 (Chi-squared) statistic.
Compare this number with tables of the chi-squared distribution with the following degrees of freedom:
(no. of rows - 1) x (no. of columns -1)
Look at notes and slides for equations
Use in a 2x2 table
what is the equation for the chi-squared statistic
O = observed value, E = expected value
For the four cells of a 2 x 2 table:
χ2 = (O1 - E1)²/E1 + (O2 - E2)²/E2 + (O3 - E3)²/E3 + (O4 - E4)²/E4
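A short Python sketch of the statistic, assuming the expected values come from row total x column total / N. It works for any r x c table and also returns the degrees of freedom, (rows - 1) x (columns - 1). The function name and the counts are hypothetical.

```python
def chi_squared_statistic(observed):
    """Return (chi-squared statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected value for this cell
            stat += (o - e) ** 2 / e               # (O - E)^2 / E
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical 2 x 2 table of observed counts
stat, df = chi_squared_statistic([[20, 30], [10, 40]])
print(round(stat, 4), df)
```

The statistic is then compared with tables of the chi-squared distribution on `df` degrees of freedom.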
when is it not correct to use the chi-squared test
If more than 20% of expected cell counts are less than 5 then the test statistic does not approximate a chi-squared distribution.
If any expected cell counts are <1 then we cannot use the chi-squared distribution.
In large tables we may have to combine categories to make bigger numbers (providing it’s meaningful).
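These rules of thumb can be checked mechanically once the expected counts are known. The sketch below is one possible reading of them (no more than 20% of expected counts below 5, and none below 1); the function name and the example tables are hypothetical.

```python
def chi_squared_is_appropriate(expected):
    """Check the rules of thumb on a table of expected cell counts."""
    cells = [e for row in expected for e in row]
    # Share of cells with an expected count below 5
    share_small = sum(e < 5 for e in cells) / len(cells)
    return share_small <= 0.20 and min(cells) >= 1

# Hypothetical tables of expected counts
ok_table = [[15.0, 35.0], [15.0, 35.0]]
sparse_table = [[0.5, 9.5], [4.5, 85.5]]
print(chi_squared_is_appropriate(ok_table))
print(chi_squared_is_appropriate(sparse_table))
```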
when can you use Yates' correction
In 2 x 2 tables, even when expected cell counts are bigger than 5, the mathematical approximations are not that great.
We will reject the null hypothesis too often on average.
We can use Yates’ correction.
what is Yates' equation
χ2 = sum over all cells of (|O - E| - 0.5)² / E
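A minimal sketch of the Yates-corrected statistic for a 2 x 2 table, again assuming expected values of row total x column total / N. The function name and the counts are hypothetical.

```python
def yates_chi_squared(observed):
    """Yates-corrected chi-squared statistic for a 2 x 2 table."""
    if len(observed) != 2 or any(len(row) != 2 for row in observed):
        raise ValueError("Yates' correction is applied to 2 x 2 tables only")
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (abs(o - e) - 0.5) ** 2 / e  # (|O - E| - 0.5)^2 / E
    return stat

# Hypothetical 2 x 2 table of observed counts
corrected = yates_chi_squared([[20, 30], [10, 40]])
print(round(corrected, 4))
```

The corrected value is smaller than the uncorrected statistic for the same table, which makes the test less likely to reject the null hypothesis.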