Categorical Data Analysis Flashcards
Chi-squared assumptions
- All expected values Ei,j are greater than 1.
- No more than 20% of the expected values Ei,j are less than 5.
- Cell values are independent.
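The two expected-value rules above can be checked mechanically. A minimal sketch in plain Python, assuming the usual definition Ei,j = (row total x column total) / grand total; the function names and the counts are hypothetical and only for illustration:

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def expected_value_rules_ok(table):
    """True if every expected value is > 1 and at most 20% are < 5."""
    cells = [e for row in expected_counts(table) for e in row]
    all_above_one = all(e > 1 for e in cells)
    share_below_five = sum(e < 5 for e in cells) / len(cells)
    return all_above_one and share_below_five <= 0.20


# Hypothetical observed counts (rows = groups, columns = outcome yes/no).
large_sample = [[12, 8], [3, 17]]
small_sample = [[1, 9], [2, 8]]

print(expected_counts(large_sample))
print(expected_value_rules_ok(large_sample))
print(expected_value_rules_ok(small_sample))
```

Independence of the cell values cannot be checked from the counts; it depends on how the data were collected.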
test for a 2x2 table
Chi-squared or Fisher’s
Test for 2x3+ table
Chi-squared test; Chi-squared test for trend when one variable is ordinal
Test for paired data
McNemar’s
Test for a lurking/stratifying variable
Cochran Mantel Haenszel Test
When do you use a Fisher’s exact test?
when the expected-value assumptions of a Chi-squared test are violated (small expected counts)
How must data be formatted for Fisher’s exact?
must be a 2x2 table
categories may need to be combined to reduce a larger table to 2x2
Fisher’s hypotheses
H0: There is no association between the variables (independent)
H1: There is an association between the variables (dependent)
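As an illustration of how the p-value for these hypotheses can be obtained, here is a rough sketch of a two-sided Fisher's exact test in plain Python. It assumes the common convention of summing the hypergeometric probabilities of every table (with the same margins) that is no more likely than the observed one; the function name and the counts are hypothetical:

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)   # smallest possible top-left count
    highest = min(row1, col1)      # largest possible top-left count
    p_observed = prob(a)
    # Add up every table that is no more likely than the observed table.
    # The small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Hypothetical 2x2 table with small counts.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```

A small p-value is evidence against H0 (no association); here the counts are too small to reject it.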
What is risk (and how to calculate)?
the probability that an event will occur
number of events/total population at risk
risk ratio or relative risk calculation
p(event in group 1)/ p(event in group 2)
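The two calculations above are simple enough to sketch directly. The function names and counts below are hypothetical:

```python
def risk(events, total_at_risk):
    """Risk = number of events / total population at risk."""
    return events / total_at_risk


def risk_ratio(events_1, total_1, events_2, total_2):
    """Risk ratio = risk in group 1 / risk in group 2."""
    return risk(events_1, total_1) / risk(events_2, total_2)


# Hypothetical cohort: 25 of 100 exposed and 50 of 400 unexposed have the event.
print(risk(25, 100))                 # 0.25
print(risk(50, 400))                 # 0.125
print(risk_ratio(25, 100, 50, 400))  # 2.0
```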
odds definition and calculation
odds is the ratio of the probability that an event happens to the probability that it does not happen
odds = p / (1-p)
where p is the probability of an event
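A short sketch of the conversion between probability and odds, using the formula above. The inverse, p = odds / (1 + odds), follows from rearranging it; the function names are hypothetical:

```python
def odds_from_probability(p):
    """odds = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1 - p)


def probability_from_odds(odds):
    """Inverse of the formula above: p = odds / (1 + odds)."""
    return odds / (1 + odds)


print(odds_from_probability(0.75))  # 3.0, i.e. odds of 3 to 1
print(probability_from_odds(3.0))   # 0.75
```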
what is an odds ratio?
a measure of association between an exposure and an outcome
odds ratio calc
odds of event in exposed group/odds of event in non-exposed group
odds ratio interpretation
the odds of the event in the exposed group are x times the odds in the non-exposed group
odds ratio table set up
event along the top (columns)
exposure along side (rows)
yes + yes in top left
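With the table laid out as described (exposure in rows, event in columns, yes + yes in the top left), the odds ratio reduces to the cross-product (a x d) / (b x c). A minimal sketch with hypothetical counts and a hypothetical function name:

```python
def odds_ratio(table):
    """Odds ratio for [[a, b], [c, d]].

    Rows:    exposed, non-exposed
    Columns: event yes, event no
    """
    (a, b), (c, d) = table
    # (a / b) / (c / d) is algebraically the same as (a * d) / (b * c).
    return (a * d) / (b * c)


# Hypothetical counts:    event yes  event no
table = [[20, 80],      # exposed
         [10, 90]]      # non-exposed

print(odds_ratio(table))  # 2.25
```

Here the odds of the event in the exposed group are 2.25 times the odds in the non-exposed group.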
when is relative risk typically used?
cohort studies
when is the odds ratio typically used?
case-control studies
when should odds ratios be avoided?
if a disease is common, the odds ratio will overstate the relative risk (it lies further from 1), so it should not be read as a risk ratio
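A small numeric comparison may make this concrete. The sketch below uses made-up counts in which the relative risk is 2 in both scenarios, once for a rare disease and once for a common one; the function names are hypothetical:

```python
def relative_risk(a, b, c, d):
    """Risk ratio for [[a, b], [c, d]] (rows: exposed, non-exposed)."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds ratio for the same table, via the cross-product."""
    return (a * d) / (b * c)


# Hypothetical rare disease: 2 in 1000 exposed vs 1 in 1000 non-exposed.
rare = (2, 998, 1, 999)
# Hypothetical common disease: 60 in 100 exposed vs 30 in 100 non-exposed.
common = (60, 40, 30, 70)

for label, counts in (("rare", rare), ("common", common)):
    print(label, round(relative_risk(*counts), 3), round(odds_ratio(*counts), 3))
```

With the rare disease the odds ratio stays close to 2; with the common disease it rises to 3.5 although the relative risk is still 2.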
when to use chi squared test for trend?
when there is one nominal (usually binary) categorical variable and one ordinal categorical variable with at least 3 levels.
Chi-squared test for trend hypothesis
H0: there is no linear trend in the relationship between variable x and variable y
what are the degrees of freedom in a Chi-squared test for trend?
always 1
how to interpret Chi-squared test for trend?
look at counts in the table to determine the nature of relationship; test only tells you if a relationship is present
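For reference, one common form of the test for trend (the Cochran-Armitage version for a binary outcome across ordered groups) can be sketched as follows. This is an assumption about which variant is meant; the scores, counts, and function name are hypothetical, and the p-value uses the fact that a Chi-squared variable with 1 degree of freedom has survival function erfc(sqrt(x / 2)):

```python
from math import erfc, sqrt


def chi_squared_trend(events, totals, scores):
    """Cochran-Armitage style trend test for a 2 x k table.

    events[i]: number with the outcome in ordered group i
    totals[i]: group size
    scores[i]: numeric score assigned to the ordinal level (e.g. 0, 1, 2)
    Returns (statistic, p_value); the statistic has 1 degree of freedom.
    """
    n = sum(totals)
    p_bar = sum(events) / n  # overall proportion with the outcome
    # Score-weighted departure of observed events from their expected counts.
    t = sum(s * (r - m * p_bar) for s, r, m in zip(scores, events, totals))
    spread = (
        sum(m * s * s for s, m in zip(scores, totals))
        - sum(m * s for s, m in zip(scores, totals)) ** 2 / n
    )
    variance = p_bar * (1 - p_bar) * spread
    statistic = t * t / variance
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical data: events rise from 10% to 20% to 30% across three levels.
stat, p = chi_squared_trend(events=[10, 20, 30], totals=[100, 100, 100], scores=[0, 1, 2])
print(round(stat, 3), p)
```

The statistic only indicates that a linear trend is present; the direction (here, increasing) has to be read from the counts.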
McNemar’s null hypothesis
H0: the proportion of individuals with the outcome is the same on the first and second occasions (the two discordant counts, b and c, are equal in expectation)
what are the McNemar’s test assumptions?
- data must be paired
- response variable must be binary
- number of discordant pairs must be large, ideally b + c > 10
- each observation should correspond to a unique individual or a matched pair
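A minimal sketch of the test statistic, assuming the usual paired 2x2 layout in which b and c are the discordant counts and using the form without continuity correction, (b - c)^2 / (b + c), compared against a Chi-squared distribution with 1 degree of freedom. The function name and the counts are hypothetical:

```python
from math import erfc, sqrt


def mcnemar(b, c):
    """McNemar's test from the two discordant counts (no continuity correction).

    b: pairs that are yes on the first occasion and no on the second
    c: pairs that are no on the first occasion and yes on the second
    Returns (statistic, p_value).
    """
    statistic = (b - c) ** 2 / (b + c)
    # Survival function of a Chi-squared variable with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical discordant counts; b + c = 20, so the large-sample form is reasonable.
stat, p = mcnemar(b=15, c=5)
print(stat, round(p, 4))
```

The concordant pairs do not enter the statistic, which is why the assumption is stated in terms of b + c.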