Categorical Data Analysis Flashcards

1
Q

Chi-squared assumptions

A
  • All expected values Ei,j are greater than 1.
  • No more than 20% of the expected values Ei,j are less than 5.
  • Observations are independent (each subject contributes to one cell only).
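A minimal sketch of checking the two expected-count rules in R. The table `tab` and its counts are made up for illustration; only base R functions are used.

```r
# Hypothetical 2x3 table of counts (illustrative numbers only)
tab <- matrix(c(12, 5, 3,
                 8, 9, 7),
              nrow = 2, byrow = TRUE,
              dimnames = list(exposure = c("yes", "no"),
                              outcome  = c("low", "mid", "high")))

# Expected count for each cell: (row total * column total) / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Rule 1: every expected value is greater than 1
all_above_one <- all(expected > 1)

# Rule 2: no more than 20% of expected values are below 5
share_below_5 <- mean(expected < 5)

assumptions_met <- all_above_one && share_below_5 <= 0.20
```

The same expected counts are available from `chisq.test(tab)$expected`; independence of observations cannot be checked from the table and has to come from the study design.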
2
Q

test for a 2x2 table

A

Chi-squared or Fisher’s

3
Q

Test for 2x3+ table

A

Chi-squared test for trend (when the variable with three or more levels is ordinal)

4
Q

Test for paired data

A

McNemar’s

5
Q

Test for a lurking/stratifying variable

A

Cochran-Mantel-Haenszel (CMH) test

6
Q

When do you use a Fisher’s exact test?

A

when the assumptions of the chi-squared test are violated, for example when expected counts are too small

7
Q

How must data be formatted for Fisher’s exact?

A

must be a 2x2 table
categories may need to be combined to reduce a larger table to 2x2
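A rough sketch of collapsing a larger table to 2x2 and running Fisher's exact test with base R's `fisher.test()`. The table `tab3`, its labels and its counts are invented for illustration.

```r
# Hypothetical 2x3 table with small counts (illustrative only)
tab3 <- matrix(c(3, 1, 0,
                 1, 2, 1),
               nrow = 2, byrow = TRUE,
               dimnames = list(group    = c("A", "B"),
                               response = c("none", "mild", "severe")))

# Combine "mild" and "severe" into a single "any" column to get a 2x2 table
tab2 <- cbind(none = tab3[, "none"],
              any  = tab3[, "mild"] + tab3[, "severe"])

# Two-sided Fisher's exact test on the collapsed table
result <- fisher.test(tab2)
result$p.value
```

Which categories to merge is a judgement call that should make sense for the subject matter, not only for the cell counts.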

8
Q

Fisher’s hypotheses

A

H0: There is no association between the variables (independent)
H1: There is an association between the variables (dependent)

9
Q

What is risk (and how to calculate)?

A

the probability that an event will occur
risk = number of events / total population at risk

10
Q

risk ratio or relative risk calculation

A

p(event in group 1)/ p(event in group 2)
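A small worked example of the risk and relative risk calculations in R. The counts are hypothetical, and "group 1" is taken to be the exposed group.

```r
# Hypothetical cohort counts (illustrative only)
events_exposed   <- 30
n_exposed        <- 100
events_unexposed <- 50
n_unexposed      <- 100

# Risk = number of events / total population at risk, per group
risk_exposed   <- events_exposed / n_exposed
risk_unexposed <- events_unexposed / n_unexposed

# Relative risk = risk in group 1 (exposed) / risk in group 2 (unexposed)
relative_risk <- risk_exposed / risk_unexposed
```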

11
Q

odds definition and calculation

A

odds are the ratio of the probability of an event happening to the probability of it not happening
odds = p / (1-p)
where p is the probability of an event
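A short sketch of converting between probability and odds in R. The helper names `prob_to_odds` and `odds_to_prob` are made up for this example.

```r
# Convert a probability to odds: p / (1 - p)
prob_to_odds <- function(p) p / (1 - p)

# Convert odds back to a probability: odds / (1 + odds)
odds_to_prob <- function(odds) odds / (1 + odds)

p    <- 0.2
odds <- prob_to_odds(p)      # 0.2 / 0.8
back <- odds_to_prob(odds)   # recovers the original probability
```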

12
Q

what is an odds ratio?

A

a measure of association between an exposure and an outcome

13
Q

odds ratio calc

A

odds of event in exposed group/odds of event in non-exposed group

14
Q

odds ratio interpretation

A

the exposed have x times the odds of the event compared with the non-exposed (where x is the odds ratio)

15
Q

odds ratio table set up

A

event along the top (columns)
exposure along side (rows)
yes + yes in top left
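A sketch of this layout in R, with the odds ratio computed from it. The counts are hypothetical; the cross-product form (a*d)/(b*c) is a standard equivalent of the odds-over-odds calculation.

```r
# Exposure in rows, event in columns, "yes + yes" in the top-left cell
tab <- matrix(c(20, 80,
                10, 90),
              nrow = 2, byrow = TRUE,
              dimnames = list(exposure = c("yes", "no"),
                              event    = c("yes", "no")))

# Odds of the event within each exposure group
odds_exposed   <- tab["yes", "yes"] / tab["yes", "no"]
odds_unexposed <- tab["no", "yes"]  / tab["no", "no"]

# Odds ratio: odds in exposed / odds in non-exposed
odds_ratio <- odds_exposed / odds_unexposed

# Equivalent cross-product form: (a * d) / (b * c)
cross_product <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
```

Keeping "yes + yes" in the top-left matters because swapping rows or columns inverts the ratio.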

16
Q

when is relative risk typically used?

A

cohort studies

17
Q

when is the odds ratio typically used?

A

case-control studies

18
Q

when should odds ratios be avoided?

A

if the disease is common, the odds ratio exaggerates the relative risk (it lies further from 1 than the RR does)
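A quick numerical illustration in R, using made-up risks. The helper `rr_and_or` is hypothetical and just applies the risk ratio and odds ratio formulas to two risks.

```r
# Return the relative risk and odds ratio for two risks (exposed vs unexposed)
rr_and_or <- function(risk_exposed, risk_unexposed) {
  rr <- risk_exposed / risk_unexposed
  or <- (risk_exposed / (1 - risk_exposed)) /
        (risk_unexposed / (1 - risk_unexposed))
  c(rr = rr, or = or)
}

# Rare outcome: the odds ratio stays close to the relative risk
rare <- rr_and_or(0.02, 0.01)

# Common outcome: same relative risk, but a much larger odds ratio
common <- rr_and_or(0.60, 0.30)
```

Both scenarios have a relative risk of 2, so any gap between the two odds ratios comes purely from how common the outcome is.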

19
Q

when to use chi squared test for trend?

A

when there is one nominal categorical variable and one ordinal categorical variable with at least 3 levels.

20
Q

Chi-squared test for trend hypothesis

A

H0: there is no linear trend in the relationship between variable x and variable y

21
Q

what are the degrees of freedom in a Chi-squared test for trend?

A

1 degree of freedom, regardless of the number of levels of the ordinal variable

22
Q

how to interpret Chi-squared test for trend?

A

look at counts in the table to determine the nature of relationship; test only tells you if a relationship is present
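A minimal sketch using base R's `prop.trend.test()`, one way to run a chi-squared test for trend. The counts are invented, and the ordinal levels are scored 1, 2, 3 (the function's default).

```r
# Hypothetical data: events and group sizes across three ordered levels
events <- c(10, 20, 30)
totals <- c(100, 100, 100)

# Chi-squared test for trend in proportions (default scores 1, 2, 3)
trend <- prop.trend.test(events, totals)

trend$statistic   # chi-squared statistic
trend$parameter   # degrees of freedom
trend$p.value

# The test does not give the direction, so inspect the proportions directly
proportions <- events / totals
```

Here the proportions rise with each level, which is what tells you the trend is increasing; the p-value alone only says a linear trend is present.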

23
Q

McNemar’s null hypothesis

A

H0: the proportion of individuals with the outcome is the same on the first and second occasions

24
Q

what are the McNemar’s test assumptions?

A
  1. data must be paired
  2. response variable must be binary
  3. number of discordant pairs must be large, ideally b + c > 10
  4. each observation should correspond to a unique individual or a matched pair
25
what are discordant pairs in McNemar's?
  • participants whose outcome changes between readings
  • the outcome has either worsened or improved
  • their counts are the observed frequencies
26
expected frequencies in McNemar's?
(b + c) / 2, where b and c are the counts of the two kinds of discordant pair
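A sketch of the McNemar calculation in R, by hand and with base R's `mcnemar.test()`. The paired counts are hypothetical, and the continuity correction is switched off so the two results match.

```r
# Hypothetical paired data: rows = first occasion, columns = second occasion
tab <- matrix(c(40,  5,
                15, 30),
              nrow = 2, byrow = TRUE,
              dimnames = list(first  = c("yes", "no"),
                              second = c("yes", "no")))

# Discordant pairs: outcome changed between occasions
b <- tab["yes", "no"]
c_pairs <- tab["no", "yes"]

# Under H0 each discordant cell is expected to hold half of the discordant pairs
expected <- (b + c_pairs) / 2

# Chi-squared statistic from observed vs expected discordant counts
stat_manual <- (b - expected)^2 / expected + (c_pairs - expected)^2 / expected

# Same statistic from the built-in test (no continuity correction)
stat_builtin <- unname(mcnemar.test(tab, correct = FALSE)$statistic)
```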
27
When is the Cochran-Mantel-Haenszel (CMH) test used (data type)?
to assess the association between two categorical variables while controlling for a third, stratifying variable
28
What is a confounding variable?
a third variable in a study examining a potential cause-and-effect relationship. For a variable to be a confounder, it must:
  1. be associated with the exposure (i.e., it differs between the exposed and unexposed groups)
  2. be associated with the outcome (i.e., it influences the outcome)
  3. not be on the causal pathway between exposure and outcome
29
what is a stratifying variable?
a variable used to divide data into subgroups (strata) to analyse the exposure-outcome relationship within each subgroup.
30
What is the CMH test null hypothesis?
H0: There is no association between variable1 and variable2 after adjusting for the stratifying variable, variable3.
31
What are the CMH test assumptions?
  • applied to a series of 2x2 tables (one per stratum)
  • each expected cell count should be at least 5
  • observations are independent
  • the test is not reliable if any cell contains a 0
32
CMH steps
  1. Run a chi-squared/Fisher's test first to determine whether there is a relationship between the 2 variables (state hypotheses)
  2. Then carry out the CMH test to determine whether there is a lurking or stratifying variable (restate hypotheses)
  3. In the conclusion, state whether the stratifying variable is lurking or not
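The steps above can be sketched in R with base functions `chisq.test()` and `mantelhaen.test()`. The 2x2x2 array `strata` is invented so that the association seen in the pooled table disappears within strata.

```r
# Hypothetical 2x2 tables for two strata (filled column by column)
strata <- array(c(10,  5, 90, 45,    # stratum 1
                  25, 50, 25, 50),   # stratum 2
                dim = c(2, 2, 2),
                dimnames = list(exposure = c("exposed", "unexposed"),
                                outcome  = c("yes", "no"),
                                stratum  = c("stratum 1", "stratum 2")))

# Step 1: crude association, ignoring the stratifying variable
crude <- apply(strata, c(1, 2), sum)
crude_test <- chisq.test(crude)

# Step 2: association adjusted for the stratifying variable
cmh_test <- mantelhaen.test(strata)

crude_test$p.value   # small: apparent association in the pooled table
cmh_test$p.value     # large: no association once strata are accounted for
cmh_test$estimate    # common (Mantel-Haenszel) odds ratio across strata
```

Step 3 is the written conclusion: in this made-up data the crude and adjusted results disagree, which points to the stratifying variable acting as a lurking variable.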
33
When do you use a Kappa test?
to assess the agreement between two raters (or observers) when they classify items into categories
34
what makes a Kappa test different?
It's not a hypothesis test; there are no p-values
35
Cohen's Kappa interpretation
  • maximum value of 1 when agreement is perfect
  • bands:
      a) < 0.2 = poor
      b) 0.2-0.4 = fair
      c) 0.4-0.6 = moderate
      d) 0.6-0.8 = good
      e) 0.8-1 = very good
  • give an interpretation for both weighted and unweighted Kappa
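A small sketch that maps a Kappa value to these bands with base R's `cut()`. The function name `interpret_kappa` is made up, and because the bands share their boundaries, this version assumes a boundary value belongs to the higher band (so 0.2 counts as "fair").

```r
# Label Kappa values using the bands on this card.
# Assumption: each band includes its lower boundary (e.g. 0.2 is "fair").
interpret_kappa <- function(kappa) {
  cut(kappa,
      breaks = c(-Inf, 0.2, 0.4, 0.6, 0.8, 1),
      labels = c("poor", "fair", "moderate", "good", "very good"),
      right = FALSE,          # intervals are [lower, upper)
      include.lowest = TRUE)  # with right = FALSE this lets 1 fall in the top band
}

labels <- as.character(interpret_kappa(c(0.1, 0.35, 0.8, 1)))
```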
36
How to know if Cohen's Kappa is incorrect?
weighted Kappa should be larger than unweighted Kappa, because it accounts for the degree of disagreement rather than treating all disagreements equally; if it is smaller, check the calculation
37
What must be done in R to calculate Kappa?
  • a weighted Kappa matrix must be created
  • it has 1s on the main diagonal
  • the top row runs from 1 down to 0
38
Calculating Kappa matrix values
  • take the number of categories (columns) and subtract 1
  • take the reciprocal of this (e.g. 5 categories give 4, so the step is 1/4)
  • move in increments of that step (here 1/4) between 1 and 0
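A sketch of building such a weight matrix in base R, assuming 5 ordered categories and linear weights. How the matrix is then passed to a Kappa function depends on the package used, so that step is left out.

```r
# Number of rating categories (assumed to be 5 for this example)
k <- 5

# Step size: reciprocal of (number of categories - 1)
step <- 1 / (k - 1)

# Weight for cell (i, j) drops by one step for each category of disagreement
distance <- abs(outer(seq_len(k), seq_len(k), "-"))
weights  <- 1 - distance * step

weights[1, ]    # top row runs from 1 down to 0 in steps of 1/4
diag(weights)   # main diagonal is all 1s
```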
39
what are the limitations of Kappa test?
  1. prevalence problem: if one category is much more common, kappa can be misleadingly low even when agreement is high
  2. number of categories: more categories tend to lower kappa, fewer categories tend to inflate it
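A sketch of the prevalence problem in R, computing unweighted Cohen's Kappa by hand. The helper `cohen_kappa` and both agreement tables are invented; each table has 90% raw agreement.

```r
# Unweighted Cohen's Kappa from a square table of rater 1 (rows) vs rater 2 (columns)
cohen_kappa <- function(tab) {
  p <- tab / sum(tab)
  p_observed <- sum(diag(p))                  # proportion of exact agreement
  p_expected <- sum(rowSums(p) * colSums(p))  # agreement expected by chance
  (p_observed - p_expected) / (1 - p_expected)
}

# Balanced categories: 90 of 100 ratings agree
balanced <- matrix(c(45, 5,
                      5, 45), nrow = 2, byrow = TRUE)

# One category dominates: still 90 of 100 ratings agree
skewed <- matrix(c(88, 5,
                    5, 2), nrow = 2, byrow = TRUE)

kappa_balanced <- cohen_kappa(balanced)
kappa_skewed   <- cohen_kappa(skewed)
```

With the dominant category, chance agreement is already very high, so the same 90% raw agreement leaves a much lower Kappa.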
40
How do you combine factor levels in a table? (for Fisher's)
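  1. data$variable <- as.factor(data$variable)
  2. levels(data$variable) to check the current level order
  3. data$variable <- factor(data$variable, labels = c(...)), giving one label per existing level and repeating a label for the levels to be combined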
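A minimal sketch of these three steps on a made-up data frame. The column values and new labels are hypothetical; repeating a label to merge levels relies on R 3.4.0 or later.

```r
# Hypothetical data with a three-level variable
data <- data.frame(variable = c("never", "former", "current", "never", "current"))

# 1) convert to a factor
data$variable <- as.factor(data$variable)

# 2) check the level order (alphabetical: "current", "former", "never")
levels(data$variable)

# 3) relabel in that order; the repeated label merges "current" and "former"
data$variable <- factor(data$variable, labels = c("ever", "ever", "never"))

table(data$variable)
```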
41
How do you reorder factor levels?
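  1. data$variable <- as.factor(data$variable)
  2. levels(data$variable) to check the current level order
  3. data$variable <- factor(data$variable, levels = c(...)), listing the existing levels in the order wanted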
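A short sketch of reordering on a made-up data frame. The level names are hypothetical; the values themselves are unchanged, only the order of the levels.

```r
# Hypothetical data with an ordinal variable stored as text
data <- data.frame(variable = c("low", "high", "medium", "low"))

# 1) convert to a factor
data$variable <- as.factor(data$variable)

# 2) default order is alphabetical: "high", "low", "medium"
levels(data$variable)

# 3) restate the existing levels in the order wanted
data$variable <- factor(data$variable, levels = c("low", "medium", "high"))

levels(data$variable)
```

The names given in `levels` have to match the existing ones exactly, otherwise the unmatched values become `NA`.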
42
How to interpret relative risk of less than 1?
  1. if RR < 1, use 1 - RR for the interpretation
  2. e.g. RR = 0.6 gives 1 - 0.6 = 0.4
  3. so exposed individuals have a 40% lower risk of the event than the unexposed
43
How to make a table in r with no dataset
matrix(c(n, n, n, n), ncol = 2, byrow = TRUE)
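A sketch with made-up counts showing what `byrow` changes. The `dimnames` labels are optional and hypothetical; they just make the printed table easier to read.

```r
# Counts typed in reading order: first row, then second row
counts <- c(20, 80,
            10, 90)

# byrow = TRUE fills across rows, matching the order the counts were typed in
tab <- matrix(counts, ncol = 2, byrow = TRUE,
              dimnames = list(exposure = c("yes", "no"),
                              event    = c("yes", "no")))

# Without byrow = TRUE, R fills down columns, which rearranges the cells
tab_by_col <- matrix(counts, ncol = 2)

tab
tab_by_col
```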
44
what is the key difference between fisher's exact and chi squared?
Fisher’s exact test calculates an exact p-value rather than relying on a large-sample approximation; it is conservative, sometimes leading to higher p-values than chi-squared
45
When is the odds ratio (OR) preferred over relative risk (RR)?
when the event is rare
46