Intro to generalised linear models/Categorical Data Flashcards
For binary variables, what characterises the distribution in each sample?
The proportion of “1”s in each sample
E.g. if the binary outcomes = presence of disease, then the data set is completely characterised by the two group sizes and the relative frequency of the disease in each group
What indices exist for measuring the effect/impact of the group membership (typically Exposure)?
Relative risk (RR)
Odds ratio (OR)
Risk difference – not considered here
What can population indices be estimated by (without bias for large samples)?
What can also be derived?
Relevant indices constructed from the independent samples.
Confidence intervals
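As an illustration of how such confidence intervals can be derived, here is a minimal sketch using the common large-sample log-scale (Wald-type) approximation. This is one standard method, not necessarily the one used in the course. The function names and the counts in the usage lines are hypothetical; a and b denote exposed subjects with and without the outcome, and c and d the non-exposed subjects with and without it.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI built on the log scale."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper

def risk_ratio_ci(a, b, c, d, z=1.96):
    """Risk ratio with an approximate 95% CI built on the log scale."""
    rr_hat = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr_hat) - z * se_log)
    upper = math.exp(math.log(rr_hat) + z * se_log)
    return rr_hat, lower, upper

# Hypothetical counts, for illustration only
or_hat, or_lower, or_upper = odds_ratio_ci(20, 80, 10, 90)
rr_hat, rr_lower, rr_upper = risk_ratio_ci(20, 80, 10, 90)
```

The approximation needs all four cell counts to be non-zero and is only reliable when none of them is small.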
What remains to introduce?
What does this amount to?
Tests for formally assessing whether the distributions of binary outcomes differ between the two categories.
Testing the null hypotheses (all equivalent):
- Equal proportions of “1s” in the two populations (RR=1)
- Equal odds in the two groups (OR=1)
To assess whether the distributions of binary outcomes differ between the two categories, a chi-square test or Fisher's exact test can be used.
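The two tests can be sketched for a 2x2 table with the Python standard library alone. This is a minimal sketch under stated assumptions: the function names are hypothetical, the chi-square version is Pearson's statistic without a continuity correction, and the two-sided Fisher p-value sums the probabilities of all tables that are no more likely than the observed one (one common convention; statistical software may use another).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for a 2x2 table; returns (statistic, p-value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for a 2x2 table."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so that ties with the observed table are counted
    return sum(prob(x) for x in range(x_min, x_max + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts, for illustration only
stat, p_chi = chi_square_2x2(20, 80, 10, 90)
p_fisher = fisher_exact_2x2(3, 1, 1, 3)
```

Fisher's exact test is usually preferred when expected cell counts are small, because the chi-square p-value relies on a large-sample approximation.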
How can we estimate the probability of a category?
By its relative frequency in the sample
An exact 95% CI for the proportion can be generated by the ci command applied to a binary variable where the category coded “1” is the one we are interested in.
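To show what an exact interval involves, here is a minimal sketch that assumes "exact" means the Clopper-Pearson interval, found by bisection on the binomial tail probabilities. The function names are hypothetical, and the sketch is not claimed to reproduce the output of any particular statistical package.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _bisect(f, target, increasing):
    """Solve f(p) = target on [0, 1] for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(k, n, level=0.95):
    """Clopper-Pearson interval for a proportion, given k ones out of n."""
    alpha = 1 - level
    # Lower limit: the p at which P(X >= k) equals alpha / 2
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, increasing=True)
    # Upper limit: the p at which P(X <= k) equals alpha / 2
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p), alpha / 2, increasing=False)
    return lower, upper

# Hypothetical sample: 5 observations coded "1" out of 10
lower, upper = exact_ci(5, 10)
```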
What does a contingency table summarise?
- The frequency distribution of each of two categorical variables as well as the association between two categorical variables
In its simplest form, each cell of a two-way table contains the frequency counts of a variable’s category in relation to a category of another variable
- the row and column totals represent the (marginal) distributions of the variables
- the concept can be extended to multi-way contingency tables (not here)
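A two-way table with its marginal totals can be built from paired observations as in the following minimal sketch. The function name and the data are hypothetical and serve only to illustrate the layout.

```python
from collections import Counter

def two_way_table(row_values, col_values):
    """Cross-tabulate two categorical variables observed on the same subjects."""
    counts = Counter(zip(row_values, col_values))
    row_levels = sorted(set(row_values))
    col_levels = sorted(set(col_values))
    # Each cell holds the count for one row category and one column category
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    row_totals = [sum(row) for row in table]        # marginal distribution of rows
    col_totals = [sum(col) for col in zip(*table)]  # marginal distribution of columns
    return row_levels, col_levels, table, row_totals, col_totals

# Hypothetical data: 1 = exposed / diseased, 0 = not
exposure = [1, 1, 1, 0, 0, 0, 0]
disease = [1, 0, 1, 0, 0, 1, 0]
rows, cols, table, row_totals, col_totals = two_way_table(exposure, disease)
```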
What does the command bitest bin==p0 provide?
Provides an exact test for the null hypothesis that the probability of the “1” category is a specified value p0.
E.g. test the null hypothesis that the proportions of female and male births in the UK in 1958 were the same (Prob(female) = Prob(male) = 0.5).
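The idea behind an exact binomial test can be sketched as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, the two-sided p-value sums the probabilities of all outcomes that are no more likely than the observed count under the null hypothesis (a common convention that may differ from a given package's), and the counts in the usage lines are made up rather than taken from the 1958 birth data.

```python
import math

def binomial_test(k, n, p0=0.5):
    """Two-sided exact test of H0: P("1") = p0, given k ones out of n (0 < p0 < 1)."""
    def pmf(i):
        # Binomial probability computed on the log scale so large n does not overflow
        log_p = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * math.log(p0) + (n - i) * math.log(1 - p0))
        return math.exp(log_p)

    p_obs = pmf(k)
    # Small relative tolerance so that ties with the observed count are included
    total = sum(pmf(i) for i in range(n + 1) if pmf(i) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical counts, for illustration only
p_balanced = binomial_test(5, 10)   # observed proportion equals p0
p_extreme = binomial_test(0, 10)    # no "1"s at all
```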
How can you calculate the odds that an exposed person develops disease?
Divide the number of exposed people with the disease (a) by the number of exposed people without the disease (b)
a/b
What is an Odds ratio?
The ratio of the odds of developing the disease in the exposed to the odds of developing the disease in the non-exposed: (a/b)/(c/d)
How can you calculate the odds that a non-exposed person develops disease?
Divide the number of non-exposed people with the disease (c) by the number of non-exposed people without the disease (d)
c/d
What is the risk ratio?
The ratio of the risk of developing the disease in the exposed to the risk of developing the disease in the non-exposed:
(a/(a+b))/(c/(c+d))
How can you calculate the risk that an exposed person develops disease?
Divide the number of exposed people with the disease (a) by the total number of exposed people, with or without the disease (a+b)
a/(a+b)
How can you calculate the risk that a non-exposed person develops disease?
Divide the number of non-exposed people with the disease (c) by the total number of non-exposed people, with or without the disease (c+d)
c/(c+d)
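The odds, risks and their ratios defined on these cards can be computed together from the four cell counts. A minimal sketch follows, with a hypothetical function name and made-up counts; a, b, c and d are used as on the cards.

```python
def two_by_two_measures(a, b, c, d):
    """Odds, risks, odds ratio and risk ratio from a 2x2 table.

    a: exposed with disease       b: exposed without disease
    c: non-exposed with disease   d: non-exposed without disease
    """
    odds_exposed = a / b
    odds_unexposed = c / d
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "odds_exposed": odds_exposed,
        "odds_unexposed": odds_unexposed,
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "odds_ratio": odds_exposed / odds_unexposed,  # (a/b)/(c/d)
        "risk_ratio": risk_exposed / risk_unexposed,  # (a/(a+b))/(c/(c+d))
    }

# Hypothetical counts, for illustration only
measures = two_by_two_measures(20, 80, 10, 90)
```

With these counts the odds ratio is 2.25 and the risk ratio is 2.0, which shows that the two indices generally differ unless the outcome is rare.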
What does OR=1 mean?
Exposure does not affect the odds of outcome
E.g. there is no difference in the odds of suffering malaise between males and females.
What does OR < 1 mean?
Indicates that the exposure is associated with reduced odds of developing the outcome
E.g. if the odds ratio = 0.339, then the odds of a male suffering from malaise are about a third (33.9%) of those of a female.