Binary Logistic Regression Flashcards
What is a contingency table?
Summarises frequency distribution of each of two categorical variables and the association between them
What can a contingency table allow us to check?
▪️How each potential explanatory variable is related to the DV
▪️The categories for explanatory variables are large enough (at least 5)
▪️How many missing cases for each
What is the risk of an outcome?
The number of times the outcome of interest occurs divided by the total number of possible outcomes
How do you calculate the risk ratio (relative risk)?
Risk of one outcome divided by risk of the other outcome
What does a risk ratio of 1 indicate?
No difference between the groups
How do you calculate the odds of an outcome?
The number of times it occurs divided by the number of times it doesn’t occur
How do you calculate the odds ratio?
Odds of outcome in one group compared to odds of outcome in the other group
What does an odds ratio of 1 mean?
▪️Outcome occurs half of the time
▪️Exposure doesn’t affect the odds (no difference between groups)
What does an odds ratio <1 indicate?
Outcome occurs less than half the time
Indicates exposure is associated with reduced risk of developing outcome
What does an odds ratio >1 indicate?
Outcome occurs more than half the time
Indicates exposure is associated with increased risk of developing outcome
Which ratio can be used for case-control studies?
Odds ratio
When can you NOT use risk ratios and why?
In case control studies where selection of subjects is based on the outcome
What happens to RR and OR in rare outcomes?
They will be approximately equal
What can you assume about a relationship where the outcome is binary?
It is nonlinear, S-shaped (sigmoid) relationship
Linear cannot make sense!
▪️Outcome is limited to 0,1
▪️Goal is to separate groups, not minimise mean squared error
▪️Linear regression is highly sensitive to influential cases
▪️Assumptions of linear are violated, particularly homogeneity of variance
What is the link function?
A way of relating x and Y by way of a function in a logistic regression
To model the relationship between left and right hand side of the equation
g(y) = β0 + β1x1 + ε
What is maximum likelihood?
An iterative process that estimates the best fitted equation
Find the coefficient that makes the observed data most likely
(try lots of models until tweaking it further doesn’t improve the fit)
What does the coefficient in a logistic regression mean?
An increase of X of 1 unit will change the ODDS (log) by β
How do you calculate the odds ratio for X?
The anti-log of the coefficient
(e to the power of the coefficient)
When do you use the binary logistic regression?
To test whether there’s an association between variables, where the dependent variable is categorical on a binary scale
What assumptions are needed for the binary logistic regression?
▪️Binary dependent variable has a Bernoulli distribution
▪️It’s linearly related to the predictor only after transforming into the logit scale
▪️observations are independent
▪️continuous variables have a linear effect on the log-odds scale
If you have a continuous outcome/DV, what type of regression should you use?
Linear regression
If you have a binary (categorical) outcome/DV, what type of regression should you use?
Binary logistic regression
What ratio can you calculate in a case-control study?
Odds ratio
(NOT risk!)
What ratio can you calculate in a prospective cohort study?
Odds OR risk ratio
What is incidence?
Aka risk
e.g., number of people who got disease when exposed over the total number of people who were exposed