L13 Logistic Regression Flashcards
Differentiate between simple & multiple logistic regression.
Simple / Univariable logistic regression:
- Describes the relationship between the dependent variable (dichotomous i.e. ‘yes/no’) and a single independent variable (continuous, ordinal or nominal)
- Regression model: ln (odds of outcome) = (alpha) + (beta)x
Multiple / Multivariable logistic regression:
- An extension of simple logistic regression
- Describes the relationship between the dependent variable (dichotomous) and more than one independent variable (continuous, ordinal or nominal)
- Regression model: ln (odds of outcome) = (alpha) + (beta1)x1 + (beta2)x2 + … + (betak)xk
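To make the two model equations concrete, here is a minimal sketch that evaluates the right-hand side (the log-odds) and converts it back to a probability with p = odds / (1 + odds). The coefficient values are hypothetical and chosen only for illustration. In practice alpha and the betas are estimated from data. The function names `log_odds` and `probability` are also illustrative.

```python
import math

def log_odds(alpha, betas, xs):
    """ln(odds of outcome) = alpha + beta1*x1 + ... + betak*xk."""
    if len(betas) != len(xs):
        raise ValueError("need exactly one beta per independent variable")
    return alpha + sum(b * x for b, x in zip(betas, xs))

def probability(alpha, betas, xs):
    """Back-transform log-odds to a probability: p = odds / (1 + odds)."""
    odds = math.exp(log_odds(alpha, betas, xs))
    return odds / (1.0 + odds)

# Hypothetical coefficients, for illustration only
print(probability(-2.0, [0.7], [1]))            # simple model: one x
print(probability(-2.0, [0.7, 0.03], [1, 50]))  # multiple model: x1 = 1, x2 = 50
```

The simple model is the special case with a single beta. The same two functions therefore cover both cards.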
Given ln (odds of outcome) = (alpha) + (beta)x, what is the significance of the beta value?
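beta = the change in ln (odds of outcome) for a one-unit increase in x
e^(beta) = Odds ratio (OR) for a one-unit increase in x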
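Why e^(beta) is an odds ratio: odds(x + 1) / odds(x) = e^(alpha + beta(x + 1)) / e^(alpha + beta x) = e^(beta), whatever the value of x. The sketch below checks this numerically. The values of alpha and beta are hypothetical.

```python
import math

alpha, beta = -1.5, 0.4  # hypothetical fitted coefficients

def odds(x):
    """Odds of outcome implied by ln(odds) = alpha + beta*x."""
    return math.exp(alpha + beta * x)

# The odds ratio for a one-unit increase in x is the same at every x
for x in (0, 1, 10):
    print(x, odds(x + 1) / odds(x), math.exp(beta))
```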
Given ln (odds of outcome) = (alpha) + (beta1)x1 + (beta2)x2 + … + (betak)xk, what is the significance of the beta value?
x1, x2, … and xk are the values of k distinct, independent (or explanatory) variables.
For each independent variable (xi):
e^(betai) (the exponentiated coefficient of xi) = Odds ratio (OR) for a one-unit increase in xi, after controlling for all other independent variables (i.e. keeping the values of all other independent variables constant).
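The sketch below illustrates "keeping all other independent variables constant". It raises x1 from 0 to 1 while x2 is held fixed at several values. The ratio of odds is e^(beta1) every time, because the beta2 * x2 term cancels in the ratio. The coefficients and the choice of age as x2 are hypothetical.

```python
import math

alpha = -3.0
betas = [0.8, 0.05]  # hypothetical: x1 = exposure (0/1), x2 = age in years

def odds(xs):
    """Odds implied by ln(odds) = alpha + beta1*x1 + beta2*x2."""
    return math.exp(alpha + sum(b * x for b, x in zip(betas, xs)))

# Change x1 from 0 to 1 while holding x2 (age) fixed at several values
for age in (30, 50, 70):
    print(age, odds([1, age]) / odds([0, age]), math.exp(betas[0]))
```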
Define “odds ratio”.
OR = A measure of the strength (i.e. magnitude) of association between an exposure and an outcome.
Range of values = 0 to infinity
- OR = 1: NO association between exposure & outcome (i.e. null hypothesis H0)
- OR > 1: Positive association i.e. exposure is associated with increased odds of the outcome
- OR < 1: Negative association i.e. exposure is associated with decreased odds of the outcome
E.g. of expressing odds ratio in words:
OR = 1.2:
Those who are exposed have 1.2 times the odds (i.e. 20% higher odds) of developing the outcome compared with those who are unexposed.
OR = 0.8:
Those who are exposed have a 20% reduction in the odds of developing the outcome compared with those who are unexposed.
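The wording pattern above can be sketched as a small formatter. The percentage is (OR - 1) x 100% and describes a change in the odds, not in the probability of the outcome. `describe_or` is a hypothetical helper name.

```python
def describe_or(or_value):
    """Phrase an odds ratio the way the examples above do (exposed vs unexposed)."""
    if or_value <= 0:
        raise ValueError("an odds ratio must be positive")
    pct = round(abs(or_value - 1) * 100)  # percentage change in the ODDS, not the risk
    if or_value > 1:
        return (f"Those who are exposed have {or_value:g} times the odds "
                f"({pct}% higher odds) of developing the outcome "
                "compared with those who are unexposed.")
    if or_value < 1:
        return (f"Those who are exposed have a {pct}% reduction in the odds "
                "of developing the outcome compared with those who are unexposed.")
    return "OR = 1: no association between exposure and outcome."

print(describe_or(1.2))
print(describe_or(0.8))
```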
How is odds ratio calculated?
Odds ratio
= odds that a case was exposed / odds that a control was exposed
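= ad / bc (cross-product ratio of a 2 x 2 contingency table)
Rows: Exposure (Yes | No)
Columns: Outcome (Yes | No)
a = exposed with outcome, b = exposed without outcome
c = unexposed with outcome, d = unexposed without outcome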
Odds of an event = no. of events / no. of non-events
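A short check that the definitions above agree, using hypothetical counts. The cross-product ad / bc, the odds of exposure in cases divided by the odds of exposure in controls, and the odds of outcome in the exposed divided by the odds of outcome in the unexposed all give the same number. The helper name `odds_ratio_2x2` is illustrative.

```python
def odds_ratio_2x2(a, b, c, d):
    """Cross-product ratio ad / bc.

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    """
    if b == 0 or c == 0:
        raise ValueError("OR is undefined (division by zero) when b or c is 0")
    return (a * d) / (b * c)

a, b, c, d = 40, 60, 20, 80  # hypothetical counts

# Three routes to the same number
print(odds_ratio_2x2(a, b, c, d))  # ad / bc
print((a / c) / (b / d))           # odds a case was exposed / odds a control was exposed
print((a / b) / (c / d))           # odds of outcome in exposed / in unexposed
```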
When is odds ratio calculated?
Case-control studies (typically)
Cross-sectional studies
Relative risk is usually calculated for cohort studies instead.
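Relative risk compares risks, [a/(a+b)] / [c/(c+d)]. In a case-control study the number of cases and controls is fixed by design, so those risks cannot be estimated, and the OR is used instead. The sketch below uses hypothetical counts to compare the two measures on the same table. When the outcome is common, the OR is further from 1 than the RR. When the outcome is rare, the OR approximately equals the RR.

```python
def risk_ratio(a, b, c, d):
    """RR = risk in exposed / risk in unexposed = [a/(a+b)] / [c/(c+d)]."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """OR = ad / bc."""
    return (a * d) / (b * c)

# Hypothetical cohort counts (a, b, c, d): a common outcome, then a rare one
common = (40, 60, 20, 80)
rare = (4, 996, 2, 998)
for label, table in (("common", common), ("rare", rare)):
    print(label, round(odds_ratio(*table), 3), round(risk_ratio(*table), 3))
```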
State the purpose behind the hypothesis testing of simple / univariable logistic regression analysis.
To test the H0 that there is no association between the independent variable X (typically an exposure, risk factor or predictor variable) and the dependent variable Y (typically an outcome).
H0: OR = 1 (equivalently, beta = 0)
H1: OR =/= 1 (equivalently, beta =/= 0)
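This is a minimal pure-Python sketch of how the test can be carried out for one binary exposure. It fits ln(odds) = alpha + beta*x by maximum likelihood using Newton-Raphson. It then applies a Wald z-test of H0: beta = 0 (OR = 1), which is one common choice. Software may instead report a likelihood-ratio test. The counts are hypothetical, and the function names are illustrative rather than any package's API. With a single binary x, the fitted e^(beta) reproduces the 2 x 2 cross-product ad / bc.

```python
import math

def _score_and_info(x, y, alpha, beta):
    """Gradient of the log-likelihood and the 2 x 2 information matrix."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))  # fitted probability
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return g0, g1, i00, i01, i11

def fit_simple_logistic(x, y, tol=1e-10, max_iter=50):
    """Maximum-likelihood fit of ln(odds) = alpha + beta*x by Newton-Raphson.

    Returns (alpha, beta, se_beta).
    """
    alpha = beta = 0.0
    for _ in range(max_iter):
        g0, g1, i00, i01, i11 = _score_and_info(x, y, alpha, beta)
        det = i00 * i11 - i01 * i01
        # Newton step = inverse(information) @ score, written out for 2 x 2
        step_a = (i11 * g0 - i01 * g1) / det
        step_b = (i00 * g1 - i01 * g0) / det
        alpha += step_a
        beta += step_b
        if max(abs(step_a), abs(step_b)) < tol:
            break
    _, _, i00, i01, i11 = _score_and_info(x, y, alpha, beta)
    se_beta = math.sqrt(i00 / (i00 * i11 - i01 * i01))  # Var(beta) from inverse information
    return alpha, beta, se_beta

def wald_test(beta, se):
    """Wald z-test of H0: beta = 0 (i.e. OR = 1); two-sided p from N(0, 1)."""
    z = beta / se
    return z, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical 2 x 2 counts expanded into one row per subject
a, b, c, d = 40, 60, 20, 80
x = [1] * (a + b) + [0] * (c + d)          # exposure: 1 = yes, 0 = no
y = [1] * a + [0] * b + [1] * c + [0] * d  # outcome: 1 = yes, 0 = no

alpha, beta, se = fit_simple_logistic(x, y)
z, p = wald_test(beta, se)
print(f"crude OR = {math.exp(beta):.3f}, z = {z:.2f}, p = {p:.4f}")
```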
State the purpose behind the hypothesis testing of multiple / multivariable logistic regression analysis.
To test the H0 that there is no association between the independent variable X (typically an exposure, risk factor or predictor variable) and the dependent variable Y (typically an outcome), after controlling for all other independent variables (i.e. keeping the values of all other independent variables constant).
H0: OR = 1 (equivalently, beta = 0)
H1: OR =/= 1 (equivalently, beta =/= 0)
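The multivariable version of the same idea is sketched below, assuming NumPy is available. It fits all coefficients jointly by Newton-Raphson and then runs a Wald test on each one. The resulting e^(beta) values are adjusted ORs. The data are simulated, and the names `exposure` and `male` are hypothetical stand-ins. This is a sketch of the technique, not a replacement for a statistics package.

```python
import math
import numpy as np

def fit_logistic(X, y, tol=1e-10, max_iter=100):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X: array of shape (n,) or (n, k) holding the independent variables
       (an intercept column is added here).
    y: array of shape (n,) holding 0/1 outcomes.
    Returns (coefficients, standard errors), intercept first.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    coef = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ coef))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # information matrix
        step = np.linalg.solve(info, X.T @ (y - p))   # Newton step
        coef += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ coef))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return coef, se

# Simulated (hypothetical) data: binary exposure, binary covariate "male"
rng = np.random.default_rng(0)
n = 500
exposure = rng.integers(0, 2, n)
male = rng.integers(0, 2, n)
true_log_odds = -1.0 + 0.8 * exposure + 0.5 * male
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_log_odds))).astype(int)

coef, se = fit_logistic(np.column_stack([exposure, male]), y)
for name, b_i, se_i in zip(["exposure", "male"], coef[1:], se[1:]):
    z = b_i / se_i
    p_value = math.erfc(abs(z) / math.sqrt(2))  # Wald test of H0: OR = 1
    print(f"{name}: adjusted OR = {math.exp(b_i):.2f}, z = {z:.2f}, p = {p_value:.4f}")
```

Passing only `exposure` to `fit_logistic` gives the crude (unadjusted) OR for comparison.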
State the assumptions when using simple logistic regression analysis.
1) The dependent variable (typically an outcome) SHOULD be a dichotomous variable (i.e. with ONLY two categories, e.g. yes/no)
2) There is a linear relationship between the independent variable (typically exposure, risk factor or predictor variable) and the ln (odds of outcome).
3) The observations are independent of one another.
State the assumptions when using multiple logistic regression analysis.
1) The dependent variable (typically an outcome) SHOULD be a dichotomous variable (i.e. with ONLY two categories, e.g. yes/no)
2) There is a linear relationship between each independent variable (typically exposure, risk factor or predictor variables) and the ln (odds of outcome).
3) The observations are independent of one another.
4) There is little or no multicollinearity among the independent variables (x1, x2, … and xk) (typically exposure, risk factor or predictor variables) i.e. independent variables should NOT be too highly correlated with each other.
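The card does not name a diagnostic for multicollinearity. One common choice is the variance inflation factor, VIF_j = 1 / (1 - R_j^2). Here R_j^2 comes from regressing independent variable j on all the others. Values above roughly 5 or 10 are often treated as a warning sign, but these cut-offs are conventions rather than rules. The sketch below assumes NumPy and uses simulated, hypothetical variables, one pair of which is built to be highly correlated.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (shape n x k).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is from an ordinary least-squares
    regression of column j on the remaining columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated (hypothetical) independent variables
rng = np.random.default_rng(1)
age = rng.normal(50, 10, 300)
years_smoked = 0.6 * age + rng.normal(0, 2, 300)  # built to track age closely
bmi = rng.normal(25, 4, 300)                      # unrelated to the others
print(vif(np.column_stack([age, years_smoked, bmi])).round(2))
```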
Between crude / unadjusted OR and adjusted OR, which odds ratio is used for simple logistic regression analysis?
Crude / unadjusted OR
Adjusted OR is used in multivariable logistic regression analysis!
- i.e. adjusted for other independent variables
E.g. of how to write conclusion of simple & multiple logistic regression analysis.
Simple / univariable logistic regression analysis:
Study subjects who were exposed to the drug of interest had 2.08 (95% CI: 1.05 - 4.12) times the odds of developing the side effect compared with those who were not exposed.
Multiple / multivariable logistic regression analysis:
Study subjects who were exposed to the drug of interest had 2.19 (95% CI: 1.05 - 4.57) times the odds of developing the side effect compared with those who were not exposed, after controlling/adjusting for gender.
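The reported numbers come from the fitted coefficient and its standard error. OR = e^(beta), and a Wald-type 95% CI is e^(beta - 1.96 SE) to e^(beta + 1.96 SE). The sketch assumes that kind of interval, which is symmetric on the log scale. It is not a claim about how the example figures were actually produced. As a consistency check, it backs beta and SE out of the simple-regression CI above. It then shows that this reproduces an OR of about 2.08. `or_with_ci` and `report` are hypothetical helper names.

```python
import math

def or_with_ci(beta, se, z=1.96):
    """OR = e^beta with a Wald-type CI: e^(beta - z*SE) to e^(beta + z*SE)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def report(beta, se, adjusted_for=None):
    """Write the conclusion sentence in the style of the examples above."""
    or_, lo, hi = or_with_ci(beta, se)
    text = (f"Study subjects who were exposed to the drug of interest had {or_:.2f} "
            f"(95% CI: {lo:.2f} - {hi:.2f}) times the odds of developing the side effect "
            "compared with those who were not exposed")
    if adjusted_for:
        text += f", after controlling/adjusting for {adjusted_for}"
    return text + "."

# Recover beta and SE from the simple-regression CI quoted above (1.05 - 4.12)
lo, hi = 1.05, 4.12
beta = (math.log(lo) + math.log(hi)) / 2
se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
print(report(beta, se))
```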
How does one assess the goodness-of-fit of the logistic regression model with the observed data?
1) Hosmer-Lemeshow goodness-of-fit test
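- A statistical test of how well the logistic regression model fits the observed data (it compares observed and expected numbers of events across groups of predicted risk)
- p > 0.05: No evidence of poor fit (the model fits adequately)
- p < 0.05: Evidence of poor fit (NOT a good fit)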
2) Pseudo R^2 (Cox & Snell R^2 and Nagelkerke R^2)
- Pseudo R^2 values tell us approximately how much variation in the outcome is explained by the logistic regression model.
- Nagelkerke R^2 is a modification of Cox & Snell R^2. Cox & Snell R^2 has a maximum below 1, so it cannot reach 1 even for a perfect model. Nagelkerke R^2 divides it by that maximum so that its range is 0 to 1, which is why it is preferred.
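Both goodness-of-fit measures above can be sketched in pure Python. For Hosmer-Lemeshow, the sketch assumes the usual description of the test. Subjects are sorted by predicted probability and cut into 10 groups of roughly equal size. The statistic sums (O - E)^2 / (n_g p_g (1 - p_g)) over the groups and is compared with a chi-square distribution on groups - 2 df. Statistical packages differ in how they form groups and handle ties, so results may not match any particular package exactly. The chi-square tail uses the exact closed form that holds for even df. For the pseudo R^2 values, the sketch uses the standard log-likelihood formulas. The data and counts are hypothetical, and the function names are illustrative.

```python
import math
import random

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; exact closed form, valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this helper only handles positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic and p-value (df = groups - 2).

    Groups are formed by sorting the predicted probabilities p and cutting
    them into `groups` chunks of roughly equal size.
    """
    if len(y) != len(p) or len(y) < groups:
        raise ValueError("need matching y and p with at least one row per group")
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        observed = sum(yi for _, yi in chunk)  # observed events in the group
        expected = sum(pi for pi, _ in chunk)  # expected events = sum of predicted p
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)

def pseudo_r2(ll_null, ll_model, n):
    """Cox & Snell and Nagelkerke R^2 from natural-log likelihoods."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Largest Cox & Snell value possible (a perfect model has ll_model = 0)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell, cox_snell / max_cox_snell

def bernoulli_ll(events, total):
    """Log-likelihood of `events` out of `total` at the fitted p = events / total."""
    p = events / total
    return events * math.log(p) + (total - events) * math.log(1 - p)

# Hosmer-Lemeshow on simulated, well-calibrated predictions (hypothetical data)
random.seed(42)
p_hat = [random.uniform(0.05, 0.95) for _ in range(1000)]
y_obs = [1 if random.random() < pi else 0 for pi in p_hat]
hl_stat, hl_p = hosmer_lemeshow(y_obs, p_hat)
print(f"HL statistic = {hl_stat:.2f}, p = {hl_p:.3f}")

# Pseudo R^2 for a single binary exposure (hypothetical 2 x 2 counts); with one
# binary predictor the fitted probabilities equal the observed row proportions
a, b, c, d = 40, 60, 20, 80
n = a + b + c + d
ll_null = bernoulli_ll(a + c, n)
ll_model = bernoulli_ll(a, a + b) + bernoulli_ll(c, c + d)
cs, nk = pseudo_r2(ll_null, ll_model, n)
print(f"Cox & Snell R^2 = {cs:.3f}, Nagelkerke R^2 = {nk:.3f}")
```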