Logistic Regression Continued Flashcards
Why would we include more than one variable in a model?
- Investigate multiple associations at once
- Adjust for confounding
- Model effect modification
- Increase the power of analysis of randomised trials
- Provide better predictions by accurately modelling data
When conducting regression with a binary outcome what can be used?
Logistic regression
What does logistic regression involve?
Modelling the log odds (logit) of the probability of outcome
Why is the logit link used instead of modelling the probability directly?
The logit link lets us model the outcome on a linear scale; probability itself is bounded between 0 and 1
The log odds takes any number from -infinity to +infinity. It is unbounded
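A minimal sketch of the bounded versus unbounded point above, assuming nothing beyond the definition of the logit. The function names logit and inv_logit are illustrative and not from the source.

```python
import math

def logit(p):
    """Log odds of a probability p, valid for 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Map any real-valued log odds back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-x))

# Probabilities near 0 or 1 map to large negative or positive log odds
for p in (0.01, 0.5, 0.99):
    print(p, round(logit(p), 3))

# Any log odds value, however extreme, maps back inside (0, 1)
for x in (-10, 0, 10):
    print(x, inv_logit(x))
```

A probability of 0.5 corresponds to log odds of 0, and the two functions undo each other.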
What do we model the log odds with in a logistic regression?
A linear predictor
How do we interpret coefficients once they are exponentiated?
exp(B0) = e^B0 = odds of Y = 1 when x1 = 0
If x1 is binary:
exp(B1) = e^B1 = odds ratio comparing x1 = 1 to x1 = 0
If x1 is continuous:
exp(B1) = e^B1 = odds ratio comparing x1 = x + 1 to x1 = x
i.e. the odds ratio for a one unit increase in x1
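The interpretation above can be sketched numerically. The coefficient values b0 and b1 below are hypothetical, chosen only for illustration, and do not come from any fitted model.

```python
import math

# Hypothetical coefficients on the log odds scale (illustrative only)
b0 = -2.0   # intercept
b1 = 0.7    # coefficient for x1

baseline_odds = math.exp(b0)   # odds of Y = 1 when x1 = 0
odds_ratio = math.exp(b1)      # odds ratio for a one unit increase in x1

# Odds at x1 = 1, computed directly from the linear predictor
odds_at_1 = math.exp(b0 + b1 * 1)

print("baseline odds:", round(baseline_odds, 4))
print("odds ratio:", round(odds_ratio, 4))
print("odds at x1 = 1:", round(odds_at_1, 4))
```

The odds at x1 = 1 equal the baseline odds multiplied by the odds ratio, which is why exponentiated coefficients act multiplicatively on the odds.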
What else needs to be exponentiated?
Confidence intervals
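Lower limit (OR) = e^ll
Upper limit (OR) = e^ul
where ll and ul are the confidence limits on the log odds scale
Standard errors are given on the log odds scale, so calculate the CI on that scale and then exponentiate; do not calculate CIs from standard errors on the odds ratio scale
An odds ratio of 1 means no association; the result is statistically significant if the CI excludes 1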
p-values need no exponentiating
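A minimal sketch of the confidence interval calculation, assuming a hypothetical coefficient b1 and standard error se (illustrative values, not from the source) and a 95% normal-based interval.

```python
import math

# Hypothetical estimate and standard error on the log odds scale
b1 = 0.7
se = 0.25
z = 1.959963984540054  # 97.5th percentile of the standard normal

# Build the interval on the log odds scale first
ll = b1 - z * se
ul = b1 + z * se

# Then exponentiate the estimate and both limits
odds_ratio = math.exp(b1)
lower = math.exp(ll)
upper = math.exp(ul)

print(round(odds_ratio, 3), round(lower, 3), round(upper, 3))
```

The resulting interval is not symmetric around the odds ratio, which is expected because it is symmetric only on the log scale.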
When doing logistic regression what is assumed?
- Outcome is binary
- Observations are independent, conditional on covariates in the model
- The linear predictor is correctly specified
* The log odds are correctly modelled by additive combinations of the variables in the model
The linear predictor can contain as many variables as we need
True or false
True
log(P(Y=1) / (1 - P(Y=1))) = B0 + B1x1 + B2x2 + B3x3 + B4x4 + …
Exponentiating any coefficient gives what?
An odds ratio that can be interpreted as with the single variable model
What are coefficients now conditional upon?
Other variables in the model
How must categorical variables be handled and what is this sometimes called?
By splitting into multiple binary variables
Sometimes called one-hot encoding
There will be one less variable than categories, the category with no variable is the reference category
* E.g. race (coded as White, Black, Hispanic, Other)
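A small sketch of splitting a categorical variable into binary indicators with one category left out as the reference. The helper name dummy_code and the five example observations are hypothetical; the category labels follow the race example above.

```python
def dummy_code(values, categories, reference):
    """Return one 0/1 indicator column per non-reference category."""
    levels = [c for c in categories if c != reference]
    return {level: [1 if v == level else 0 for v in values] for level in levels}

# Hypothetical observations (illustrative only)
race = ["White", "Black", "Hispanic", "Other", "Black"]
categories = ["White", "Black", "Hispanic", "Other"]

dummies = dummy_code(race, categories, reference="White")
for name, column in dummies.items():
    print(name, column)
```

Four categories give three indicator variables, and an observation in the reference category has 0 in all of them.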
Coefficients for categorical variables are relative to what?
Reference category
How must categorical variables be coded to be used in regression and what can they have?
Numerically and they can have value labels
Putting what in front of a variable in Stata tells it to treat the variable as categorical?
What happens if this is not included?
i.
Stata will treat the variable as continuous
In Stata what is the default reference category?
The one with the lowest number
- The reference category can be changed by putting b# after the i (ib#.)
- They can have value labels
E.g. ib4. makes the category coded as 4 the reference
What happens when a categorical variable is added to a model?
What needs to be done consequently?
You add more than one variable and will get more than one p-value reported
To correctly test whether there is an association between the categorical variable and the outcome, a joint test must be carried out
Two types of test:
Wald test: Simpler to implement
- Generally give same result
Likelihood ratio test: Better statistical properties
- It is best practice to report the p-value from the joint test rather than the p-values for individual categories
- Report confidence intervals and estimates for individual categories
What can a likelihood ratio test be used to compare?
Nested models
- Model A is said to be nested in model B if model B contains at least all the variables that are in model A
E.g. a model adjusting for age and smoking is nested in a model adjusting for age, smoking and gender.
What does likelihood refer to?
How likely the data is to be observed based on the parameter estimates.
What do Likelihood ratio tests compare?
The likelihood from the two models
Fit both models
Compare the likelihood
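The two steps above can be sketched as follows. The log-likelihood values and the number of extra parameters are hypothetical, and chi2_sf is a small stdlib-only helper written for this example rather than a library function.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-squared distribution with integer df."""
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # closed form for df = 1
    # Step up two degrees of freedom at a time using the gamma recurrence
    while a < df / 2:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q

# Hypothetical maximised log-likelihoods from two nested models
loglik_small = -520.4   # e.g. model without the categorical variable
loglik_large = -515.1   # e.g. model with it (3 extra parameters)
extra_params = 3

lr_statistic = 2 * (loglik_large - loglik_small)
p_value = chi2_sf(lr_statistic, extra_params)
print(round(lr_statistic, 2), p_value)
```

The degrees of freedom equal the number of extra parameters in the larger model, so a four-category variable contributes three.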
What do Wald tests use?
A quadratic approximation to the likelihood to calculate p-values based on the fitted model only
The p-values you see in the output from logistic regression are what?
Wald tests
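A minimal sketch of a Wald test for a single coefficient, assuming the usual normal approximation. The estimate and standard error are hypothetical, and wald_p_value is an illustrative name.

```python
import math

def wald_p_value(estimate, std_error):
    """Two-sided p-value from z = estimate / std_error under a standard normal."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical coefficient and standard error on the log odds scale
b1 = 0.7
se = 0.25

z = b1 / se
p = wald_p_value(b1, se)
print(round(z, 2), p)
```

Only the fitted model is needed here, whereas a likelihood ratio test requires fitting a second model.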
Joint tests (Likelihood ratio or Wald tests) must be used when..
Testing associations of categorical variables
Odds ratios from multiple logistic regression are conditional on what?
Other variables in the model
What is the table 2 fallacy?
Presenting multiple adjusted effect estimates from a single model as if they can all be interpreted in the same way. This practice, which remains common in published literature, is problematic because the estimates shown together in a single table can be different types of effect (e.g. total versus direct effects).
It is not obvious how to interpret coefficients
* There are different reasons why a variable may show no effect
* Keep in mind:
* Confounders
* Mediation
* Colliders
If a third variable is a common cause of both the variable of interest and the outcome it is called what?
A confounder
What happens if we do not adjust for the confounder?
The association between the variable and outcome will be distorted
What will adjusting for the confounder reveal?
The correct association
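A sketch of how an unadjusted association can be distorted by a confounder. The 2x2 counts are made up for illustration, and the adjustment uses a Mantel-Haenszel pooled odds ratio across strata of the confounder, which is a simpler stand-in for adjusting in a regression model.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed cases, non-cases; c, d = unexposed cases, non-cases."""
    return (a * d) / (b * c)

# Hypothetical counts within each level of a confounder Z (illustrative only)
strata = {
    "Z=0": (10, 90, 40, 360),
    "Z=1": (200, 200, 50, 50),
}

# Crude analysis: collapse over Z and ignore it
crude_counts = [sum(t[i] for t in strata.values()) for i in range(4)]
crude_or = odds_ratio(*crude_counts)

# Mantel-Haenszel odds ratio: pooled estimate adjusting for Z
numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
mh_or = numerator / denominator

print("crude OR:", round(crude_or, 2))
print("adjusted OR:", round(mh_or, 2))
```

In these made-up data the odds ratio is 1 within each level of Z, yet the crude odds ratio is well above 1 because Z is related to both the exposure and the outcome.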
Confounders lie on the causal pathway between the variable of interest and outcome
True or false
FALSE
Confounders do not lie on the causal pathway between the variable of interest and outcome
If a variable lies on the causal pathway between variable of interest and outcome it is called what?
A mediator
What will adjusting for a mediator reduce?
The association between the variable of interest and outcome, because the part of the effect that passes through the mediator is removed