Logistic Regression I Flashcards
What is the purpose of logistic regression?
It models the relationship between a binary outcome and one or more independent variables
How is logistic regression different from linear regression?
Unlike linear regression, logistic regression is used for binary outcomes and applies a logit transformation to ensure predicted probabilities stay between 0 and 1
What types of variables can be used as predictors in logistic regression?
Continuous and categorical
What is the formula for the logistic regression model?
The log odds of the response probability is expressed as a linear function of the predictor:
ln(π / (1 - π)) = β0 + β1X
Where:
- ln(π / (1 - π)) = log odds or logit
- π / (1 - π) = odds
- β0 = intercept/constant
- β1 = coefficient for predictor X (slope)
How do you convert logistic regression coefficients to an OR?
Take the exponent of the coefficient:
Odds ratio = e^β1
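As a quick numeric sketch (the coefficient below is made up for illustration, not taken from any model in this deck), exponentiating a fitted log-odds coefficient gives the OR:

```python
import math

# Hypothetical fitted log-odds coefficient for predictor X
beta1 = 0.405

# Exponentiating the coefficient gives the odds ratio
odds_ratio = math.exp(beta1)  # close to 1.5
```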
What is an OR?
The ratio of the odds of an event occurring in one group compared to another
Odds ratio = odds (group 1) / odds (group 2)
How do you interpret OR?
OR = 1: No effect (same odds in both groups)
OR > 1: Higher odds of the event occurring
OR < 1: Lower odds of the event occurring
What are probabilities and odds?
- Probability (risk): π = occurrences/opportunities (range = 0 - 1)
- Odds: π / (1 - π) (probability of the event occurring / probability of the event not occurring; range 0 to plus infinity)
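The two definitions above can be sketched as a pair of helper functions (illustrative, not part of the deck):

```python
def odds_from_prob(pi):
    # odds = pi / (1 - pi)
    return pi / (1 - pi)

def prob_from_odds(odds):
    # invert the mapping: pi = odds / (1 + odds)
    return odds / (1 + odds)
```

For example, a probability of 0.5 gives odds of 1, and odds of 1.5 correspond to a probability of 0.6.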
What does an OR of 1.5 mean?
The odds of the event in the exposed group are 50% higher than in the reference group
What does an OR of 0.75 mean?
The odds of the event in the exposed group are 25% lower than in the reference group
What is the likelihood ratio test (LRT) used for?
To compare two nested models and determine if adding a predictor improves the model
What are the hypotheses for the LRT?
H0: Adding the variable makes no difference
H1: Adding the variable improves the model
How do you calculate the LRT statistic?
LRT = 2 [ ln (L1) − ln (L0) ]
Where L1 is the likelihood of the full model and L0 is the likelihood of the nested model
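A numeric sketch of the calculation, using made-up log-likelihoods; when the full model adds one extra parameter, the statistic is compared against a chi-square distribution with 1 df, whose upper-tail p-value can be obtained from the normal distribution via the complementary error function:

```python
import math

# Made-up log-likelihoods; the full model (L1) always fits at least as well
ln_L0 = -120.4  # nested (smaller) model
ln_L1 = -117.1  # full model with the extra predictor

lrt = 2 * (ln_L1 - ln_L0)  # LRT statistic

# Under H0, LRT ~ chi-square with 1 df (one extra parameter);
# upper-tail p-value for chi-square(1) via erfc
p_value = math.erfc(math.sqrt(lrt / 2))  # here, roughly 0.01
```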
What are the key assumptions of logistic regression?
- Observations are independent
- The outcome variable is binary
- No multicollinearity (high correlation between predictors)
- The relationship between continuous predictors and log-odds is linear after adjusting for any other covariates
- No significant interactions unless explicitly modelled. The effect of an exposure is the same regardless of the value of any other independent variable, and vice versa.
- No unobserved confounding
Why do we use multivariable logistic regression?
To adjust for confounders like age, sex, and BMI
How do you fit a logistic regression model in Stata?
logistic <outcome> <predictor1> <predictor2> ... <predictork>
Note: Adding 'i.' before a categorical variable tells Stata to interpret it as categorical
How do you compare two models using LRT in Stata?
Store each model to memory:
est store a
est store b
Then compute the LRT:
lrtest a b
What is the logit link function?
A BLR model uses a logit (or logistic) transformation of probability π
π = exp(β0 + β1x) / (1 + exp(β0 + β1x))
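A sketch of this inverse-logit mapping, using made-up coefficients (β0 = -1, β1 = 0.5, chosen so the linear predictor is 0 at x = 2):

```python
import math

def inv_logit(eta):
    # pi = exp(eta) / (1 + exp(eta)), always strictly between 0 and 1
    return math.exp(eta) / (1 + math.exp(eta))

# Made-up coefficients: at x = 2 the linear predictor is 0, so pi = 0.5
beta0, beta1 = -1.0, 0.5
pi = inv_logit(beta0 + beta1 * 2.0)

# Taking the logit of pi recovers the linear predictor
eta_back = math.log(pi / (1 - pi))
```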
What are the properties of logit transformation of π?
- Produces values of π between 0 and 1
- Symmetric about π = 0.5
- The curve is almost a straight line for 0.2 < π < 0.8
What does it mean if π = 0.5?
The event is equally likely to occur or not occur, using the equation the odds would be:
odds = 0.5 / (1 - 0.5) = 0.5 / 0.5 = 1
What does it mean if π = 0.4?
The event is less likely to occur than not
Odds = 0.4 / (1 - 0.4) = 0.4 / 0.6 ≈ 0.67
What is the log odds a result of?
The relationship between Y and X is specified as linear after making a logit transformation. This results in the log odds, which can be positive or negative
What does it mean if π > 0.5? E.g., if π = 0.6
The event is more likely to occur than not
Odds = 0.6 / (1 - 0.6) = 0.6 / 0.4 = 1.5
How do you fit a logit model in Stata?
logit <outcome> <predictors>
What’s the difference between the logit and logistic commands?
The logistic command reports ORs
Exponentiating the coefficients from logit gives the ORs reported by logistic
How would you interpret an OR of 0.89?
The odds of the event for [Group 1] are 0.89 times the odds for [Group 2]; equivalently, the odds in [Group 1] are lower than in [Group 2] by 11% (= 100% × (1 - 0.89))
What does significance testing involve in logistic regression?
H0: β1 = 0
H1: β1 ≠ 0
Tests the null hypothesis that the true parameter value is 0, and hence that the associated OR is 1
Z-ratio statistic is calculated and compared with a normal distribution (only used with the logit function)
Note: With one independent variable, the Wald test is equivalent to the z-ratio statistic
How is the z-ratio statistic calculated?
Using the output of the logit function in Stata
z = β^ / SE (β^)
How is the Wald statistic calculated?
W = (β^)² / var(β^)
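A numeric sketch of both statistics, using a made-up estimate and standard error:

```python
# Made-up coefficient estimate and its standard error
beta_hat = -0.35
se = 0.12

z = beta_hat / se   # z-ratio, compared against a standard normal
wald = z ** 2       # Wald statistic; with one predictor, W = z^2
```

Here |z| is about 2.9, well beyond the usual 1.96 cut-off for a 5% two-sided test.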
What does a 95% CI of 0.75 and 0.97 tell us about where the OR lies?
The OR lies between 0.75 and 0.97
If we repeated our study 100 times, the computed 95% CIs would contain the true OR in about 95 of the 100 repetitions
Overall, we are 95% confident that the true OR likely lies between 0.75 and 0.97
What does it mean if 95% CIs do not include 1?
If the 95% CIs do not include the null hypothesis of no effect (OR = 1), there is evidence that the difference between the exposed and the reference group is significant
Which command should be used to obtain ORs and to make explicit the category of the slope parameter?
logistic <outcome> i.<exposure>
What is the utility of using ‘i.’ before a categorical variable in regression?
The 'i.' prefix prompts Stata to create a set of binary indicator variables, one per category
They all use the same reference category which by default is the first category
If we have age with multiple categories as a predictor, how can we estimate the overall effect of age on the outcome?
Use the testparm command (a Wald test) to test joint hypotheses
Taking the age variables as a whole, it tests whether the coefficients of all age categories are simultaneously zero
What is the issue with Wald test with small sample sizes?
The Wald test evaluates whether it is likely that an estimated coefficient could be zero (no effect). However, the Wald test is not very reliable with small sample sizes
What is the likelihood ratio test (LRT)?
Test of the goodness-of-fit between two nested models
What is a nested model?
Two models are nested when the smaller model contains a subset of the covariates of the larger; the larger model includes the same covariates but specifies at least one additional parameter to be estimated
When are models not deemed to be nested?
If the models are fitted on different numbers of observations
LRT hypotheses:
H0: The two nested models are equivalent - adding the new variable makes no difference to the model
H1: The two nested models are different, and a variable contributes to the model
LRT statistic
LRT = 2[ln(L1) - ln(L0)]
L1: likelihood of the model that includes the variable you want to test (model 1)
L0: Likelihood of the model that does not include the variable (the “nested” model; model 0)
What does a p-value < 0.05 indicate in an LRT?
There is evidence for an effect of the predictor on the outcome after accounting for other variables
In the context of an LRT, if the p-value < 0.05, what should you do?
Opt for the larger model: there is evidence that the added variable improves the fit. (If the p-value is ≥ 0.05, opt for the simpler model)
What kind of test is the LRT?
chi-square test
For continuous exposures, what does the exponentiated estimated coefficient (in a logit model) represent?
The multiplicative change in the odds of the outcome for a one-unit increase in the exposure: each additional unit multiplies the odds by the OR
What’s the procedure to comparing two models using an LRT?
- Obtain L1 by fitting the larger model: quietly: logistic <outcome> <predictor1> <predictor2>
- Save this model: est store a
- Obtain L0 by fitting the smaller model: quietly: logistic <outcome> <predictor1>
- Save this model: est store b
- Compare L1 and L0: lrtest a b
What happens if adding variables changes the sample size between two models in the context of an LRT?
If L1 has a different sample size from L0 (e.g., because the added variable has missing values), the models are no longer nested; L0 must be refitted on the same observations as L1 before the LRT is valid. Note that the missingness may not be random, depending on the mechanism by which the missing data occur