Logistic regression Flashcards
Normal regression equation
- Ŷ = bX + c
- This is the linear regression model equation; make sure you know it
- Ŷ is the predicted outcome variable; in logistic regression the analogous quantity is "the probability of having one outcome or another based on a nonlinear function of the best linear combination of predictors" (Tabachnick and Fidell)
- Y − Ŷ (observed minus predicted) gives the residuals
- where X is the predictor variable
- The slope of the line is b
- c is the intercept (the value of Ŷ when X = 0)
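A minimal Python sketch of the prediction and residual calculation above, with all numbers made up for illustration:

```python
import numpy as np

# Toy data (made-up values)
X = np.array([1.0, 2.0, 3.0, 4.0])   # predictor
Y = np.array([2.1, 3.9, 6.2, 7.8])   # observed outcome

b, c = 2.0, 0.1          # assumed slope and intercept
Y_hat = b * X + c        # predicted scores, on the same scale as Y
residuals = Y - Y_hat    # observed minus predicted
print(Y_hat, residuals)
```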
Types of research questions for logistic regression
• Can we predict the presence or absence of a disorder/disease?
• Can we predict an outcome using a set of predictors?
o How good is the model?
• Does an individual predictor increase or decrease the probability of an outcome?
o Related to the importance of the predictors
• Can be used for classification and prediction
• Simple categorical outcomes
o Can we predict the outcomes using categorical predictors?
How does logistic regression differ from ordinary least squares regression?
• OLS has 3 important characteristics:
o The model is linear
o Residuals are assumed to be normally and homogeneously distributed
o Predicted scores (Ŷ) are on the same scale as the data (Y)
• These characteristics don’t apply to logistic regression
o The model is not a linear prediction because the outcome is dichotomous; a 'logistic' function is better, as its sigmoidal shape fits the data better
o If OLS regression is used, the residuals will show non-normality and heteroscedasticity, violating important assumptions of that method
o The model predicts a probability value, which is on a different scale to the data (Y)
What is probability?
• Probability: the likelihood of an event occurring
o If p = .80, there is an 80% chance of that event occurring
What are predicted odds?
• Predicted odds: the probability of an event occurring divided by the probability of it not occurring
o Predicted odds = prob of event occurring / prob of event not occurring
o Following on from p = .80, the probability of it not occurring is .2 (i.e. 1 − the likelihood of it occurring)
o .8/.2 = 4
o This means the odds were 4:1 in favour of the event occurring
The logistic model gives pî, the estimated probability of an outcome occurring, so the predicted odds are pî/(1 − pî)
• Odds are asymmetric, so the observed odds ratio is not in the centre of its confidence interval; we can use the natural log of the odds instead
o Log of odds = Logit
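The p = .80 example from this card as a quick Python sketch:

```python
import math

p = 0.80                  # probability of the event occurring
odds = p / (1 - p)        # .8/.2 = 4, i.e. 4:1 in favour
logit = math.log(odds)    # natural log of the odds = logit
print(odds, logit)        # 4.0, ~1.386
```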
What is the odds ratio?
• The ratio of the odds of an event occurring across levels of another variable
o By how much do the odds of Y change as X increases by 1 unit?
o Essentially it is a ratio of ratios
o The odds ratio is the central measure of effect size here; a good way of measuring the strength of the relationship
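A quick sketch of an odds ratio computed from a made-up 2×2 table (e.g. disease status by exposure):

```python
# Made-up counts: rows = exposed/unexposed, columns = case/non-case
exposed_case, exposed_noncase = 40, 10
unexposed_case, unexposed_noncase = 20, 30

odds_exposed = exposed_case / exposed_noncase        # 4.0
odds_unexposed = unexposed_case / unexposed_noncase  # ~0.67
odds_ratio = odds_exposed / odds_unexposed           # a ratio of ratios: 6.0
print(odds_ratio)  # odds of being a case are 6x higher in the exposed group
```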
Logistic regression equation
pî = 1 / (1 + e^−(B₁X₁ + c))
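A direct translation of the equation into Python; the coefficient and intercept values here are assumed purely for illustration:

```python
import math

def predicted_probability(x, b1=0.7, c=-1.0):
    """Logistic function: maps the linear predictor onto a 0-1 probability."""
    return 1 / (1 + math.exp(-(b1 * x + c)))

print(predicted_probability(0))  # probability when X = 0 (~0.27 here)
print(predicted_probability(5))  # larger X pushes the probability toward 1
```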
What is pi?
• Our model is of pî rather than Ŷ
o pî is the estimated probability of the outcome occurring for case i (this is different from the predicted odds, which have their own equation)
Predicted odds vs logit
They are just transformations of each other
• Predicted odds: odds of being a case
o Odds = p/(1-p), which ranges from 0 to positive infinity
o When p is .50, the odds are 1 (even odds, 1:1)
.50/(1-.50) = .50/.50 = 1
o When p > .50, the odds >1
o Varies exponentially (not linearly) with the predictor(s): each unit increase in a predictor multiplies the odds rather than adding a constant amount
• Logit: natural logarithm of the odds
o Ranges from negative infinity to positive infinity
o Reflects odds of being a case but varies linearly with predictor(s)
o Not very interpretable
If p = .8, the odds = 4 but the logit = 1.386
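A sketch of the contrast: with assumed coefficient values, the logit increases in equal steps as X increases, while the odds are multiplied at each step:

```python
import numpy as np

B1, c = 0.7, -1.0        # assumed coefficient and intercept
X = np.arange(5)
logit = B1 * X + c       # linear in the predictor: equal steps of B1
odds = np.exp(logit)     # exponential in the predictor
p = odds / (1 + odds)    # back to probability, bounded between 0 and 1
print(logit)             # evenly spaced values
print(odds)              # each value is the previous one times e**B1
```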
2 kinds of regression coefficient in logistic regression
• Typical partial regression coefficients (B)
o Identical in function to OLS regression coefficients, but on the logit scale
o Indicates increment in the logit given unit increment in predictor
• Odds ratios (e^B)
o e^B indicates the amount by which the odds of being a case are multiplied given a unit increment in the predictor (or a change in level, if the predictor is categorical)
o If B = 0, e^B = 1, and the predictor has no relationship with the outcome
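A sketch of the relationship between B and the odds ratio (the value of B is assumed):

```python
import math

B = 0.7                    # partial regression coefficient (assumed value)
odds_ratio = math.exp(B)   # e**B: odds multiplied per unit increase in X (~2.01)
print(odds_ratio)

print(math.exp(0.0))       # B = 0 gives e**B = 1: no relationship
```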
Estimating parameters in logistic regression
• Logistic regression uses maximum likelihood estimation, which is an iterative solution
o Regression coefficients are estimated by trial-and-error and gradual adjustment
o Seeks to maximise the likelihood (L) of the observed values of Y, given the model and the observed values of the predictors
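A toy illustration of the iterative estimation on simulated data. This uses a simple gradient ascent on the log likelihood for clarity; real statistics packages typically use faster Newton-Raphson-type algorithms:

```python
import numpy as np

# Simulated data with assumed true values B1 = 0.7, c = -1.0
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = (rng.random(500) < 1 / (1 + np.exp(-(0.7 * x - 1.0)))).astype(int)

# Start anywhere, then gradually adjust the coefficients uphill
# on the log likelihood until they stop improving
b1 = c = 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(b1 * x + c)))  # current predicted probabilities
    b1 += 0.1 * np.mean((y - p) * x)     # gradient of the log likelihood w.r.t. b1
    c += 0.1 * np.mean(y - p)            # gradient w.r.t. c
print(b1, c)                             # should land roughly near 0.7 and -1.0
```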
What is the log-likelihood?
• To evaluate the model, a log likelihood (LL) value can be calculated for each model we test
• The LL is a function of the probabilities of the observed and model-predicted outcomes for each case, summed over all cases
• We can directly compare the goodness-of-fit of different models using the log likelihoods
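A sketch of the LL calculation, spelling out the standard binary log-likelihood formula (which the notes don't write out); the outcomes and predicted probabilities are made up:

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0])            # observed outcomes
p = np.array([0.9, 0.2, 0.7, 0.6, 0.4])  # model-predicted probabilities

# For each case, the log of the probability the model assigned to the
# outcome that actually occurred, summed over all cases
LL = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
print(LL)  # closer to 0 = better fit; more negative = worse fit
```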
How is model fit tested in logistic regression?
- Log-likelihood ratio test: a test of model fit
- A significant likelihood ratio test tells us that the model is significantly worse with the corresponding predictor removed, so the predictor should be retained in the model. If non-significant, that predictor can probably be removed.
How does the log-likelihood ratio test work?
You won't be asked directly about this, but you need to know it for questions where you have to report results; it will help to be able to interpret model fit statistics
• In likelihood ratio test, we test the null deviance (including only the constant) against the model deviance (containing k predictors)
• As k increases, the difference between the null and model deviance will generally increase, reflecting improved model fit
• If there is no significant improvement in fit when we add the k predictors to the model, we need to question the inclusion of those predictors
• If there is no significant deterioration in fit when we remove k predictors from the model, then we need to question the inclusion of those predictors
o I.e. they are redundant in the context of this outcome variable
• Only accept more predictors if they significantly improve the fit of the model
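A sketch of the deviance comparison with made-up log likelihood values; scipy is one way to get the chi-square p value (the test has df = k):

```python
from scipy import stats

LL_null = -120.5   # log likelihood of the constant-only model (made-up)
LL_model = -101.2  # log likelihood of the model with k predictors (made-up)
k = 3              # number of predictors added

# Deviance = -2 * LL, so the likelihood ratio statistic is the
# difference between the null and model deviances
chi_sq = -2 * (LL_null - LL_model)     # 38.6 here
p_value = stats.chi2.sf(chi_sq, df=k)
print(chi_sq, p_value)                 # significant: retain the k predictors
```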