Module 3 - Logistic Regression Flashcards
What type of IVs and DVs can we have in logistic regression?
IVs can be categorical or continuous
DVs must be categorical
Can logistic regression imply any causation?
No
How do we calculate the odds ratio?
Odds of event occurring in one group divided by the odds of the event occurring in the other group
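A minimal worked example with hypothetical counts:

```python
# Hypothetical 2x2 table: event vs. no event in two groups
event_a, no_event_a = 30, 70    # group A
event_b, no_event_b = 10, 90    # group B

odds_a = event_a / no_event_a   # 30/70, about 0.43
odds_b = event_b / no_event_b   # 10/90, about 0.11

odds_ratio = odds_a / odds_b    # about 3.86: odds roughly 4x higher in group A
print(odds_ratio)
```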
What is complete separation? What example does Field (2013) give?
When our predictor variables perfectly predict the outcome, e.g. Field's example of telling cats from burglars by their weight. It is problematic because it leaves no in-between data from which to estimate probabilities, which is the whole point of logistic regression
What is overdispersion?
Overdispersion is when observed variance is higher than expected. It is caused by a violation of the assumption of independence and causes standard errors to become too small
What do we use the log-likelihood statistic for?
To determine how well our model fits the data
How do we calculate the deviance statistic?
Deviance = -2 x log-likelihood (also written -2LL)
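A short sketch of both statistics from a fitted model, using statsmodels on simulated data (all names here are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)                                         # one continuous IV
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-x))).astype(int)   # binary DV

model = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
log_likelihood = model.llf        # maximised log-likelihood
deviance = -2 * log_likelihood    # the deviance statistic (-2LL)
print(log_likelihood, deviance)
```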
What is parsimony? How do we achieve it in logistic regression?
Parsimony is when we prefer simpler explanations of phenomena over complicated ones. We achieve parsimony by including all variables in the model, then progressively removing those that don't contribute significantly.
Which assumption is broken by logistic regression? How do we overcome it?
The assumption of linearity is broken by logistic regression. We overcome it by modelling the log odds (the 'logit') of the outcome rather than the outcome itself. Thus, the assumption of linearity in logistic regression assumes that there is a linear relationship between any continuous predictors and the logit of the outcome variable.
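One common way to check this assumption (the Box-Tidwell approach, sketched here with hypothetical data and variable names) is to add each continuous predictor's interaction with its own natural log; a significant interaction term suggests the assumption is violated:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"age": rng.uniform(18, 80, size=300)})   # hypothetical predictor
p = 1 / (1 + np.exp(-(0.05 * df["age"] - 2.5)))
df["outcome"] = (rng.uniform(size=300) < p).astype(int)

# Interaction of the predictor with its own natural log (Box-Tidwell term)
df["age_ln_age"] = df["age"] * np.log(df["age"])
fit = smf.logit("outcome ~ age + age_ln_age", data=df).fit(disp=0)
print(fit.pvalues["age_ln_age"])   # significant -> linearity of the logit violated
```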
The process of cleaning data and checking for assumptions is the same in logistic regression, except for 5 additional steps. What are they?
- Evaluate model parsimony
- Check that continuous predictors are linearly related to the logit of the outcome
- Assess the data using deviance statistics
- Check for complete separation
- Check for overdispersion
What type of regression do we do if we want to check the model for parsimony? Do we check for interactions?
Hierarchical - put variables 1 and 2 in the first block, and add 3 in the second block. Only check for interactions if we have the theory to back it up.
What does the Wald statistic test? How do we calculate it?
The Wald statistic is similar to the t-test. It tests the null hypothesis that b = 0. It is biased when b is large, because a large b inflates the standard error, which makes the Wald statistic underestimated.
Wald = b/SEb
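A minimal sketch of the calculation with hypothetical numbers:

```python
from scipy import stats

b, se_b = 1.20, 0.45        # hypothetical coefficient and its standard error
wald_z = b / se_b           # Wald = b / SEb
p_value = 2 * stats.norm.sf(abs(wald_z))   # two-tailed test of H0: b = 0
print(wald_z, wald_z**2, p_value)          # SPSS reports z2 (chi-square, df = 1)
```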
In terms of the relationship between a predictor and outcome variable, what happens when the odds ratio is less than 1 vs. when it is greater than 1?
When odds ratio > 1: as predictor increases, odds of outcome variable increases
When odds ratio < 1: as predictor increases, odds of outcome variable decreases
Explain graphically why we can’t just use multiple linear regression.
Multiple linear regression creates a straight line of best fit when displayed graphically. A straight line is a poor fit for a categorical outcome: it predicts values below 0 and above 1, which are impossible probabilities. Logistic regression instead fits a sigmoidal curve bounded between 0 and 1, which suits the data much better.
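A sketch on simulated data: the straight line strays outside the 0-1 range, while the fitted logistic curve stays bounded:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = np.linspace(-4, 4, 100)
y = (rng.uniform(size=100) < 1 / (1 + np.exp(-2 * x))).astype(int)

slope, intercept = np.polyfit(x, y, 1)                   # straight line of best fit
logit_fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)  # logistic fit

plt.scatter(x, y, s=10, label="binary outcome")
plt.plot(x, intercept + slope * x, label="linear fit (strays outside 0-1)")
plt.plot(x, logit_fit.predict(sm.add_constant(x)), label="sigmoidal curve (bounded)")
plt.legend()
plt.show()
```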
What is the basic example of a research question suited to logistic regression?
“Can one level of the outcome variable (coded 0) be discriminated from the other level (coded 1) using a set of predictors?”
What other questions does logistic regression answer?
Which variables predict which outcome
How variables affect the outcome
Does a predictor variable increase or decrease the probability of an outcome, or does it have no effect in discriminating between the binary outcome?
What is logistic regression?
Logistic regression is used to predict categorical (non-continuous) outcome variables
Also known as Logit Analysis
Can be multinomial, ordinal or binary.
Similar to discriminant analysis (in MANOVA) but differs in terms of assumptions so not interchangeable
Logistic regression doesn't try to predict an outcome score. Rather, it predicts the probability that an event will occur, given the values of the predictors.
Predicts outcome by creating a variate comprised of the IVs
Variate = a composite measure formed from 2+ variables
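A minimal sketch (statsmodels, simulated data) of the variate as a weighted sum of the IVs that the model converts into a probability:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))                    # two hypothetical IVs
variate = -0.5 + 1.0 * X[:, 0] - 0.8 * X[:, 1]   # the variate: a weighted sum of the IVs
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-variate))).astype(int)

fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(fit.params)                           # b0, b1, b2: the weights forming the variate
print(fit.predict(sm.add_constant(X))[:5])  # predicted probabilities, not outcome scores
```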
What are the advantages of logistic regression?
Doesn’t require many assumptions to be met
Doesn’t require:
Normality
Linearity
Homoscedasticity
Although meeting them does increase predictive power
Can be interpreted in a similar way to multiple regression
Forced entry, hierarchical and stepwise methods are all available
What are the disadvantages of logistic regression?
Still has some assumptions: independence of errors, linearity of the logit, absence of outliers
Need strong theoretical justification for predictors
Causality cannot be established
Requires a large sample size
Problems with model overfit/complete separation
What are log likelihood and deviance?
Log-likelihood is analogous to the SSR (sum of squared residuals) in linear regression
Log-likelihood: compares the probabilities predicted by the model with the observed outcomes
Large value = poor fit, small value = good fit
The Deviance score (-2LL) is used to compare model parsimony and to calculate R2
Differences in -2LL between nested models follow a chi-square distribution, which allows significance testing (see the sketch below)
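A sketch of the resulting likelihood-ratio (chi-square) test against the null model, on simulated data:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=200)
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-1.5 * x))).astype(int)

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
chi_sq = 2 * (fit.llf - fit.llnull)   # drop in -2LL relative to the null model
p = stats.chi2.sf(chi_sq, df=1)       # df = number of predictors added
print(chi_sq, p)                      # statsmodels also exposes fit.llr, fit.llr_pvalue
```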
What are the different versions of R2 in logistic regression?
R2 is a measure of variance explained; all versions are derived from the deviance statistic
In logistic regression you cannot simply square the R statistic
Hosmer and Lemeshow: orders the data by group and compares observed to predicted using the chi-square distribution
Cox and Snell: based on the sample size; used by SPSS
Never reaches its theoretical maximum, so limited at the high end
Nagelkerke: modifies Cox and Snell to fix the upper limit (see the sketch below)
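A minimal sketch of both formulas, using hypothetical log-likelihoods:

```python
import numpy as np

# Hypothetical log-likelihoods for a null and a fitted model, n = 200 cases
ll_null, ll_model, n = -135.0, -110.0, 200

cox_snell = 1 - np.exp(2 * (ll_null - ll_model) / n)   # about 0.22 here
max_r2 = 1 - np.exp(2 * ll_null / n)                   # Cox & Snell's ceiling, below 1
nagelkerke = cox_snell / max_r2                        # rescaled so the maximum is 1
print(cox_snell, nagelkerke)
```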
What is the Wald Statistic?
Wald Statistic (z statistic)
The logistic regression equivalent of the t statistic
SPSS reports it as z2, which follows a chi-square distribution
Tells us whether the contribution is significant
Be cautious; when b is large SE becomes inflated
More accurate to add predictors hierarchically and examine the change in the likelihood statistics
Check whether the CI for b crosses 0 (equivalently, whether the CI for the odds ratio Exp(B) crosses 1)
How is logistic regression accomplished in SPSS?
Correlate -> Bivariate -> Add all variables
Select potential predictors
Be careful with negative predictors (they can cancel out positive predictors)
Analyse -> Regression -> Binary Logistic
Outcome goes in 'Dependent', predictors in 'Covariates'
Choose ‘enter’ method (unless hierachical is warranted)
If categorical predictor present -> categorical -> move predictor into box
Save; group membership
Options; Hosmer-Lemeshow goodness-of-fit, CI for Exp(B), classification plots
How do you interpret logistic regression output in SPSS?
Check which cases are included under case processing
Block 0; Null hypothesis model
Probability without predictors
Variables not in equation; shows whether each predictor left out would contribute if added to the model
Block 1; Simultaneous model
Omnibus test compares to Block 0 (p < .05 means the predictors significantly improve the model)
Nagelkerke = variance explained
Hosmer-Lemeshow p > .05 = good (a non-significant result means the model's predictions don't differ from the observed data)
Exp(B)= odds ratios
Contingency table; shows observed vs. expected frequencies for the Hosmer-Lemeshow test
Classification table; % correctly predicted
Compare to the null model (a walkthrough of these quantities follows below)
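A hypothetical walkthrough of the main quantities in this output, with statsmodels standing in for SPSS on simulated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 2))
y = (rng.uniform(size=300) < 1 / (1 + np.exp(-(X[:, 0] - X[:, 1])))).astype(int)

fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(fit.llr, fit.llr_pvalue)   # omnibus test against the Block 0 (null) model
print(np.exp(fit.params))        # Exp(B): odds ratios for each predictor

predicted = (fit.predict(sm.add_constant(X)) >= 0.5).astype(int)
print((predicted == y).mean())        # proportion correctly classified ...
print(max(y.mean(), 1 - y.mean()))    # ... vs. the null model's best guess
```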
What are some common problems in logistic regression?
Overdispersion; the variance is larger than expected from the model
Makes SE/CIs too small
Caused by violating independence of errors assumption
Present if the dispersion parameter is greater than 1 (a big problem if over 2); see the sketch at the end of this section
Incomplete Information from Predictors;
Ideally, you should have some data for every possible combination of predictors (definitely for categorical)
Violation causes large SEs
Complete Separation; when outcome can be perfectly predicted by 1+ predictor
model collapses, large SEs
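A sketch of estimating the dispersion parameter mentioned above, fitting the model as a binomial GLM on simulated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=200)
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-x))).astype(int)

glm = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()
dispersion = glm.pearson_chi2 / glm.df_resid   # near 1 is fine; >1 overdispersed, >2 serious
print(dispersion)
```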