LOGISTIC REGRESSION Flashcards
1
Q
What is logistic regression?
A
- Logistic regression is used to predict non-continuous (categorical) outcome variables
- Also known as Logit Analysis
- Can be multinomial, ordinal or binary.
- Similar to discriminant analysis (in MANOVA) but differs in terms of assumptions so not interchangeable
- Logistic regression doesn't try to predict an outcome score. Rather, it predicts the probability that an event will occur given the values of the predictors.
- Predicts outcome by creating a variate comprised of the IVs
- Variate = measure comprised of 2+ variables
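As a quick illustration of that last point, here is a minimal sketch (with made-up coefficient values, not from any real dataset) of how the variate is turned into a probability via the logistic function:

    import numpy as np

    def predicted_probability(b0, b, x):
        # The variate: a linear combination of the IVs
        variate = b0 + np.dot(b, x)
        # The logistic function maps the variate to P(event), bounded in (0, 1)
        return 1.0 / (1.0 + np.exp(-variate))

    # Illustrative values only: intercept -1.5, slopes 0.8 and 0.3
    print(predicted_probability(-1.5, np.array([0.8, 0.3]), np.array([2.0, 1.0])))  # ~0.60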
2
Q
What are the advantages of logistic regression?
A
- Doesn't require many assumptions to be met
- Doesn’t require:
- Normality
- Linearity
- Homoscedasticity
- Although meeting these does increase predictive power
- Can be interpreted in a similar way to multiple regression
- Forced, hierarchical and stepwise methods are all available
3
Q
What are the disadvantages of logistic regression?
A
- Still has some assumptions
- Independence of errors
- Linearity of the logit
- Absence of outliers
- Need strong theoretical justification for predictors
- Causality cannot be established
- Requires large sample size
- Problems with model overfit/complete separation
4
Q
What is the odds ratio?
A
- The odds ratio (Exp(B)) tells you how a 1-unit change in a predictor affects the odds of the outcome occurring
- Ratio = (odds after unit change) / (original odds)
- > 1 = odds of outcome increase
- < 1 = odds of outcome decrease
- If the confidence interval crosses 1, the ratio isn't statistically significant
- Unadjusted vs Adjusted Odds Ratio:
- UOR: not adjusted for presence of other predictors
- AOR: Represents association when other variables are held constant
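A minimal sketch of the odds-ratio arithmetic (the coefficient and intercept values are made up): Exp(B) is e raised to the coefficient, and it equals the odds after a one-unit increase divided by the original odds:

    import numpy as np

    b = 0.7                                        # illustrative coefficient
    odds_ratio = np.exp(b)                         # Exp(B) as reported by SPSS

    # Verify against the definition: odds after a 1-unit change / original odds
    p_before = 1 / (1 + np.exp(-(0.2 + b * 1)))    # P(event) at x = 1 (intercept 0.2)
    p_after  = 1 / (1 + np.exp(-(0.2 + b * 2)))    # P(event) at x = 2
    odds_before = p_before / (1 - p_before)
    odds_after  = p_after / (1 - p_after)

    print(odds_ratio, odds_after / odds_before)    # both ~2.01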
5
Q
What are Model Parsimony and Linearity of the Logit?
A
- Model Parsimony; a parsimonious model is one that uses the minimal set of predictor variables that together maximally explain the outcome variable
- Select and use only those predictors that are likely to explain the outcome
- Linearity of the Logit; a linear relationship between the continuous predictors and the log transformation of the outcome variable
- The log transformation keeps the predicted probabilities between 0% and 100%
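A small sketch of why that works: the logit is linear in the predictors, and inverting it always returns a probability strictly between 0 and 1, however extreme the logit gets:

    import numpy as np

    def inv_logit(z):
        # Map a logit value (any real number) back to a probability in (0, 1)
        return 1 / (1 + np.exp(-z))

    for z in (-100, -2, 0, 2, 100):        # even extreme logits stay bounded
        print(z, inv_logit(z))
    # -100 -> ~0.0, 0 -> 0.5, 100 -> ~1.0: never below 0% or above 100%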
6
Q
What are log likelihood and deviance?
A
- Log likelihood is analogous to the SSR (sum of squared residuals) in multiple regression
- Log likelihood: compares the predicted and the actual probabilities
- Large value = poor fit, small value = good fit
- The Deviance score (-2LL) is used to compare model parsimony and to calculate R2
- The -2LL statistic has a chi-square distribution, which allows significance testing
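A minimal sketch of the calculation, using made-up outcomes and predicted probabilities:

    import numpy as np

    y     = np.array([1, 0, 1, 1, 0])              # observed outcomes (illustrative)
    p_hat = np.array([0.8, 0.3, 0.6, 0.9, 0.2])    # model-predicted probabilities

    # Log likelihood compares predicted probabilities with actual outcomes
    log_likelihood = np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
    deviance = -2 * log_likelihood                 # the -2LL statistic SPSS reports

    print(log_likelihood, deviance)                # larger -2LL = poorer fit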
7
Q
What are the different versions of R2 in logistic regression?
A
- R2 is a measure of variance explained; all versions are derived from the deviance statistic
- In logistic regression you cannot simply square the R statistic
- Hosmer and Lemeshow: orders the data by group and compares observed to predicted using the chi-square distribution
- Cox and Snell: uses the sample size; used by SPSS
- Never reaches its theoretical maximum, so limited at the high end
- Nagelkerke: modifies Cox and Snell to fix the upper limit at 1
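A sketch of how the two SPSS statistics can be computed from the null and model deviances (the deviance values and sample size are made up):

    import numpy as np

    n        = 100          # sample size (illustrative)
    dev_null = 130.0        # -2LL of the Block 0 (null) model
    dev_mod  = 100.0        # -2LL of the fitted model

    # Cox and Snell: based on the deviance improvement, scaled by sample size
    r2_cs = 1 - np.exp((dev_mod - dev_null) / n)

    # Its maximum possible value is below 1 ...
    r2_max = 1 - np.exp(-dev_null / n)

    # ... so Nagelkerke rescales Cox and Snell to give an upper limit of 1
    r2_nag = r2_cs / r2_max

    print(r2_cs, r2_nag)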
8
Q
What is the Wald Statistic?
A
- Wald Statistic (z statistic)
- Logistic regression equivalent of the t statistic
- SPSS reports z2 so that it has a chi-square distribution
- Tells us whether the contribution is significant
- Be cautious; when b is large SE becomes inflated
- More accurate to add predictors hierarchically and examine the change in the likelihood statistics
- Check whether the CI for b crosses 0
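A sketch of the Wald calculation with made-up b and SE values: divide the coefficient by its standard error, then square to get the chi-square-distributed value SPSS reports:

    from scipy import stats

    b, se = 0.7, 0.25                  # illustrative coefficient and standard error

    z    = b / se                      # the z statistic
    wald = z ** 2                      # SPSS's Wald value, chi-square with df = 1
    p    = stats.chi2.sf(wald, df=1)   # significance of the predictor's contribution

    print(wald, p)                     # 7.84, p ~ .005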
9
Q
How is logistic regression accomplished in SPSS?
A
- Correlate -> Bivariate -> Add all variables
- Select potential predictors
- Be careful with negative predictors (they can cancel out positive predictors)
- Analyse -> Regression -> Binary Logistic
- Outcome in Dependent, Predictors in covariate
- Choose 'enter' method (unless hierarchical is warranted)
- If categorical predictor present -> categorical -> move predictor into box
- Save; group membership
- Options; Hosmer, CI, Classification
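The steps above are SPSS menu paths; for readers working outside SPSS, a roughly equivalent analysis might look like this sketch using statsmodels (the file name and the columns outcome, pred1 and pred2 are hypothetical):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv("data.csv")                    # hypothetical file and columns
    X  = sm.add_constant(df[["pred1", "pred2"]])    # predictors ('Covariates' in SPSS)
    y  = df["outcome"]                              # binary dependent variable

    model = sm.Logit(y, X).fit()    # 'enter' method: all predictors in one block
    print(model.summary())          # b, SE, Wald/z, p and CIs
    print(np.exp(model.params))     # Exp(B): the odds ratios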
10
Q
How do you interpret logistic regression output in SPSS?
A
- Check which cases are included under case processing
- Block 0; Null hypothesis model
- Probability without predictors
- Variables not in the equation; shows whether the excluded predictors would improve the model
- Block 1; Simultaneous model
- Omnibus test compares to Block 0 (p<.05 =good predictor)
- Nagelkerke = variance explained
- Hosmer-Lemeshow p > .05 = good fit (a significant result indicates poor fit)
- Exp(B)= odds ratios
- Contingency tables; Shows how many cases were correctly predicted
- Classification tables; % correct predicted
- Compare to the null model
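A sketch of the logic behind those tables, with made-up outcomes and predicted probabilities: the probabilities are cut at .5 and cross-tabulated against the observed outcomes:

    import numpy as np
    import pandas as pd

    y     = np.array([1, 0, 1, 1, 0, 0, 1, 0])          # observed (illustrative)
    p_hat = np.array([.8, .4, .3, .9, .2, .6, .7, .1])  # predicted probabilities

    pred = (p_hat >= 0.5).astype(int)                   # predicted group membership

    # Contingency/classification table: observed vs predicted
    print(pd.crosstab(y, pred, rownames=["observed"], colnames=["predicted"]))
    print("% correct:", 100 * np.mean(pred == y))       # compare to the null model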
11
Q
How is model parsimony tested in SPSS?
A
- Model Parsimony; tested during the main analysis
- Add the different predictors in steps
- Under Categorical; tick change and contrast
- Under omnibus tests; compare the blocks and rerun only the best model
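A sketch of the arithmetic behind the block comparison (deviance values are made up): the drop in -2LL between nested models is chi-square distributed, with df equal to the number of predictors added:

    from scipy import stats

    dev_block1 = 120.0                    # -2LL with the first set of predictors
    dev_block2 = 110.0                    # -2LL after adding two more predictors

    chi_sq = dev_block1 - dev_block2      # improvement in fit
    df     = 2                            # number of predictors added in Block 2
    p      = stats.chi2.sf(chi_sq, df)

    print(chi_sq, p)                      # keep the larger model only if p < .05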
12
Q
What are some common problems in logistic regression?
A
- Overdispersion; the variance is larger than expected from the model
- Makes SE/CIs too small
- Caused by violating independence of errors assumption
- Present if the Dispersion Parameter is greater than 1 (big problem if over 2); see the sketch after this card
- Incomplete Information from Predictors;
- Ideally, you should have some data for every possible combination of predictors (definitely for categorical)
- Violation causes large SEs
- Complete Separation; when the outcome can be perfectly predicted by one or more predictors
- model collapses, large SEs
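As flagged above, a sketch of the overdispersion check (the fit statistics are illustrative): the dispersion parameter is the Pearson chi-square divided by its residual degrees of freedom:

    pearson_chi_sq = 180.0    # Pearson chi-square of the fitted model (illustrative)
    residual_df    = 95       # n minus the number of estimated parameters

    dispersion = pearson_chi_sq / residual_df
    print(dispersion)         # > 1 suggests overdispersion; over 2 is a big problem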