Lecture 7 Flashcards
Why can’t you use regular regression for binary outcomes?
- because you can get values other than 0 or 1
- predicted values can fall below 0, above 1, or anywhere in between
- such values make no sense as probabilities and cannot be interpreted; you cannot extrapolate
What does logistic regression involve?
- model the probability that Y=1 (a continuous function ranging from 0 to 1)
- model: log odds of obtaining Y=1
- predict this as a regression
How do you calculate the odds and probability in logistic regression?
- use the values in the formula to get log(odds)
- odds = e^(log(odds))
- P(Y=1) = odds/(1+odds)
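A minimal numeric sketch of those two steps in Python, using made-up coefficients purely for illustration:

```python
import numpy as np

# hypothetical fitted equation: log(odds) = b0 + b1*x (coefficients are made up)
b0, b1 = -2.0, 0.5
x = 3.0

log_odds = b0 + b1 * x        # value of the regression equation
odds = np.exp(log_odds)       # odds = e^(log(odds))
p = odds / (1 + odds)         # P(Y=1) = odds / (1 + odds)

print(f"log(odds) = {log_odds:.2f}, odds = {odds:.2f}, P(Y=1) = {p:.2f}")
```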
How do you interpret odds and log(odds)?
- odds > 1: Y=1 more probable than Y=0
- log(odds) > 0: Y=1 more probable than Y=0
- odds=1 or log(odds)=0: equal chances of each
Why do we use log in logistic regression?
- you can put any value from -infinity to +infinity into the regression equation, yet:
- the resulting probability can never go below 0 or above 1
How do you sub the regression equation into the logistic function?
- P(Y=1) = 1 / (1 + e^-(regression equation))
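A short derivation of that form from the log-odds model (writing the regression equation as b0 + b1X):

```latex
\log\frac{P(Y=1)}{1-P(Y=1)} = b_0 + b_1 X
\;\Rightarrow\;
\frac{P(Y=1)}{1-P(Y=1)} = e^{b_0 + b_1 X}
\;\Rightarrow\;
P(Y=1) = \frac{e^{b_0 + b_1 X}}{1 + e^{b_0 + b_1 X}} = \frac{1}{1 + e^{-(b_0 + b_1 X)}}
```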
What is the link? What are the different types of links?
- link = the function of the outcome, f(Y), that is modelled linearly (often written mu)
- identity link: mu = Y (the ordinary linear model)
- logistic (logit) link: mu = log[P(Y=1)/P(Y=0)], for binary outcomes
- logarithmic link: mu = log(Y), for counts/frequencies (the loglinear model)
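A hedged sketch of how those three links map onto GLM families in Python's statsmodels (the data and variable names are made up for illustration):

```python
import pandas as pd
import statsmodels.api as sm

# toy data set, purely to show how each link is requested (numbers are made up)
df = pd.DataFrame({
    "x":       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "y_cont":  [2.1, 2.4, 3.0, 3.3, 4.1, 4.4, 5.2, 5.3, 6.0, 6.6],  # continuous
    "y_bin":   [0, 0, 1, 0, 1, 0, 1, 1, 0, 1],                      # binary
    "y_count": [1, 0, 2, 3, 2, 4, 5, 4, 6, 7],                      # counts
})
X = sm.add_constant(df[["x"]])

# identity link: mu = Y, the ordinary linear model
sm.GLM(df["y_cont"], X, family=sm.families.Gaussian()).fit()

# logit link: mu = log[P(Y=1)/P(Y=0)], logistic regression
sm.GLM(df["y_bin"], X, family=sm.families.Binomial()).fit()

# log link: mu = log(Y), Poisson/loglinear model for counts
sm.GLM(df["y_count"], X, family=sm.families.Poisson()).fit()
```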
Why do we use links/functions?
- the GLM allows linear-modelling techniques to be used on outcomes that are not linearly related to the predictors
- used when data do not conform to the assumptions of linear regression
What are the assumptions of logistic regression? What is not assumed?
- binary outcomes that are MUTUALLY EXCLUSIVE
- independence of observations (as usual)
- IVs can be continuous or categorical
- NOT normality, linearity, homoscedasticity
How do you interpret the SPSS output for logistic regression?
- Block 0 (constant-only model): doesn’t tell you much; its classification table just shows the proportion of cases with Y=0
- Block 1: look at R2 (Nagelkerke)
- % correct: the proportion of cases the model classifies correctly
- Exp(B) = the odds ratio, interpret as: odds increase by a FACTOR of this when the IV increases by one unit
- also look at the CI for Exp(B): if it includes 1, the effect is not significant
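A minimal sketch of getting the same quantities outside SPSS, using Python's statsmodels with made-up data ("hours" and "passed" are hypothetical variables):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# made-up data: does hours of revision predict passing (1) vs failing (0)?
df = pd.DataFrame({
    "hours":  [1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10],
    "passed": [0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1],
})
X = sm.add_constant(df[["hours"]])
fit = sm.Logit(df["passed"], X).fit()

print(fit.summary())                    # B, SE, Wald z, p for each predictor
print(np.exp(fit.params))               # Exp(B): odds ratios
print(np.exp(fit.conf_int()))           # 95% CI for Exp(B)
print((fit.predict(X).round() == df["passed"]).mean())  # % correctly classified
```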
What is the difference between Cox and Snell’s and Nagelkerke’s R2 values?
- Cox & Snell: a function of the likelihood ratio, but it cannot reach a maximum of 1
- Nagelkerke: rescales Cox & Snell by dividing by its maximum possible value, so it can range from 0 to 1
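A sketch of how both values are computed from the null and fitted log-likelihoods (the `fit` object in the comment is a hypothetical statsmodels Logit result):

```python
import numpy as np

def cox_snell_nagelkerke(ll_null, ll_model, n):
    """Pseudo-R^2 values from the log-likelihoods of the null and fitted models."""
    cs = 1 - np.exp((2 / n) * (ll_null - ll_model))   # Cox & Snell
    cs_max = 1 - np.exp((2 / n) * ll_null)            # its maximum possible value
    return cs, cs / cs_max                            # Nagelkerke rescales to 0-1

# e.g. with a fitted statsmodels Logit result `fit` (hypothetical):
# cs, nag = cox_snell_nagelkerke(fit.llnull, fit.llf, fit.nobs)
```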
Why do you have to use loglinear models rather than X2?
- when there are three or more categorical variables (a multi-way table rather than a two-way table)
- X2 can only test the association between two variables at a time
What is Simpson’s paradox?
- conclusions drawn from the margins of a table (collapsing over a third variable) are not necessarily the same as those drawn from the full table
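An illustrative sketch with hypothetical recovery counts for two treatments split by severity, showing the reversal:

```python
# hypothetical (recovered, total) counts illustrating Simpson's paradox
mild   = {"A": (81, 87),   "B": (234, 270)}
severe = {"A": (192, 263), "B": (55, 80)}

for group, table in (("mild", mild), ("severe", severe)):
    for t, (rec, tot) in table.items():
        print(f"{group} {t}: {rec/tot:.0%}")     # A beats B in BOTH severity groups

for t in ("A", "B"):
    rec = mild[t][0] + severe[t][0]
    tot = mild[t][1] + severe[t][1]
    print(f"overall {t}: {rec/tot:.0%}")         # ...yet B looks better in the margins
```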
What are loglinear models based on?
- counts or frequencies
- 3+ categorical variables
What is the formula for the loglinear model? What do you actually test?
log F(MD) = sigma (constant) + lambda(M) + lambda(D) + lambda(MD)
- test the INTERACTION term to see if the variables are associated
- test whether the NON-SATURATED model (without the interaction) is an acceptable fit
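A hedged sketch of fitting that model as a Poisson GLM in statsmodels (the 2x2 counts and the variable names M and D are made up for illustration):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# hypothetical frequency table for two categorical variables M and D
counts = pd.DataFrame({
    "M": ["yes", "yes", "no", "no"],
    "D": ["yes", "no", "yes", "no"],
    "F": [30, 10, 15, 45],
})

# saturated model: main effects + M:D interaction (reproduces the data exactly)
saturated = smf.glm("F ~ M * D", data=counts, family=sm.families.Poisson()).fit()

# non-saturated (independence) model: main effects only
independence = smf.glm("F ~ M + D", data=counts, family=sm.families.Poisson()).fit()

# the independence model's deviance is the likelihood-ratio test of the M x D
# association: a large (significant) deviance means the interaction is needed,
# i.e. the non-saturated model is NOT an acceptable fit
print(independence.deviance, independence.df_resid)
```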