Chapter 4: Logistic and Poisson Regressions Flashcards
Logistic Regression
- Models of discrete choice have long been a topic in (micro-)econometrics and are nowadays widely used in marketing research.
- Logit and probit models extend the principles of general linear models (e.g., regression) to better handle dichotomous and categorical target variables.
- They focus on categorical dependent variables, looking at all levels of possible interaction effects.
- McFadden received the 2000 Nobel Prize in Economics for fundamental contributions to discrete choice modeling.
Application of Logistic Regression
- Why do commuters choose to fly or not to fly to a destination when there are alternatives?
- Available modes = Air, Train, Bus, Car
- Observed:
- Choice
- Attributes: Cost, terminal time, other
- Characteristics of commuters: Household income
- Choose to fly iff U_fly > 0
- U_fly = β0 + β1·Cost + β2·Time + γ·Income + ε
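A minimal sketch of how such a binary choice model can be estimated: synthetic commuter data is generated from an assumed utility function and fitted with statsmodels' Logit. All variable values and coefficients below are illustrative assumptions, not figures from the original study.

```python
# Hypothetical sketch: estimating the fly/no-fly utility model as a binary logit.
# The data and "true" coefficients below are synthetic, for illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
cost = rng.uniform(50, 300, n)     # ticket cost
time = rng.uniform(0.5, 4.0, n)    # terminal time (hours)
income = rng.uniform(20, 120, n)   # household income (thousands)

# Assumed utility: U_fly = b0 + b1*Cost + b2*Time + g*Income + eps (logistic eps)
utility = 1.0 - 0.01 * cost - 0.5 * time + 0.02 * income + rng.logistic(size=n)
fly = (utility > 0).astype(int)    # choose to fly iff U_fly > 0

X = sm.add_constant(np.column_stack([cost, time, income]))
model = sm.Logit(fly, X).fit(disp=0)
print(model.params)                # estimates of beta0, beta1, beta2, gamma
```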
The Linear Probability Model
- The predicted probabilities of the linear model can be greater than 1 or less than 0
- ε is not normally distributed because Y takes on only two values
- The error terms are heteroscedastic
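To make the first point concrete, here is a small sketch (synthetic data, assumed setup) showing that an OLS fit on a 0/1 outcome can produce fitted "probabilities" outside [0, 1]:

```python
# Sketch: a linear probability model (OLS on a binary outcome) yields fitted
# values below 0 and above 1 at the extremes. Data is synthetic illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = np.linspace(-4, 4, 200)
y = (x + rng.normal(size=200) > 0).astype(int)   # binary outcome

ols = sm.OLS(y, sm.add_constant(x)).fit()
print(ols.fittedvalues.min(), ols.fittedvalues.max())  # typically < 0 and > 1
```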
Gauss-Markov Assumptions
- The OLS estimator is the best linear unbiased estimator (BLUE), if
- there is a linear relationship between predictors x and y
- the error variable is a normally distributed random variable with E(ε)=0.
- the error variance is constant for all values of x (homoscedasticity).
- The errors ε are independent of each other.
- No multicollinearity (i.e., high correlation) among predictors.
The Logistic Regression Model
- The “logit” model solves the problems of the linear model:
- ln[p/(1-p)] = β0 + β1X1 + ε
- p is the probability that the event Y occurs, Pr(Y = 1 | X1)
- p/(1 - p) describes the odds
- A 20% probability of winning corresponds to odds of 0.20/0.80 = 0.25
- A 50% chance of winning leads to odds of 1
- ln[p/(1-p)] is the log odds, or “logit”
- p = 0.50, then logit = 0
- p = 0.70, then logit ≈ 0.85
- p = 0.30, then logit ≈ -0.85
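The probability/odds/logit relationships above are easy to verify with a small sketch:

```python
# Quick check of the probability -> odds -> logit mapping from the bullets above.
import math

def logit(p):
    return math.log(p / (1 - p))

for p in (0.20, 0.30, 0.50, 0.70):
    odds = p / (1 - p)
    print(f"p={p:.2f}  odds={odds:.2f}  logit={logit(p):+.2f}")
# p=0.20 gives odds 0.25; p=0.50 gives logit 0; p=0.70 gives logit +0.85
```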
Logistic Function
- The logistic function Pr(Y | X) constrains the estimated probabilities to lie between 0 and 1 (0 <= Pr(Y | X) <= 1).
- Pr(Y | X) = e^(β0+β1X1) / (1 + e^(β0+β1X1))
- Pr(Y | X) is the estimated probability that the ith case is in a category and β0 + β1X1 is the regular linear regression equation
- This means that the probability of a success (Y = 1) given the predictor variable (X) is a non-linear function, specifically a logistic function
- if you let β0 + β1X1 = 0, then p = .50
- as β0 + β1X1 gets very large, p approaches 1
- as β0 + β1X1 gets very negative, p approaches 0
- The values in the regression equation β1 and β0 take on slightly different meanings.
- β0 <- The regression constant (moves curve left and right)
- β1 <- The regression slope (steepness of curve)
- -β0/β1 <- The threshold, where probability of success = .50
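A brief sketch of the logistic function and of -β0/β1 as the 50% threshold; the coefficient values below are arbitrary illustrations:

```python
# Sketch of the logistic function; beta0 shifts the curve, beta1 sets steepness.
import numpy as np

def logistic(x, b0, b1):
    z = b0 + b1 * x
    return np.exp(z) / (1 + np.exp(z))

b0, b1 = -2.0, 0.8                 # illustrative values
threshold = -b0 / b1               # where Pr(Y | X) = 0.50
print(threshold)                   # 2.5
print(logistic(np.array([threshold]), b0, b1))   # [0.5]
```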
Odds and Logit
By algebraic manipulation, the logistic regression equation can be written in terms of the odds of success:
- p/(1 - p) = e^(β0+β1X1)
- Odds range from 0 to positive infinity
- If p/(1 - p) is
- less than 1, then less than .50 probability
- greater than 1, then greater than .50 probability
The Logit
Finally, taking the natural log of both sides, we can write the equation in terms of logits (log-odds):
- Probability is constrained between 0 and 1
- Log-odds are a linear function of the predictors
- The logit now ranges between -∞ and +∞ (like the dependent variable of a linear regression)
- The regression coefficients go back to their old interpretation (kind of)
- The amount the logit (log-odds) changes with a one-unit change in X1
Estimating the Coefficients of a Logistic Regression
- Maximum Likelihood Estimation (MLE) is a statistical method for estimating the coefficients of a model
- The likelihood function (l) measures the probability of observing the particular set of dependent variable values that occur in the sample
- MLE involves finding the coefficients that make the log of the likelihood function (ll, which is negative for discrete outcomes) as large as possible
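A minimal sketch of MLE for a one-predictor logit, assuming synthetic data: the negative log-likelihood is minimized numerically with scipy, which is equivalent to maximizing ll:

```python
# Sketch: maximum likelihood estimation of a logit by minimizing -ll.
# Synthetic data with assumed true coefficients (0.5, 1.5).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = (rng.uniform(size=300) < 1 / (1 + np.exp(-(0.5 + 1.5 * x)))).astype(int)

def neg_log_likelihood(beta):
    p = 1 / (1 + np.exp(-(beta[0] + beta[1] * x)))
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

result = minimize(neg_log_likelihood, x0=np.zeros(2))
print(result.x)   # estimates close to the true (0.5, 1.5)
```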
The Likelihood Function for Logit Model
- Suppose 10 individuals make travel choices between auto (A) and public transit (T).
- All travelers are assumed to possess identical attributes (unrealistic), and so the probabilities are not functions of β’s but simply a function of p, the probability p of choosing auto.
- l = p^x (1 - p)^(n-x) = p^7 (1 - p)^3, since x = 7 of the n = 10 travelers choose auto
- ln(l) = 7·ln(p) + 3·ln(1 - p), which is maximized at p = 0.7 (setting the derivative 7/p - 3/(1 - p) to zero gives p = 7/10)
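A quick numeric check of this example, evaluating ln(l) on a grid of p values:

```python
# Numeric check that ln(l) = 7*ln(p) + 3*ln(1-p) peaks at p = 0.7.
import numpy as np

p = np.linspace(0.01, 0.99, 99)
ll = 7 * np.log(p) + 3 * np.log(1 - p)
print(p[np.argmax(ll)])   # ~0.7
```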
Evaluating the Logistic Regression
- The log likelihood function (ll) is one metric to compare two logistic regression models (the higher, the better)
- AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) also measure goodness-of-fit
- There are several measures intended to mimic the R2 analysis (Pseudo-R2, e.g., McFadden-R2 or Nagelkerke-R2), but the interpretation is different
- A Wald test or t-test is used to test the statistical significance of each coefficient in the model (null hypothesis: βi = 0)
- The Chi-square statistic and associated p-value show whether the model coefficients as a group equal zero
- Larger Chi-squares and smaller p-values indicate greater confidence in rejecting this null hypothesis
- Also use error rates and gain curves to evaluate the performance
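For reference, a sketch of the standard AIC/BIC formulas given a model's log-likelihood ll, number of parameters k, and sample size n; the numbers plugged in are assumptions, not from an actual fit:

```python
# Standard information-criterion formulas; smaller values indicate better fit.
import math

ll, k, n = -80.97, 4, 210      # illustrative values, not from a real model

aic = 2 * k - 2 * ll           # Akaike Information Criterion
bic = k * math.log(n) - 2 * ll # Bayesian Information Criterion
print(aic, bic)
```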
McFadden R2 / Pseudo R2
R2_McFadden = 1 - ll/ll0 <- describes the fit of the model
- If the full model does much better than just a constant, this value will be close to 1 in a discrete-choice model.
- 1 - (80.9658/123.757) = 0.3458 for the logit model on the second-to-last slide
- If the full model doesn’t explain much at all, the value will be close to 0.
- Typically, the values are lower than those of R2 in a linear regression and need to be interpreted with care.
- Values > 0.2 are acceptable, > 0.4 is already good
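The McFadden R2 calculation, using the slide's example values (log-likelihoods themselves are negative; the slide reports their magnitudes):

```python
# McFadden pseudo-R^2 from the full-model (ll) and intercept-only (ll0)
# log-likelihoods, using the slide's example numbers.
ll, ll0 = -80.9658, -123.757
print(1 - ll / ll0)   # 0.3458 -> acceptable by the rule of thumb above
```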
Calculating Error Rates from a Logistic Regression
- Assume that if the estimated p is greater than or equal to .5, then the event is expected to occur, and not to occur otherwise.
- By converting these probabilities to 0s and 1s and comparing them to the actual 0s and 1s, the % correct Yes, % correct No, and overall % correct scores are calculated.
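A sketch of this error-rate calculation; the actual/predicted arrays are made-up illustrations:

```python
# Threshold predicted probabilities at 0.5 and compare to actual 0/1 outcomes.
import numpy as np

actual = np.array([1, 0, 1, 1, 0, 0, 1, 0])               # illustrative
p_hat  = np.array([0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3])
predicted = (p_hat >= 0.5).astype(int)

correct_yes = np.mean(predicted[actual == 1] == 1)   # % correct Yes
correct_no  = np.mean(predicted[actual == 0] == 0)   # % correct No
overall     = np.mean(predicted == actual)           # overall % correct
print(correct_yes, correct_no, overall)
```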
Simple Interpretation of the Coefficients
- If β1 < 0, then an increase in X1 => 0 < exp(β1) < 1
- then the odds go down
- If β1 > 0, then an increase in X1 => exp(β1) > 1
- then the odds go up
- Always check for the significance of the coefficients
- But can we say more than this when interpreting the coefficient values?
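One thing we can say: exp(β1) is the factor by which the odds change per one-unit increase in X1. A tiny sketch with an assumed coefficient value:

```python
# exp(beta1) gives the multiplicative change in the odds per unit of X1.
import math

beta1 = -0.4                  # assumed illustrative coefficient
print(math.exp(beta1))        # ~0.67: each unit of X1 multiplies the odds by ~0.67
```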
Multicollinearity and Irrelevant Variables
- The presence of multicollinearity will not lead to biased coefficients, but it will have an effect on the standard errors.
- If a variable which you think should be statistically significant is not, consult the correlation coefficients.
- If two variables are correlated at a rate greater than .6, .7, .8, etc., then try dropping the less theoretically important of the two.
- The inclusion of irrelevant variables can result in poor model fit.
- You can consult your Wald statistics and remove irrelevant variables.
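A sketch of a practical multicollinearity check, pairing the correlation matrix the bullets mention with variance inflation factors from statsmodels (synthetic predictors; the VIF cutoff is a common rule of thumb, not from the slide):

```python
# Multicollinearity check: pairwise correlations plus variance inflation
# factors (VIF). Predictors are synthetic, with x2 built to correlate with x1.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)   # strongly correlated with x1
x3 = rng.normal(size=200)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
print(np.corrcoef([x1, x2, x3]))             # correlation matrix
for i in range(1, X.shape[1]):               # skip the constant column
    print(variance_inflation_factor(X, i))   # VIF > ~10 suggests trouble
```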