Binary Dependent Variable Flashcards
What are binary dependent variable models, and what choices do we have?
In models with binary dependent variables, the regression function is interpreted as a conditional probability function of the binary dependent variable. Three choices: (1) the linear probability model (LPM), (2) probit, and (3) logit. Probit and logit models allow for a non-linear relationship between the regressors and the dependent variable.
Assumptions for probit and logit:
- Linear in parameters
- Random sampling
- No perfect multicollinearity
- Zero conditional mean of errors
- Homoskedasticity
Properties of the estimated parameters:
- Consistency: as the sample size increases, the estimated β converges to the true β.
- Unbiasedness: the expected value of the estimated β equals the true β.
R2 meaning
R2 has no meaningful interpretation when the dependent variable is binary and the regressors are continuous: the regression line can never fit the data perfectly, and R2 relies on a linear relationship between X and Y. Instead, use the proportion correctly predicted ("hit rate") or the pseudo-R2 (McFadden), which compares the maximized likelihood of the model with X to that of the model without X. Maximum likelihood estimators are normally distributed in large samples, so we can do inference. ML estimates the unknown parameters by choosing them such that the likelihood of drawing the sample we observe is maximized (hence it estimates the optimal alpha and betas).
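A minimal sketch of the pseudo-R2 calculation, assuming Python with statsmodels and simulated data (variable names are illustrative, not from the flashcards):

```python
# Sketch: McFadden pseudo-R2 for a probit estimated by maximum likelihood.
# Simulated data; variable names are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = (0.5 + 1.0 * x + rng.normal(size=500) > 0).astype(int)   # latent-variable DGP

res = sm.Probit(y, sm.add_constant(x)).fit(disp=0)           # ML estimation of alpha and beta

# McFadden pseudo-R2: 1 - lnL(model with X) / lnL(intercept-only model)
print(1 - res.llf / res.llnull)
print(res.prsquared)                                          # same number, reported by statsmodels
```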
What will the standard errors in the LPM always be?
Heteroskedastic.
Using robust standard errors is imperative because the residuals in a linear probability model are always heteroskedastic.
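A minimal sketch of an LPM with heteroskedasticity-robust standard errors, assuming Python with statsmodels and simulated data:

```python
# Sketch: linear probability model with heteroskedasticity-robust (HC1) standard errors.
# Simulated data; the regressor name is illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = (0.3 * x + rng.normal(size=1000) > 0).astype(int)

X = sm.add_constant(x)
lpm_naive  = sm.OLS(y, X).fit()                 # conventional SEs (invalid here)
lpm_robust = sm.OLS(y, X).fit(cov_type="HC1")   # robust SEs -- always use these for the LPM

print(lpm_naive.bse)    # misstates the uncertainty under heteroskedasticity
print(lpm_robust.bse)   # valid under heteroskedasticity
```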
Interpretation of Probit coefficient
For standard independent variables: a one-unit change in X is associated with a β1 change in z.
For log-transformed variables: a one-unit change in log X is associated with a β1 change in z.
z is then translated into a probability through the standard normal cumulative distribution function (CDF).
Because probit coefficients have a less straightforward economic interpretation, it is more common to report marginal effects. A key feature of the probit is that the effect of a unit change in x on the predicted probability differs depending on the value of x (marginal effects differ for different X).
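A small sketch of how the probit index z maps into probabilities, using made-up coefficients and scipy's standard normal CDF:

```python
# Sketch: the probit index z maps to a probability through the standard normal CDF,
# so the same one-unit change in x shifts the probability by different amounts
# at different starting values of x. Coefficients below are made up for illustration.
from scipy.stats import norm

alpha, beta = -1.0, 0.5

for x in (0.0, 2.0, 4.0):
    p_now  = norm.cdf(alpha + beta * x)          # Pr(y = 1 | x)
    p_next = norm.cdf(alpha + beta * (x + 1))    # Pr(y = 1 | x + 1)
    print(x, round(p_next - p_now, 3))           # effect of the same unit change differs
```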
Benefit of S-shape
The benefit of the S-shape is that it predicts conditional probabilities in the interval 0 to 1!
T-statistics and confidence intervals can be used thanks to the large-sample normal approximation (CLT)
Difference between probit and logit
Virtually the only difference between probit and logit is the distribution function, which also has implications for the interpretation. Logit uses the standard logistic distribution function, which gives a log-odds interpretation (the log of the probability of success over the probability of failure) that is difficult to comprehend.
The main reason logit was popular is historical: it is computationally faster and easier, but that does not matter nowadays.
In practice, logit and probit are very similar.
Empirical results typically should not hinge on the logit versus probit choice.
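A minimal sketch comparing probit and logit on the same simulated data, assuming statsmodels; the coefficient scales differ but the predicted probabilities are nearly identical:

```python
# Sketch: probit and logit fitted to the same simulated data give different coefficient
# scales but almost identical predicted probabilities.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=2000)
y = (0.8 * x + rng.logistic(size=2000) > 0).astype(int)

X = sm.add_constant(x)
probit = sm.Probit(y, X).fit(disp=0)
logit  = sm.Logit(y, X).fit(disp=0)

print(probit.params, logit.params)                          # coefficient scales differ (roughly a factor ~1.6)
print(np.abs(probit.predict(X) - logit.predict(X)).max())   # predicted probabilities nearly identical
```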
Linear probability model
A natural starting point is the linear regression model with a single regressor. In the LPM, the predicted value of y is interpreted as the predicted probability that y = 1, and β is the change in that predicted probability for a unit increase in x
y = α + βx + u
The LPM models Prob(y = 1|x) as a linear function of x
Why is the linear regression model sometimes called the linear probability model?
When y is binary, the linear regression model y = α + βx + u is called the linear probability model because Prob(y = 1|x) = α + βx
Advantages and disadvantages with LPM?
Advantages:
Simple to estimate and interpret
Inference is the same as for multiple regression, provided robust standard errors are used (the LPM is inherently heteroskedastic)
Disadvantages:
An LPM says that the change in the predicted probability for a given change in x is the same for all values of x, but that does not always make sense
Also, the LPM can predict probabilities that are < 0 or > 1 (see the sketch below)
Overall:
We need a non-linear model: probit or logit regression
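A minimal sketch illustrating why the LPM motivates a non-linear model, assuming statsmodels and simulated data: the LPM can produce fitted probabilities outside [0, 1], while the probit cannot:

```python
# Sketch: with a wide range of x, LPM predictions can leave [0, 1] while probit
# predictions cannot. Simulated data, illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(scale=3.0, size=1000)
y = (x + rng.normal(size=1000) > 0).astype(int)

X = sm.add_constant(x)
p_lpm    = sm.OLS(y, X).fit().predict(X)
p_probit = sm.Probit(y, X).fit(disp=0).predict(X)

print((p_lpm < 0).sum(), (p_lpm > 1).sum())   # LPM: predictions outside [0, 1]
print(p_probit.min(), p_probit.max())         # probit: always inside (0, 1)
```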
Why Cumulative normal regression?
The “S-shape” gives us:
Marginal probabilities depend on the value of x
Predicted probabilities are bounded between 0 and 1
Easy to use
Relatively straightforward interpretation:
z = β0 + β1x1 + · · · + βkxk
β1 is the change in the z-value from a unit change in x1, holding everything else constant
How do we interpret the marginal effects in this chapter?
Marginal effect: the effect on the dependent variable that results from changing an independent variable by a small amount
Probit and logit are non-linear functions
Hence, the ultimate effect of a one-unit change in a regressor (x) on the predicted probability differs across values of x
It is common to report marginal (or partial) effects instead of coefficients
Two common ways to report them: the marginal effect at the means (MEM) and the average marginal effect (AME), as in the sketch below
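A minimal sketch of the MEM and AME, assuming statsmodels' get_margeff and simulated data:

```python
# Sketch: marginal effect at the means (MEM) versus average marginal effect (AME)
# for a probit, using statsmodels' get_margeff. Simulated data, illustrative names.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x1 = rng.normal(size=1000)
x2 = rng.normal(size=1000)
y = (0.5 * x1 - 0.8 * x2 + rng.normal(size=1000) > 0).astype(int)

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.Probit(y, X).fit(disp=0)

mem = res.get_margeff(at="mean")      # evaluate the derivative at the sample means of x
ame = res.get_margeff(at="overall")   # average the derivative over all observations
print(mem.margeff)
print(ame.margeff)
```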
Maximum Likelihood
The likelihood function is the conditional density of y1, . . . , yn given x1, . . . , xn, treated as a function of the unknown parameters α, β. Maximizing it means choosing the parameters that maximize the probability of observing the data we have, given the assumed model. For probit with one explanatory variable, the likelihood is L(α, β) = Π_i Φ(α + βx_i)^(y_i) · [1 − Φ(α + βx_i)]^(1 − y_i), where Φ is the standard normal CDF.
The maximum likelihood estimator (MLE) is the value of α, β1, . . . , βk that maximizes the likelihood function
The MLE is the value of α, β1, . . . , βk that best describes the full distribution of the data
In large samples, the MLE is: consistent, normally distributed, and efficient (it has the smallest variance of all estimators)
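A minimal sketch of the probit MLE written out by hand and maximized numerically, assuming Python with numpy, scipy and statsmodels and simulated data; it should agree with the packaged Probit estimates:

```python
# Sketch: maximum likelihood for a one-regressor probit, written out by hand.
# L(alpha, beta) = prod_i Phi(alpha + beta*x_i)^y_i * (1 - Phi(alpha + beta*x_i))^(1 - y_i)
# Simulated data; estimates should agree with statsmodels' Probit.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=1000)
y = (0.2 + 0.7 * x + rng.normal(size=1000) > 0).astype(int)

def neg_loglik(params):
    alpha, beta = params
    p = norm.cdf(alpha + beta * x)
    p = np.clip(p, 1e-10, 1 - 1e-10)               # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

mle = minimize(neg_loglik, x0=[0.0, 0.0], method="BFGS")
print(mle.x)                                                 # hand-rolled MLE
print(sm.Probit(y, sm.add_constant(x)).fit(disp=0).params)   # same, via statsmodels
```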
Measures of Fit
The R2 and adjusted R2 do not make sense here (even for the LPM). So, two other specialised measures are used:
The fraction correctly predicted = the fraction of observations for which the predicted probability is > 50% when yi is 1, or < 50% when yi is 0
The pseudo-R2 (McFadden) = 1 − lnL(model with X) / lnL(model without X), as discussed above
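A minimal sketch of the fraction correctly predicted, assuming statsmodels' pred_table and simulated data:

```python
# Sketch: fraction correctly predicted for a probit, via the 2x2 prediction table
# at a 0.5 cutoff. Simulated data, illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=800)
y = (0.4 * x + rng.normal(size=800) > 0).astype(int)

res = sm.Probit(y, sm.add_constant(x)).fit(disp=0)

table = res.pred_table(threshold=0.5)   # rows: actual 0/1, columns: predicted 0/1
print(table)
print(np.trace(table) / table.sum())    # fraction correctly predicted ("hit rate")
```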
Hypothesis testing
Usual t-tests and confidence intervals can be used by testing H0: β = 0.
For a joint hypothesis test, use the likelihood ratio test, which compares the likelihood functions of the restricted and unrestricted models. A joint hypothesis imposes several restrictions at once, for example H0: β1 = 0 and β2 = 0. Because there are several restrictions, a single t-test is not enough; in the linear model you would use an F-statistic, which is designed for testing several restrictions at a time. The logic here is the same: estimate the model under the null hypothesis (restricted), then compare its fit, measured by the maximized log-likelihood rather than R2, with that of the unrestricted model. If the unrestricted model fits significantly better, reject H0.
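A minimal sketch of a likelihood ratio test of a joint null, assuming statsmodels, scipy and simulated data:

```python
# Sketch: likelihood ratio test of the joint null H0: beta1 = beta2 = 0 in a probit.
# LR = 2*(lnL_unrestricted - lnL_restricted), chi-squared with q = 2 degrees of freedom.
# Simulated data, illustrative only.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(7)
x1, x2 = rng.normal(size=1000), rng.normal(size=1000)
y = (0.6 * x1 + 0.3 * x2 + rng.normal(size=1000) > 0).astype(int)

X_ur = sm.add_constant(np.column_stack([x1, x2]))   # unrestricted: constant, x1, x2
X_r  = np.ones((1000, 1))                           # restricted: constant only

llf_ur = sm.Probit(y, X_ur).fit(disp=0).llf
llf_r  = sm.Probit(y, X_r).fit(disp=0).llf

lr_stat = 2 * (llf_ur - llf_r)
p_value = chi2.sf(lr_stat, df=2)
print(lr_stat, p_value)   # small p-value: the unrestricted model fits significantly better, reject H0
```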