Binary Dependent Variable Flashcards
What are binary dependent variable models, and what choices do we have?
In models with binary dependent variables, the regression function is interpreted as a conditional probability function of the binary dependent variable. Three choices: (1) the linear probability model (LPM), (2) probit, and (3) logit. Probit and logit models allow for a non-linear relationship between the regressors and the dependent variable.
Assumptions for probit and logit:
- Linear in parameters
- Random sampling
- No perfect multicollinearity
- Zero conditional mean of errors
- Homoskedasticity
Properties of the estimated parameters:
- Consistency: as the sample size increases, the estimated β converges to the true β.
- Unbiasedness: the expected value of the estimated β equals the true β.
R2 meaning
R2 has no meaningful interpretation when the dependent variable is binary and the regressors are continuous: the regression line can never fit the data perfectly, and R2 relies on a linear relationship between X and Y. Instead, use the proportion correctly predicted ("hit rate") or the pseudo-R2 (McFadden), which compares the maximized likelihood of the model with X to that of the model without X. Maximum likelihood estimators are normally distributed in large samples, so we can do inference. ML estimates the unknown parameters by choosing them such that the likelihood of drawing the sample we observe is maximized (hence it estimates the optimal alpha and betas).
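A minimal sketch of the pseudo-R2 calculation, assuming Python with statsmodels and simulated data (variable names are illustrative, not from the flashcards):

```python
# Sketch: McFadden pseudo-R2 for a probit estimated by maximum likelihood.
# Simulated data; variable names are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = (0.5 + 1.0 * x + rng.normal(size=500) > 0).astype(int)   # latent-variable DGP

res = sm.Probit(y, sm.add_constant(x)).fit(disp=0)           # ML estimation of alpha and beta

# McFadden pseudo-R2: 1 - lnL(model with X) / lnL(intercept-only model)
print(1 - res.llf / res.llnull)
print(res.prsquared)                                          # same number, reported by statsmodels
```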
What will the standard errors in the LPM always be?
Heteroskedastic.
Using robust standard errors is imperative because the residuals in a linear probability model are always heteroskedastic.
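A minimal sketch of an LPM with heteroskedasticity-robust standard errors, assuming Python with statsmodels and simulated data:

```python
# Sketch: linear probability model with heteroskedasticity-robust (HC1) standard errors.
# Simulated data; the regressor name is illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = (0.3 * x + rng.normal(size=1000) > 0).astype(int)

X = sm.add_constant(x)
lpm_naive  = sm.OLS(y, X).fit()                 # conventional SEs (invalid here)
lpm_robust = sm.OLS(y, X).fit(cov_type="HC1")   # robust SEs -- always use these for the LPM

print(lpm_naive.bse)    # misstates the uncertainty under heteroskedasticity
print(lpm_robust.bse)   # valid under heteroskedasticity
```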
Interpretation of Probit coefficient
For standard independent variables: a one-unit change in X is associated with a β1 change in z.
For log-transformed variables: a one-unit change in log X is associated with a β1 change in z.
z is then translated into a probability through the standard normal cumulative distribution function (CDF).
Because probit coefficients have a less straightforward economic interpretation, it is more common to report marginal effects. A key feature of the probit is that the effect of a unit change in x on the predicted probability differs depending on the value of x (marginal effects differ for different X).
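A small sketch of how the probit index z maps into probabilities, using made-up coefficients and scipy's standard normal CDF:

```python
# Sketch: the probit index z maps to a probability through the standard normal CDF,
# so the same one-unit change in x shifts the probability by different amounts
# at different starting values of x. Coefficients below are made up for illustration.
from scipy.stats import norm

alpha, beta = -1.0, 0.5

for x in (0.0, 2.0, 4.0):
    p_now  = norm.cdf(alpha + beta * x)          # Pr(y = 1 | x)
    p_next = norm.cdf(alpha + beta * (x + 1))    # Pr(y = 1 | x + 1)
    print(x, round(p_next - p_now, 3))           # effect of the same unit change differs
```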
Benefit of S-shape
The benefit of the S-shape is that it predicts conditional probabilities in the interval 0 to 1!
T-statistics and confidence intervals can be used thanks to the large-sample normal approximation (CLT)
Difference between probit and logit
Virtually the only difference between probit and logit is the distribution function, which also has implications for the interpretation. Logit uses the standard logistic distribution function, which gives a log-odds interpretation (the log of the probability of success over the probability of failure) that is difficult to comprehend.
The main reason logit was popular is historical: it is computationally faster and easier, but that does not matter nowadays.
In practice, logit and probit are very similar.
Empirical results typically should not hinge on the logit versus probit choice.
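A minimal sketch comparing probit and logit on the same simulated data, assuming statsmodels; the coefficient scales differ but the predicted probabilities are nearly identical:

```python
# Sketch: probit and logit fitted to the same simulated data give different coefficient
# scales but almost identical predicted probabilities.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=2000)
y = (0.8 * x + rng.logistic(size=2000) > 0).astype(int)

X = sm.add_constant(x)
probit = sm.Probit(y, X).fit(disp=0)
logit  = sm.Logit(y, X).fit(disp=0)

print(probit.params, logit.params)                          # coefficient scales differ (roughly a factor ~1.6)
print(np.abs(probit.predict(X) - logit.predict(X)).max())   # predicted probabilities nearly identical
```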
Linear probability model
A natural starting point is the linear regression model with a single regressor. In the LPM, the predicted value of y is interpreted as the predicted probability that y = 1, and β is the change in that predicted probability for a unit increase in x
y = α + βx + u
The LPM models Prob(y = 1|x) as a linear function of x
Why is the linear regression model sometimes called the linear probability model?
When y is binary, the linear regression model y = α + βx + u is called the linear probability model because Prob(y = 1|x) = α + βx
Advantages and disadvantages with LPM?
Advantages:
Simple to estimate and interpret
Inference is the same as for multiple regression, provided robust standard errors are used (the LPM is inherently heteroskedastic)
Disadvantages:
An LPM says that the change in the predicted probability for a given change in x is the same for all values of x, but that does not always make sense
Also, the LPM can predict probabilities that are < 0 or > 1 (see the sketch below)
Overall:
We need a non-linear model: probit or logit regression
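A minimal sketch illustrating why the LPM motivates a non-linear model, assuming statsmodels and simulated data: the LPM can produce fitted probabilities outside [0, 1], while the probit cannot:

```python
# Sketch: with a wide range of x, LPM predictions can leave [0, 1] while probit
# predictions cannot. Simulated data, illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(scale=3.0, size=1000)
y = (x + rng.normal(size=1000) > 0).astype(int)

X = sm.add_constant(x)
p_lpm    = sm.OLS(y, X).fit().predict(X)
p_probit = sm.Probit(y, X).fit(disp=0).predict(X)

print((p_lpm < 0).sum(), (p_lpm > 1).sum())   # LPM: predictions outside [0, 1]
print(p_probit.min(), p_probit.max())         # probit: always inside (0, 1)
```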
Why Cumulative normal regression?
The “S-shape” gives us:
Marginal probabilities depend on the value of x
Predicted probabilities are bounded between 0 and 1
Easy to use
Relatively straightforward interpretation:
z = β0 + β1x1 + · · · + βkxk
β1 is the change in the z-value from a unit change in x1, holding everything else constant
How do we interpret the marginal effects in this chapter?
Marginal effect: the effect on the dependent variable that results from changing an independent variable by a small amount
Probit and logit are non-linear functions
Hence, the ultimate effect of a one-unit change in a regressor (x) on the predicted probability differs across values of x
It is common to report marginal (or partial) effects instead of coefficients
Two common ways to report them: the marginal effect at the means (MEM) and the average marginal effect (AME), as in the sketch below
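A minimal sketch of the MEM and AME, assuming statsmodels' get_margeff and simulated data:

```python
# Sketch: marginal effect at the means (MEM) versus average marginal effect (AME)
# for a probit, using statsmodels' get_margeff. Simulated data, illustrative names.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x1 = rng.normal(size=1000)
x2 = rng.normal(size=1000)
y = (0.5 * x1 - 0.8 * x2 + rng.normal(size=1000) > 0).astype(int)

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.Probit(y, X).fit(disp=0)

mem = res.get_margeff(at="mean")      # evaluate the derivative at the sample means of x
ame = res.get_margeff(at="overall")   # average the derivative over all observations
print(mem.margeff)
print(ame.margeff)
```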
Maximum Likelihood
The likelihood function is the conditional density of y1, . . . , yn given x1, . . . , xn, treated as a function of the unknown parameters α, β. Maximizing it means choosing the parameters that maximize the probability of observing the data we have, given the assumed model. For probit with one explanatory variable, the likelihood is L(α, β) = Π_i Φ(α + βx_i)^(y_i) · [1 − Φ(α + βx_i)]^(1 − y_i), where Φ is the standard normal CDF.
The maximum likelihood estimator (MLE) is the value of α, β1, . . . , βk that maximizes the likelihood function
The MLE is the value of α, β1, . . . , βk that best describes the full distribution of the data
In large samples, the MLE is: consistent, normally distributed, and efficient (it has the smallest variance of all estimators)
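A minimal sketch of the probit MLE written out by hand and maximized numerically, assuming Python with numpy, scipy and statsmodels and simulated data; it should agree with the packaged Probit estimates:

```python
# Sketch: maximum likelihood for a one-regressor probit, written out by hand.
# L(alpha, beta) = prod_i Phi(alpha + beta*x_i)^y_i * (1 - Phi(alpha + beta*x_i))^(1 - y_i)
# Simulated data; estimates should agree with statsmodels' Probit.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=1000)
y = (0.2 + 0.7 * x + rng.normal(size=1000) > 0).astype(int)

def neg_loglik(params):
    alpha, beta = params
    p = norm.cdf(alpha + beta * x)
    p = np.clip(p, 1e-10, 1 - 1e-10)               # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

mle = minimize(neg_loglik, x0=[0.0, 0.0], method="BFGS")
print(mle.x)                                                 # hand-rolled MLE
print(sm.Probit(y, sm.add_constant(x)).fit(disp=0).params)   # same, via statsmodels
```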
Measures of Fit
The R2 and adjusted R2 do not make sense here (even for the LPM). So, two other specialised measures are used:
The fraction correctly predicted = the fraction of observations for which the predicted probability is > 50% when yi is 1, or < 50% when yi is 0
The pseudo-R2 (McFadden) = 1 − lnL(model with X) / lnL(model without X), as discussed above
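A minimal sketch of the fraction correctly predicted, assuming statsmodels' pred_table and simulated data:

```python
# Sketch: fraction correctly predicted for a probit, via the 2x2 prediction table
# at a 0.5 cutoff. Simulated data, illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=800)
y = (0.4 * x + rng.normal(size=800) > 0).astype(int)

res = sm.Probit(y, sm.add_constant(x)).fit(disp=0)

table = res.pred_table(threshold=0.5)   # rows: actual 0/1, columns: predicted 0/1
print(table)
print(np.trace(table) / table.sum())    # fraction correctly predicted ("hit rate")
```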
Hypothesis testing
Usual t-tests and confidence intervals can be used by testing H0: β = 0.
For a joint hypothesis test, use the likelihood ratio test, which compares the likelihood functions of the restricted and unrestricted models. A joint hypothesis imposes several restrictions at once, for example H0: β1 = 0 and β2 = 0. Because there are several restrictions, a single t-test is not enough; in the linear model you would use an F-statistic, which is designed for testing several restrictions at a time. The logic here is the same: estimate the model under the null hypothesis (restricted), then compare its fit, measured by the maximized log-likelihood rather than R2, with that of the unrestricted model. If the unrestricted model fits significantly better, reject H0.
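A minimal sketch of a likelihood ratio test of a joint null, assuming statsmodels, scipy and simulated data:

```python
# Sketch: likelihood ratio test of the joint null H0: beta1 = beta2 = 0 in a probit.
# LR = 2*(lnL_unrestricted - lnL_restricted), chi-squared with q = 2 degrees of freedom.
# Simulated data, illustrative only.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(7)
x1, x2 = rng.normal(size=1000), rng.normal(size=1000)
y = (0.6 * x1 + 0.3 * x2 + rng.normal(size=1000) > 0).astype(int)

X_ur = sm.add_constant(np.column_stack([x1, x2]))   # unrestricted: constant, x1, x2
X_r  = np.ones((1000, 1))                           # restricted: constant only

llf_ur = sm.Probit(y, X_ur).fit(disp=0).llf
llf_r  = sm.Probit(y, X_r).fit(disp=0).llf

lr_stat = 2 * (llf_ur - llf_r)
p_value = chi2.sf(lr_stat, df=2)
print(lr_stat, p_value)   # small p-value: the unrestricted model fits significantly better, reject H0
```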