regression with a binary dependent variable Flashcards

Question 1

Q

when Y is binary what is the linear regression model called and why?

Answer

A

it is called the linear probability model because, when Y is binary, E(Y|X) = Pr(Y=1|X), so the regression line models a probability: Pr(Y=1|X) = B0+B1X

Question 2

Q

what is the predicted value for the linear probability model?

Answer

A

the predicted value is the predicted probability that Y=1, given X

Question 3

Q

what is B1 equal to in the linear probability model?

Answer

A

B1 = the difference in the probability that Y=1 associated with a unit difference in X

Question 4

Q

what is the formula for B1 in the linear probability model?

Answer

A

B1 = [Pr(Y=1|X=x+change in X) - Pr(Y=1|X=x)] / change in X
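
as a quick numerical check of this slope formula, here is a minimal Python sketch. the coefficients b0 and b1 and the values of x and dx are hypothetical, chosen only for illustration.

```python
def lpm_probability(x, b0, b1):
    """Linear probability model: Pr(Y=1|X=x) = b0 + b1*x."""
    return b0 + b1 * x

# Hypothetical coefficients and evaluation points (assumptions, not estimates).
b0, b1 = 0.10, 0.05
x, dx = 2.0, 3.0

# [Pr(Y=1|X=x+dx) - Pr(Y=1|X=x)] / dx recovers the slope b1.
slope = (lpm_probability(x + dx, b0, b1) - lpm_probability(x, b0, b1)) / dx
print(slope)  # equals b1 up to floating-point rounding
```

because the model is linear, the same slope comes out whatever x and dx are used.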

Question 5

Q

what are the advantages of the linear probability model?

Answer

A

it is simple to estimate and interpret
inference is the same as for multiple regression (but it needs heteroskedasticity-robust standard errors)
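
a rough sketch of what this looks like in practice, using only the Python standard library. the data are made up for illustration, and the robust variance below is the basic HC0 form for a single regressor (an assumption of this sketch; textbooks and software often add a small-sample correction).

```python
import math

# Made-up toy data for illustration only: x is a regressor, y is binary.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 1, 1, 1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope and intercept for the linear probability model.
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# Residuals from the fitted line.
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# HC0 heteroskedasticity-robust variance of b1 (no small-sample correction).
var_b1_robust = sum(((xi - x_bar) ** 2) * (ui ** 2) for xi, ui in zip(x, resid)) / sxx ** 2
se_b1_robust = math.sqrt(var_b1_robust)

print(b0, b1, se_b1_robust)
```

the errors of an LPM are heteroskedastic by construction, because the variance of a binary Y depends on Pr(Y=1|X); that is why the robust standard error is the one to report.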

Question 6

Q

what are the disadvantages of the linear probability model?

Answer

A

the LPM says that the change in the predicted probability for a given change in X is the same for all values of X, which doesn't make sense when the probability is already close to 0 or 1
also, LPM predicted probabilities can be <0 or >1
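
the second problem is easy to see numerically. a minimal Python sketch, with hypothetical coefficients b0 and b1 and hypothetical X values chosen only to make the point:

```python
def lpm_probability(x, b0, b1):
    """Linear probability model: Pr(Y=1|X=x) = b0 + b1*x."""
    return b0 + b1 * x

# Hypothetical coefficients and X values (assumptions for illustration).
b0, b1 = 0.10, 0.05
xs = [-5, 0, 10, 20]

preds = [lpm_probability(x, b0, b1) for x in xs]

# Collect the X values whose "probability" falls outside [0, 1].
out_of_range = [(x, p) for x, p in zip(xs, preds) if p < 0 or p > 1]
for x, p in out_of_range:
    print(f"X={x}: predicted probability {p:.2f} is not a valid probability")
```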

Question 7

Q

how can the disadvantages of the linear probability model be solved?

Answer

A

the disadvantages can be solved by using a nonlinear probability model such as probit regression or logit regression

Question 8

Q

what is the probit regression?

Answer

A

the probit regression models the probability that Y=1 using the cumulative standard normal distribution function Φ(z), evaluated at z=B0 +B1X

Question 9

Q

what is the equation of the probit regression model?

Answer

A

Pr(Y=1|X) = Φ(B0+B1X) where Φ is the cumulative standard normal distribution function and z=B0+B1X
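
a minimal Python sketch of this equation. Φ is computed from the error function in the standard library; the coefficients b0 and b1 are hypothetical, not estimates from any data.

```python
import math

def std_normal_cdf(z):
    """Cumulative standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_probability(x, b0, b1):
    """Probit model: Pr(Y=1|X=x) = Phi(b0 + b1*x)."""
    return std_normal_cdf(b0 + b1 * x)

# Hypothetical coefficients (assumptions for illustration).
b0, b1 = -2.0, 0.5

probs = [probit_probability(x, b0, b1) for x in [0, 2, 4, 6, 8]]
for x, p in zip([0, 2, 4, 6, 8], probs):
    print(x, round(p, 4))
```

with b1>0 the printed probabilities rise with X and always stay between 0 and 1.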

Question 10

Q

why use the cumulative normal probability distribution?

Answer

A

it provides an S shape which gives us what we need: Pr(Y=1|X) is increasing in X for B1>0 and 0 ≤Pr(Y=1|X)≤1 for all X
it is also easy to use as the probabilities are tabulated in the cumulative normal tables
it also has a relatively straightforward interpretation - B1 is the change in the z-value for a unit change in X
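
one consequence of the S shape: B1 is a constant change in the z-value, but the change in the probability depends on where X starts. a minimal Python sketch, with hypothetical coefficients:

```python
import math

def std_normal_cdf(z):
    """Cumulative standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical coefficients (assumptions for illustration).
b0, b1 = -2.0, 0.5

# Change in Pr(Y=1|X) for a one-unit increase in X, at three starting points.
starts = [0, 4, 8]
diffs = []
for x in starts:
    p_before = std_normal_cdf(b0 + b1 * x)
    p_after = std_normal_cdf(b0 + b1 * (x + 1))
    diffs.append(p_after - p_before)
    print(f"X={x}: z changes by {b1}, probability changes by {p_after - p_before:.4f}")
```

the change in probability is largest where z is near 0 and shrinks in the tails, which is what the linear probability model cannot capture.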

Question 11

Q

what is the equation of the probit regression with multiple regressors?

Answer

A

Pr(Y=1|X1,X2) = Φ(B0+B1X1+B2X2) where Φ is the cumulative standard normal distribution function and z=B0+B1X1+B2X2

Question 12

Q

what is the B1 for probit regression with multiple regressors?

Answer

A

β1 is the effect on the z-score of a unit change in X1, holding constant X2 (when a causal interpretation is justified)
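
a small Python sketch of "holding constant X2". the coefficients and the X values are hypothetical. the z-score moves by β1 whatever X2 is, while the implied change in probability depends on the level of X2.

```python
import math

def std_normal_cdf(z):
    """Cumulative standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_value(x1, x2, b0, b1, b2):
    """z = b0 + b1*x1 + b2*x2 for the two-regressor probit model."""
    return b0 + b1 * x1 + b2 * x2

# Hypothetical coefficients and values (assumptions for illustration).
b0, b1, b2 = -1.0, 0.4, 0.3
x1 = 1.0

results = []
for x2 in [0.0, 5.0]:
    z_before = z_value(x1, x2, b0, b1, b2)
    z_after = z_value(x1 + 1.0, x2, b0, b1, b2)  # X1 up by one unit, X2 fixed
    dz = z_after - z_before
    dp = std_normal_cdf(z_after) - std_normal_cdf(z_before)
    results.append((x2, dz, dp))
    print(f"X2={x2}: change in z = {dz:.2f}, change in probability = {dp:.4f}")
```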

Question 13

Q

what is the logit regression model?

Answer

A

Logit regression models the probability of Y = 1, given X, as the cumulative standard logistic distribution function, evaluated at z = β0+β1X

Question 14

Q

what is the equation of the logit regression model?

Answer

A

Pr(Y=1|X) = F(β0+β1X)
where F is the cumulative logistic distribution function:
F(β0+β1X) = 1 / (1 + e^[-(β0+β1X)])
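
the same equation as a minimal Python sketch. the coefficients passed to logit_probability are hypothetical.

```python
import math

def logistic_cdf(z):
    """Cumulative standard logistic distribution function: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logit_probability(x, b0, b1):
    """Logit model: Pr(Y=1|X=x) = F(b0 + b1*x)."""
    return logistic_cdf(b0 + b1 * x)

# Hypothetical coefficients (assumptions for illustration).
b0, b1 = -2.0, 0.5
for x in [0, 2, 4, 6, 8]:
    print(x, round(logit_probability(x, b0, b1), 4))
```

like the probit curve, this one is S-shaped and stays strictly between 0 and 1.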

Question 15

Q

how is nonlinear least squares different from OLS?

Answer

A

nonlinear least squares extends the idea of OLS to models in which the parameters enter nonlinearly

Question 16

Q

what is the minimisation problem for nonlinear least squares?

Answer

A

min over (b0, b1) of Σ_{i=1}^{n} [Y_i - Φ(b0 + b1X_i)]^2
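
the objective can be written as a short Python function. this is a sketch: the data are made up for illustration and the trial values of b0 and b1 are arbitrary.

```python
import math

def std_normal_cdf(z):
    """Cumulative standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def nls_objective(b0, b1, xs, ys):
    """Sum of squared differences between Y_i and the probit probability."""
    return sum((y - std_normal_cdf(b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up toy data for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 1, 1, 1]

# At b0 = b1 = 0 every predicted probability is 0.5, so each squared error is 0.25.
print(nls_objective(0.0, 0.0, xs, ys))
# A trial pair that tracks the data more closely gives a smaller value.
print(nls_objective(-2.0, 0.5, xs, ys))
```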

Question 17

Q

how do we solve the minimisation problem for the nonlinear least squares?

Answer

A

calculus doesn't give us an explicit formula for the solution
it is solved numerically by a computer
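
to give a feel for "solved numerically", here is a deliberately crude Python sketch that searches a grid of candidate values. the data are made up and the grid ranges are assumptions; real software uses faster iterative algorithms rather than a grid.

```python
import math

def std_normal_cdf(z):
    """Cumulative standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def nls_objective(b0, b1, xs, ys):
    """Sum of squared differences between Y_i and the probit probability."""
    return sum((y - std_normal_cdf(b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up toy data for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 1, 1, 1]

# Try b0 in [-5, 5] and b1 in [-2, 2] in steps of 0.05; keep the best pair.
best = None
for i in range(-100, 101):
    b0 = i / 20
    for j in range(-40, 41):
        b1 = j / 20
        ssr = nls_objective(b0, b1, xs, ys)
        if best is None or ssr < best[0]:
            best = (ssr, b0, b1)

best_ssr, b0_hat, b1_hat = best
print(b0_hat, b1_hat, best_ssr)
```

the answer is only as fine as the grid, which is one reason practical routines refine the estimate step by step instead.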

Question 18

Q

what is the likelihood function?

Answer

A

the likelihood function is the conditional density of Y1,...,Yn given X1,...,Xn, treated as a function of the unknown parameters B0 and B1
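
for binary Y with independent observations, each observation contributes p_i when Y_i=1 and (1 - p_i) when Y_i=0, where p_i = Pr(Y_i=1|X_i). a minimal Python sketch for the probit case, working with the log of the likelihood; the data and trial coefficients are made up.

```python
import math

def std_normal_cdf(z):
    """Cumulative standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_log_likelihood(b0, b1, xs, ys):
    """Log of the probit likelihood, viewed as a function of (b0, b1)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = std_normal_cdf(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Made-up toy data for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 1, 1, 1]

# Same data, two different parameter values: the likelihood changes with (b0, b1).
print(probit_log_likelihood(0.0, 0.0, xs, ys))
print(probit_log_likelihood(-2.0, 0.5, xs, ys))
```

the likelihood itself is the exponential of this value; the log is used because sums are easier to handle than products.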

Question 19

Q

what is the maximum likelihood estimator (MLE) ?

Answer

A

the MLE is the value of (B0,B1) that maximises the likelihood function. it is the parameter value which best describes the full distribution of the data
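
a sketch of how the maximisation can be carried out, written for the logit model because its derivatives are simple. it uses Newton's method with step halving, which is one common choice and an assumption of this sketch rather than the method of any particular package. the data are made up.

```python
import math

def logistic_cdf(z):
    """Cumulative standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Logit log likelihood as a function of (b0, b1)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic_cdf(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def gradient(b0, b1, xs, ys):
    """Derivatives of the log likelihood with respect to b0 and b1."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        r = y - logistic_cdf(b0 + b1 * x)
        g0 += r
        g1 += r * x
    return g0, g1

def fit_logit(xs, ys, iterations=50):
    """Maximise the log likelihood by Newton steps, halving a step if it makes things worse."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1 = gradient(b0, b1, xs, ys)
        if max(abs(g0), abs(g1)) < 1e-10:
            break  # gradient is essentially zero: at the maximum
        # Entries of the negative Hessian: sum of p(1-p) times [[1, x], [x, x^2]].
        h00 = h01 = h11 = 0.0
        for x in xs:
            p = logistic_cdf(b0 + b1 * x)
            w = p * (1 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton direction: solve (negative Hessian) * d = gradient for d.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        step = 1.0
        current = log_likelihood(b0, b1, xs, ys)
        while step > 1e-8 and log_likelihood(b0 + step * d0, b1 + step * d1, xs, ys) < current:
            step /= 2
        b0 += step * d0
        b1 += step * d1
    return b0, b1

# Made-up toy data for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 1, 1, 1]

b0_hat, b1_hat = fit_logit(xs, ys)
print(b0_hat, b1_hat, log_likelihood(b0_hat, b1_hat, xs, ys))
```

at the returned values the derivatives of the log likelihood are approximately zero, which is the numerical counterpart of "the value that maximises the likelihood function".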

Question 20

Q

in large samples, what are the properties of the maximum likelihood estimator?

Answer

A

in large samples the MLE is consistent, normally distributed and efficient (it has the smallest variance of all consistent estimators)

Question 21

Q

what are the measures of fit for logit and probit?

Answer

A

1) the fraction correctly predicted = the fraction of observations for which the predicted probability is >50% when Y_i=1, or is <50% when Y_i=0
2) the pseudo R^2 measures the improvement in the value of the log likelihood, relative to having no X’s. the pseudo R^2 simplifies to the R^2 in the linear model with normally distributed errors
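
both measures can be sketched in a few lines of Python. the predicted probabilities below are made-up stand-ins for the fitted values of some estimated model, so the numbers only illustrate the arithmetic. the pseudo R^2 is written here as 1 minus the ratio of the model's log likelihood to the log likelihood with no X's, which is one common form and an assumption of this sketch.

```python
import math

def fraction_correctly_predicted(ys, p_hats):
    """Share of observations with p_hat > 0.5 when Y=1, or p_hat below 0.5 when Y=0.
    A p_hat of exactly 0.5 is treated as predicting Y=0 (a convention assumed here)."""
    correct = sum(1 for y, p in zip(ys, p_hats) if (p > 0.5) == (y == 1))
    return correct / len(ys)

def bernoulli_log_likelihood(ys, probs):
    """Log likelihood of binary outcomes ys under the probabilities probs."""
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(ys, probs))

def pseudo_r2(ys, p_hats):
    """1 - (log likelihood with X's) / (log likelihood with no X's)."""
    y_bar = sum(ys) / len(ys)  # with no X's, the best constant probability is the sample mean
    ll_model = bernoulli_log_likelihood(ys, p_hats)
    ll_null = bernoulli_log_likelihood(ys, [y_bar] * len(ys))
    return 1.0 - ll_model / ll_null

# Made-up outcomes and predicted probabilities for illustration only.
ys = [0, 0, 1, 0, 1, 1, 1, 1]
p_hats = [0.10, 0.20, 0.40, 0.45, 0.60, 0.70, 0.80, 0.90]

fraction = fraction_correctly_predicted(ys, p_hats)
r2 = pseudo_r2(ys, p_hats)
print(fraction, r2)
```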

Question 22

Q

in large samples, what are the features of the probit likelihood with one X?

Answer

A

the estimators B0_MLE and B1_MLE are consistent, normally distributed and asymptotically efficient

(22 cards)