HANDOUT 12 Flashcards
4 different names for models when Y is not continuous
- Limited dependent variable models
- Binary choice models
- Dummy dependent variable
- Qualitative choice
What is the observed variable?
Yi = 1 if vote; Yi = 0 if do not vote
What is the latent variable?
Yi* = net utility from undertaking the activity
Yi* = B0 + B1X1i + … + BkXki + εi
Yi* = Xi'B + εi in short form
What is the problem with Y* the latent variable?
It is UNOBSERVED
- we do not know an individual’s net utility from undertaking an action
How do we relate Yi and Yi*?
Yi = 1 if Yi* >= 0; Yi = 0 if Yi* < 0
E(Yi) based on Bernoulli trial
E(Yi) = P(Yi = 1) = pi
V(Yi) based on Bernoulli trial
V(Yi) = pi (1-pi) = P(Yi = 1) x P(Yi = 0)
How can we rewrite E(Yi)?
E(Yi) = P(Yi=1) = P(Yi* >= 0) = P(Xi'B + εi >= 0)
= P(εi >= -Xi'B) = P(εi <= Xi'B) = F(Xi'B) (using symmetry of εi)
Distribution of εi
A symmetric distribution (e.g. normal)
F(Xi’B) refers to
The cumulative distribution function - probability of being less than or equal to Xi’B under the distribution of €i
Our Model in 2 equations
- E(Yi) = F(Xi’B)
- Yi = E(Yi) + Ui
- this always holds: Yi = its expected value + an error term
What is F in a linear probability model?
F = the CDF of a UNIFORM distribution: εi ~ U(L, U)
3 facts about the uniform distribution
- centered at zero
- distributed between lower and upper limit
- all shocks equally likely
Under a uniform distribution, what is F(Xi’B)?
F(Xi’B) = Xi’B
Therefore, what is our model for LPM & how do we estimate it?
E(Yi) = F(Xi’B) = Xi’B
So: Yi = Xi’B + Ui
Estimate by usual OLS
Unless Xi is endogenous -> IV estimation
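A minimal sketch of estimating an LPM by OLS in Python (data and variable names are made up for illustration; statsmodels assumed available):

    # LPM: regress a 0/1 outcome on X by OLS, with robust (HC1) standard errors.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 500
    x1 = rng.normal(size=n)                  # hypothetical regressor
    p = np.clip(0.5 + 0.2 * x1, 0, 1)        # true P(Y=1) in this fake data
    y = rng.binomial(1, p)                   # observed 0/1 outcome

    X = sm.add_constant(x1)                  # intercept + regressor
    lpm = sm.OLS(y, X).fit(cov_type="HC1")   # robust SEs (see heteroscedasticity card)
    print(lpm.params)                        # slope = change in P(Y=1) per unit of x1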
Interpret coefficient on X1 under LPM
B1 = change in P(Y=1) for a unit increase in X1, ceteris paribus.
100B1 under LPM=
100B1 = percentage point change in P(Y=1) for 1 unit increase in X1
B1 if X1 is a dummy variable under LPM
B1 = change in P(Y=1) for having the characteristic vs not having it, ceteris paribus
3 advantages of LPM
- easy to estimate - OLS
- easy to interpret coefficients
- easy to solve endogeneity issue - IV
3 problems with LPM
- Ui is not normal
- Ui is heteroscedastic
- Pi is NOT bounded in [0, 1]
Why is Ui not normal under LPM?
If Yi=0, Ui = -Xi’B
If Yi = 1, Ui = 1 - Xi’B
Only takes 2 values = cannot be normal
Is non-normality of Ui under LPM an issue?
NO - invoke CLT if n>=30
coefficients approx normal = do z tests and chi-squared tests
V(Ui) =
V(Ui) = Xi’B(1 - Xi’B)
- Depends on i = heteroscedastic
Is heteroscedasticity of Ui under LPM an issue?
NO - just use robust standard errors.
Is Pi not bounded under LPM an issue?
YES - not well-defined
Pi = Xi’B - we cannot bound this between 0 and 1.
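A tiny numerical illustration of the problem, with hypothetical coefficients:

    # LPM fitted "probabilities" Xi'B are not bounded in [0, 1].
    b0, b1 = 0.5, 0.2          # hypothetical LPM coefficients
    print(b0 + b1 * 4.0)       # 1.3 > 1: not a valid probability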
Logit model: what is F + formula
F = the logistic distribution: F(Xi’B) = e^(Xi’B) / (1 + e^(Xi’B))
Are probabilities bounded for logistic distribution?
YES - as Xi’B -> infinity, F -> 1
As Xi’B -> -infinity, F -> 0
As Xi’B -> 0, F -> 1/2
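A quick check of these limits (a sketch using numpy only):

    # Logistic CDF F(z) = e^z / (1 + e^z) stays strictly between 0 and 1.
    import numpy as np

    def logistic_cdf(z):
        return np.exp(z) / (1.0 + np.exp(z))

    print(logistic_cdf(-10))   # close to 0
    print(logistic_cdf(0))     # 0.5
    print(logistic_cdf(10))    # close to 1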
How do we estimate a Logit model?
MAXIMUM LIKELIHOOD ESTIMATION
What does maximum likelihood estimation do? Coin flipping example.
Suppose we flip a coin 30 times and observe 18 heads. We then find the P(head) that maximises the chance of what we observed. Repeated Bernoulli trials = binomial: P(X = 18) = 30C18 x p^18 x (1 - p)^12. Maximising w.r.t. p gives p* = 18/30 = 0.6.
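A sketch of the same coin example done numerically (scipy assumed available):

    # MLE for 18 heads in 30 flips: maximise the binomial likelihood over p.
    from scipy.optimize import minimize_scalar
    from scipy.stats import binom

    neg_loglik = lambda p: -binom.logpmf(18, 30, p)   # minimise the negative log likelihood
    res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
    print(res.x)   # approximately 0.6 = 18/30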
If the sample is random, how can we write joint probabilities?
Just multiply the individual probabilities together
P(A n B) = P(A) x P(B)
Denote the joint density function as the likelihood function
L(.) = Π i=1,…,n [F(Xi’B)]^Yi [1 - F(Xi’B)]^(1-Yi)
Take logs of likelihood function
ln(L(.)) = sum i=1,…,n [Yi ln(F(Xi’B)) + (1-Yi) ln(1 - F(Xi’B))]
Simplified log likelihood function form for logit model
ln(L(.)) = sum i=1,…,n1 Xi’B - sum i=1,…,n ln(1 + e^(Xi’B)), where n1 = number of observations with Yi = 1
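A sketch checking that the general and simplified forms agree (hypothetical data, numpy assumed):

    # Logit log likelihood: general form vs simplified form.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 200
    X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one regressor
    xb = X @ np.array([0.2, 0.8])                            # hypothetical Xi'B
    y = rng.binomial(1, np.exp(xb) / (1 + np.exp(xb)))

    F = np.exp(xb) / (1 + np.exp(xb))
    general = np.sum(y * np.log(F) + (1 - y) * np.log(1 - F))
    simplified = np.sum(xb[y == 1]) - np.sum(np.log(1 + np.exp(xb)))
    print(np.isclose(general, simplified))   # True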
How does Stata maximise the log likelihood function?
It partially differentiates ln(L(.)) w.r.t. beta and sets the derivatives equal to 0. There is a unique solution for beta, but we cannot write a simple algebraic expression for it since the function is non-linear.
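A sketch of the same numerical maximisation in Python (statsmodels solves the first-order conditions iteratively, as Stata does; data are made up):

    # ML estimation of a logit on hypothetical data.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 500
    X = sm.add_constant(rng.normal(size=n))
    xb = X @ np.array([-0.3, 1.0])
    y = rng.binomial(1, np.exp(xb) / (1 + np.exp(xb)))

    fit = sm.Logit(y, X).fit(disp=0)    # solves d ln L / d beta = 0 numerically
    print(fit.params)                   # no closed form, but a unique maximiser
    print(fit.llf)                      # maximised log likelihood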
F for a PROBIT model + formula
F = a NORMAL distribution
F(Xi’B) = integral from -infinity to Xi’B/sigma of (2π)^(-0.5) exp(-Z^2 / 2) dZ
What do we assume about sigma for probit model?
Assume sigma = 1
So we can estimate Beta and not just Beta/sigma.
Log likelihood function for probit model
ln(L(.)) = sum [Yi ln(Phi(Xi’B)) + (1-Yi) ln(1 - Phi(Xi’B))]
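A sketch verifying that this is the function a probit routine maximises (hypothetical data; scipy provides Phi as norm.cdf):

    # Probit log likelihood computed by hand vs statsmodels' maximised value.
    import numpy as np
    from scipy.stats import norm
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 500
    X = sm.add_constant(rng.normal(size=n))
    y = rng.binomial(1, norm.cdf(X @ np.array([0.2, 0.7])))

    fit = sm.Probit(y, X).fit(disp=0)
    xb = X @ fit.params
    by_hand = np.sum(y * np.log(norm.cdf(xb)) + (1 - y) * np.log(1 - norm.cdf(xb)))
    print(np.isclose(by_hand, fit.llf))   # True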
Logit vs probit distributions
Logit = logistic distribution
Probit = normal distribution
- Very similar CDFs, but the logistic has slightly fatter tails.
Partial derivative of E(Yi) w.r.t X1 for logit
dE(Yi)/dX1 = dCDF(Z)/dZ x dZ/dX1, where Z = B0 + B1X1i + B2X2i and dZ/dX1 = B1
Derivative of CDF =
The PDF
How does the PDF differ near/away from mean?
Near mean: PDF (=slope of CDF) = large
At extremes: PDF = small
What is the PDF for a logit?
PDF of a logistic = CDF x (1 - CDF)
PDF = e^z / (1 + e^z)^2
Can we interpret B1?
NO - we can only interpret a scaled version of B1:
B1 x PDF = change in P(Y=1) for a unit increase in X1.
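A small worked sketch with hypothetical coefficient values:

    # Logit marginal effect: dP(Y=1)/dX1 = B1 * f(Z), f = logistic PDF.
    import numpy as np

    b0, b1 = -0.5, 0.8                          # hypothetical coefficients
    x1 = 1.0                                    # evaluate at a chosen value of X1
    z = b0 + b1 * x1
    pdf = np.exp(z) / (1 + np.exp(z)) ** 2      # logistic PDF = CDF * (1 - CDF)
    print(b1 * pdf)                             # change in P(Y=1) per unit of X1 here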
Impact of a dummy variable on CDF
Dummy variable = vertical displacement of CDF.
ME of a dummy variable =
difference between 2 CDFs at a certain value of X1.
How does ME of a dummy differ across distribution?
Near mean of X1 = larger ME
At extremes = smaller ME
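A sketch of the dummy-variable ME as a difference of two CDFs (hypothetical coefficients):

    # ME of a dummy D in a logit: CDF with D = 1 minus CDF with D = 0,
    # holding X1 at a chosen value.
    import numpy as np

    def logistic_cdf(z):
        return np.exp(z) / (1 + np.exp(z))

    b0, b1, bD = -0.5, 0.8, 0.6     # hypothetical coefficients
    x1 = 1.0
    print(logistic_cdf(b0 + b1 * x1 + bD) - logistic_cdf(b0 + b1 * x1))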
PDF of a probit model
phi(Z) = (2Pi)^-0.5 exp(-Z^2 / 2)
- Take CVs from standard normal table
ME depends on i so how do we interpret?
Interpret at MEAN VALUES
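A sketch of marginal effects evaluated at the means, using statsmodels on hypothetical data:

    # Probit marginal effects at the sample means of the regressors.
    import numpy as np
    from scipy.stats import norm
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 500
    X = sm.add_constant(rng.normal(size=n))
    y = rng.binomial(1, norm.cdf(X @ np.array([0.2, 0.7])))

    fit = sm.Probit(y, X).fit(disp=0)
    print(fit.get_margeff(at="mean").summary())   # ME of each regressor at the means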
3 properties of MLE estimator of bj
- consistent
- asymptotically normal
- most efficient
What test do we do for 1 restriction?
Approx. Z test
What test do we do for multiple restrictions? What is DOF?
Chi-squared with k degrees of freedom
k = DOF = number of restrictions we are testing.
For a multiple restriction test, what is the test statistic formula?
LR = 2[ln(Lu) - ln(LR)]
ln(Lu) = log likelihood of the unrestricted model
ln(LR) = log likelihood of the restricted model
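A sketch of the LR test on hypothetical data (here the restriction drops one regressor, so k = 1):

    # LR test: unrestricted logit vs a restricted logit that drops x2.
    import numpy as np
    from scipy.stats import chi2
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    n = 500
    x1, x2 = rng.normal(size=n), rng.normal(size=n)
    Xu = sm.add_constant(np.column_stack([x1, x2]))   # unrestricted: intercept, x1, x2
    Xr = sm.add_constant(x1)                          # restricted: drop x2
    y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 0.8 * x1 + 0.5 * x2))))

    llu = sm.Logit(y, Xu).fit(disp=0).llf
    llr = sm.Logit(y, Xr).fit(disp=0).llf
    LR = 2 * (llu - llr)
    print(LR, chi2.sf(LR, df=1))   # test statistic and chi-squared(1) p-value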
R^2 formula for log likelihood. Why is it bad?
R^2 = 1 - [ln(Lw) / ln(L0)]
Where ln(Lw) = log likelihood of the full model (with regressors) and ln(L0) = log likelihood if we only have an intercept. Bad because it has no natural interpretation.
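A sketch computing this pseudo R-squared by hand on hypothetical data (statsmodels reports the same McFadden measure as prsquared):

    # Pseudo R-squared: 1 - ln(L_full) / ln(L_intercept-only).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    n = 500
    X = sm.add_constant(rng.normal(size=n))
    y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([0.2, 0.8])))))

    fit = sm.Logit(y, X).fit(disp=0)
    print(1 - fit.llf / fit.llnull)   # by hand
    print(fit.prsquared)              # same quantity from statsmodels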
Describe goodness of fit tests
Yi^ = 1 if E(Yi) > 0.5 & 0 otherwise
Compare predicted and actual Y - Yi ≠ Yi^ due to Ui random shocks.
Look at proportion of total correctly predicted.
Goodness of fit test if we adopt a simple/constant probability rule.
Yi^ = 1 if p > 0.5 & 0 otherwise
p = sample proportion
- We predict everyone to vote leave if the proportion voting leave > 0.5 across the whole sample.
- Compare this to the E(Yi)-based rule above and see the gain from using the model.
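A sketch comparing the model rule with the constant-probability rule on hypothetical data:

    # Proportion correctly predicted: model rule vs constant rule.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    n = 500
    X = sm.add_constant(rng.normal(size=n))
    y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([0.2, 0.8])))))

    # Model rule: Yi^ = 1 if the fitted probability exceeds 0.5
    fit = sm.Logit(y, X).fit(disp=0)
    model_correct = ((fit.predict(X) > 0.5).astype(int) == y).mean()

    # Constant rule: predict everyone as 1 if the sample proportion exceeds 0.5
    const_pred = int(y.mean() > 0.5)
    const_correct = (y == const_pred).mean()

    print(model_correct, const_correct, model_correct - const_correct)   # gain from the model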