Lecture 14 Flashcards
LDV (limited dependent variable) models refer to those where
the dependent variable’s range is restricted:
- binary response, e.g. Probit, Logit
- Censored
- Tobit
- Truncated
- Sample Selection
Binary response models
Used when the dependent variable takes only 2 possible values
Censored models
Used when some values of Y are only partially observed
Tobit models
Used when the dependent variable is continuous, but there’s a threshold below which all values are reported as the same
- e.g. spending on luxury goods: either 0 or the actual positive amount spent.
Sample Selection Models
Used when the sample might not be randomly selected
- e.g. studying wages, but only having data for people who choose to work
- need to adjust for bias introduced by non-random selection
Main goals when studying binary response models like Probit and Logit
- Develop better models for binary outcomes: predict the probability that someone does something, Pr(y=1|x), using Logit or Probit
- Justify with economic theory, leading to a latent utility model
- Develop estimation methods: OLS only works for the linear probability model, so use NLS or MLE
- Interpret coefficients: how a one-unit change in xj affects the probability that y=1
- Generalise F-tests to this setting
Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:
Someone’s decision to work
- utility function and components.
- let y = 1 if they work, y = 0 if they don't
- person works if utility from working is greater than not working
U(y; x, ey) = ByT.x + ey - where x are observed characteristics which might affect one’s preference for work
- By are parameters which quantify how each characteristic affects utility
- ey represents unobserved taste shifters like motivation.
Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:
Someone’s decision to work
- making the decision
Person will only work (y=1) iff:
- U(1; x, e1) > U(0; x, e0)
=> B1T.x + e1 > B0T.x + e0
=> (B1-B0)T.x + (e1-e0) > 0
So y = 1 if BT.x + e > 0 (where B = B1 - B0 and e = e1 - e0); y = 0 otherwise
- since e is unobserved we make assumptions about its distribution to estimate the model.
Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:
Someone’s decision to work
- Logit and Probit, i.e. assuming a normal or logistic distribution for e
Probit: assume e ~ N(0,1),
then G(z) = Phi(z) = integral from -infinity to z of (2*pi)^(-1/2) * exp(-u^2/2) du
Logit: assume e ~ Logistic(0,1),
then G(z) = (e^z)/(1 + e^z)
Both are bell shaped and symmetric, so you get similar qualitative results, but the tails differ slightly
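A minimal numerical sketch (my own illustration, not from the lecture; assumes numpy and scipy are installed) comparing the two link functions:

```python
import numpy as np
from scipy.stats import norm

z = np.linspace(-4, 4, 9)
probit_cdf = norm.cdf(z)                   # Probit: G(z) = standard normal CDF
logit_cdf = np.exp(z) / (1 + np.exp(z))    # Logit: G(z) = e^z / (1 + e^z)

for zi, pr, lo in zip(z, probit_cdf, logit_cdf):
    print(f"z = {zi:+.1f}   Probit G(z) = {pr:.3f}   Logit G(z) = {lo:.3f}")
```

The printed values are close around z = 0 but diverge slightly in the tails, which is the qualitative point above.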
Compute the PDF of the Logit function
- you need this for computing marginal effects and performing MLE estimation
g(z) = dG(z)/dz = G(z)(1 - G(z))
- the PDF is symmetric around 0 and bell shaped
- the same properties hold for the Probit (standard normal) PDF
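A quick check of this formula by the quotient rule (a worked step, not part of the original card):

$$ g(z) = \frac{d}{dz}\,\frac{e^z}{1+e^z} = \frac{e^z(1+e^z) - e^z \cdot e^z}{(1+e^z)^2} = \frac{e^z}{(1+e^z)^2} = G(z)\bigl(1 - G(z)\bigr) $$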
Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:
Someone’s decision to work
- probability and estimation
- back to the general model
Using the symmetry of G(z), we get:
- P(y=1|x) = P(e > -BT.x) = 1 - G(-BT.x) = G(BT.x)
- can now estimate B using MLE; OLS won't work as the model is nonlinear in B
- this symmetry applies to both Logit and Probit
MLE using a simple binary example:
- y=1 with probability p, y=0 with probability 1-p
- let's say we observed: y1 = 0, y2 = 1, y3 = 0, independent draws from a Bernoulli distribution with unknown probability p
Use the likelihood function:
L(p) = ((1-p)^2)(p)
- maximise with respect to p to get p^ = 1/3
- equivalently, maximise the log-likelihood lnL(p) = 2ln(1-p) + ln(p) to get the same p^ = 1/3
Core idea of ML is to find p which makes the observed data most likely.
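A minimal numerical sketch (same three observations; scipy assumed) confirming the analytic answer:

```python
import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([0, 1, 0])  # the observed draws from the example

def neg_log_likelihood(p):
    # ln L(p) = sum_i [ y_i*ln(p) + (1 - y_i)*ln(1 - p) ]
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)  # ~0.3333, i.e. p-hat = 1/3
```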
Likelihood for Binary Response:
- we want to estimate parameters B for a binary model, y in {0,1}
The density of yi given xi is f(yi|xi) = G(BT.xi)^yi (1 - G(BT.xi))^(1-yi): if yi = 1 this becomes G(BT.xi) and if yi = 0 it becomes 1 - G(BT.xi)
- this works for either Logit or Probit (just choose G accordingly)
Estimate B by maximising the log-likelihood L(B) = SUM(li(B)), where li(B) = yi.lnG(BT.xi) + (1-yi).ln(1 - G(BT.xi)), with respect to B; this is MLE
- estimator is consistent, asymptotically normal and efficient.
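A minimal sketch (simulated data, Logit link; variable names and true coefficients are made up) of estimating B by maximising this log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
x = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta_true = np.array([0.5, -1.0])
p = 1 / (1 + np.exp(-x @ beta_true))                   # G(BT.x) for the Logit
y = rng.binomial(1, p)

def neg_log_likelihood(beta):
    g = 1 / (1 + np.exp(-x @ beta))
    # l_i(B) = y_i*ln G(BT.x_i) + (1 - y_i)*ln(1 - G(BT.x_i))
    return -np.sum(y * np.log(g) + (1 - y) * np.log(1 - g))

beta_hat = minimize(neg_log_likelihood, x0=np.zeros(2)).x
print(beta_hat)  # should be close to (0.5, -1.0) at this sample size
```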
3 standard methods for testing Multiple Exclusion Restrictions in models like Logit or Probit
- Lagrange Multiplier, or Score Test
- Wald Test
- Likelihood Ratio Test
Lagrange Multiplier, or Score Test
- what it does
- key feature
- why useful
- test stat distribution
- tests whether adding extra parameters would improve the fit of a restricted model
- only estimate the restricted model, i.e. the one without excluded variables
- efficient when you're testing whether variables are needed before estimating the bigger model
- under H0 (the restrictions are valid), the test statistic follows a chi-squared distribution with degrees of freedom equal to the number of restrictions
Wald Test
- what it does
- key feature
- why useful
- test stat distribution
- tests whether the estimated parameters in the unrestricted model are jointly statistically equal to 0
- only estimate the unrestricted model
- common in regression packages, quick and based on SEs from full model
- under H0 the statistic is also chi-squared distributed; may perform poorly if the null is near the boundary of the parameter space or the sample is small
LR test
- what it does
- key feature
- why useful
- test stat distribution
- compares the fit of the restricted vs unrestricted models
- have to estimate both models
- LR = 2(lnLur - lnLr) follows a chi-squared distribution under H0
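A minimal sketch (simulated data; assumes statsmodels is available, which reports the log-likelihood directly) of an LR test for excluding one regressor from a Logit model:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
p = 1 / (1 + np.exp(-(0.5 + 1.0 * x1)))            # x2 is truly excluded
y = rng.binomial(1, p)

X_ur = sm.add_constant(np.column_stack([x1, x2]))  # unrestricted model
X_r = sm.add_constant(x1)                          # restricted model (drops x2)

ll_ur = sm.Logit(y, X_ur).fit(disp=0).llf
ll_r = sm.Logit(y, X_r).fit(disp=0).llf

LR = 2 * (ll_ur - ll_r)          # LR statistic
p_value = chi2.sf(LR, df=1)      # one restriction => one degree of freedom
print(LR, p_value)
```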
Coefficient Bj in Logit or Probit issue
What we model is:
- Pr(y=1|x) = G(BT.x)
- but BT.x is actually the expected value of a latent variable y*, which is itself not observed
- this means the coefficients Bj do not directly give the marginal effects of xj on the probability that y = 1
Coefficient Bj in Logit or Probit
- discrete vs continuous
If discrete: can't take a derivative as the variable jumps
- so instead compute the difference in predicted probabilities when the variable switches from 0 to 1
- Effect = G(BT.x with xj=1) - G(BT.x with xj=0), holding the other variables fixed, where G is the CDF
If continuous:
- dP(x)/dxj = g(BT.x).Bj
- g is the PDF - the slope of the CDF, so this tells you how much does the probability of y=1 change when you increase xj slightly
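A minimal sketch (Logit link, made-up coefficients and characteristics) of both kinds of effect:

```python
import numpy as np

def G(z):              # logistic CDF
    return 1 / (1 + np.exp(-z))

def g(z):              # logistic PDF = G(z)(1 - G(z))
    return G(z) * (1 - G(z))

beta = np.array([0.2, 0.8, -0.5])   # (intercept, continuous x1, dummy x2)
x = np.array([1.0, 1.5, 0.0])       # one individual's characteristics

# Continuous regressor: dP/dx1 = g(BT.x) * B1
effect_x1 = g(beta @ x) * beta[1]

# Dummy regressor: difference in predicted probabilities at x2 = 1 vs x2 = 0
x_on, x_off = x.copy(), x.copy()
x_on[2], x_off[2] = 1.0, 0.0
effect_x2 = G(beta @ x_on) - G(beta @ x_off)

print(effect_x1, effect_x2)
```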
Key remarks on partial effects
- Generalisation to other variables, like nonlinear transformations
- Can be generalised to more general functional forms, including nonlinearities
- Elasticities can be calculated
- Careful with interactions of variables.
PEA - Partial Effect at the Average
dP(x_bar)/dxj = g(B^T.x_bar).Bj^, where x_bar is the sample average of x
- plug in the average value of x into the marginal effect formula
- simple, easy to interpret
- what does it even mean to be 48% female?
- for nonlinear models, E[f(x)] is not equal to f(E[x]), so it may not reflect reality well
APE - Average Partial Effect
E[dP(x)/dxj] = Bj^ . (1/n).SUM(g(B^T.xi))
- compute the marginal effect for each individual and then average across your sample
- more representative of the sample
PEA or APE?
Use PEA if you want simplicity, use APE for more accurate average marginal effects.
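A minimal sketch (Logit link; the data and the "estimated" coefficients are made up for illustration) contrasting the two:

```python
import numpy as np

def g(z):                                  # logistic PDF
    G = 1 / (1 + np.exp(-z))
    return G * (1 - G)

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(200), rng.normal(size=200)])  # intercept + x1
beta_hat = np.array([0.3, 1.2])                            # pretend MLE estimates
j = 1                                                      # effect of x1

pea = g(beta_hat @ X.mean(axis=0)) * beta_hat[j]   # plug in the average x
ape = np.mean(g(X @ beta_hat)) * beta_hat[j]       # average the individual effects

print(pea, ape)  # they differ because E[g(BT.x)] != g(BT.E[x]) in nonlinear models
```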
In binary models, regular R^2 isn’t meaningful, few alternatives
- percent correctly predicted
- fraction of successes in sample
- pseudo R^2
- Predict y^ = 1 if G(B^T.x) >= 0.5, otherwise 0; then compare predicted y to actual y and calculate the proportion of correct predictions
- Sometimes the sample has very few 1s; can adjust the threshold (instead of 0.5) to match the fraction of successes in the sample
- McFadden's pseudo R^2 = 1 - (lnLur/lnL0), where lnLur is the log-likelihood from the model with predictors and lnL0 is the log-likelihood from the null (intercept-only) model
Digging deeper into Pseudo R^2
- Efron's alternative
- the standard version makes sense as log-likelihoods are negative and a better model means a higher (less negative) log-likelihood; since lnL0 <= lnLur < 0, the ratio lnLur/lnL0 is below 1, so the pseudo R^2 lies between 0 and 1
- alternative (Efron's):
1 - (SUM(pi^ - yi)^2)/(SUM(yi - y_bar)^2), where pi^ is the estimated probability that y = 1 for observation i.
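A minimal sketch (simulated data; statsmodels assumed) computing all three goodness-of-fit measures for a Logit fit:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 1.0 * x))))
X = sm.add_constant(x)

fit = sm.Logit(y, X).fit(disp=0)
p_hat = fit.predict(X)                                   # estimated Pr(y = 1 | x)

pct_correct = np.mean((p_hat >= 0.5).astype(int) == y)   # percent correctly predicted
mcfadden = 1 - fit.llf / sm.Logit(y, np.ones(n)).fit(disp=0).llf  # 1 - lnLur/lnL0
efron = 1 - np.sum((y - p_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(pct_correct, mcfadden, efron)
```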
What is censoring?
- when may it occur?
Censoring refers to situations in regression models where the dependent variable is only partially observed
- top coding: income reported as 100k+
- duration models: we might only know that someone hasn't yet experienced the event (e.g. death) - right-censoring
- attrition: a survey respondent drops out, you know they were still employed last time but not what happened next
Particular model with censoring is thus:
- y* = BT.x + u, with u | x, c ~ N(0, sigma^2)
- y = min(c, y*)
If y* >= c, we only observe c: this is right-censoring
If y* < c, we observe the actual y*
- we know whether the true outcome was above or below the threshold, but not the exact value if it's censored
- there is also an analogous left-censoring version with y = max(c, y*), e.g. when time can't be below 0
Particular model with censoring is thus:
- y* = BT.x + u, with u | x, c ~ N(0, sigma^2)
- y = min(c, y*)
How to construct the likelihood function for this censored model
Pr(y = c|x) = Pr(y* >= c|x) = Pr(u >= c - BT.x|x)
= 1 - Phi((c - BT.x)/sigma), where Phi is the standard normal CDF
Likelihood contributions, two cases:
1. Uncensored observations, so if y < c: use the normal density, f(y|x) = (1/sigma).phi((y - BT.x)/sigma)
2. Censored observations, so if y = c: use the complement of the CDF, 1 - Phi((c - BT.x)/sigma)
Take logs of both contributions, sum over observations, and maximise this log-likelihood to estimate the coefficients.
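A minimal sketch (simulated data with right-censoring at a known threshold c; scipy assumed) of this likelihood construction:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, c = 500, 2.0
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y_star = x @ np.array([1.0, 1.0]) + rng.normal(scale=0.8, size=n)
y = np.minimum(y_star, c)          # y = min(c, y*): right-censored
censored = (y == c)                # in real data we only see y and the censoring flag

def neg_log_likelihood(theta):
    beta, sigma = theta[:-1], np.exp(theta[-1])   # log-sigma keeps sigma > 0
    z = (y - x @ beta) / sigma
    ll_unc = norm.logpdf(z) - np.log(sigma)       # uncensored: normal density
    ll_cen = norm.logsf((c - x @ beta) / sigma)   # censored: log Pr(y* >= c | x)
    return -np.sum(np.where(censored, ll_cen, ll_unc))

theta_hat = minimize(neg_log_likelihood, x0=np.zeros(3)).x
print(theta_hat[:2], np.exp(theta_hat[2]))        # beta-hat and sigma-hat
```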
Tobit model example:
- we assume there’s an unobserved variable y* = BT.x + u that reflects someone’s true inclination to work, but we only observe:
- y = max(c, y*)
So if y* > c, we observe y* itself; otherwise we observe c, usually c = 0
- if someone wants to work a positive amount, we observe that
- if not we observe a corner solution, y = 0
A left censoring problem
Tobit model example:
- we assume there’s an unobserved variable y* = BT.x + u that reflects someone’s true inclination to work, but we only observe:
- y = max(c, y*)
LIKELIHOOD CONSTRUCTION
- Probability of being censored (i.e., y = 0)
- Pr(y = 0|x) = Pr(y* <= 0|x) = Pr(u <= -BT.x) = 1 - Phi(BT.x/sigma)
- Density when y > 0 (uncensored): (1/sigma).phi((y - BT.x)/sigma), a normal density shifted by BT.x and scaled by 1/sigma
- For the likelihood function: when y = 0 use 1 - Phi(BT.x/sigma), when y > 0 use the normal density; then estimate by MLE
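A minimal sketch (corner at 0, simulated data; scipy assumed) of the Tobit log-likelihood, mirroring the censored-regression code above but with the corner at zero:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n = 500
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y_star = x @ np.array([-0.5, 1.0]) + rng.normal(size=n)
y = np.maximum(y_star, 0.0)                  # y = max(0, y*)

def neg_log_likelihood(theta):
    beta, sigma = theta[:-1], np.exp(theta[-1])
    xb = x @ beta
    ll_zero = norm.logcdf(-xb / sigma)       # Pr(y = 0 | x) = 1 - Phi(BT.x / sigma)
    ll_pos = norm.logpdf((y - xb) / sigma) - np.log(sigma)   # density for y > 0
    return -np.sum(np.where(y > 0, ll_pos, ll_zero))

theta_hat = minimize(neg_log_likelihood, x0=np.zeros(3)).x
print(theta_hat[:2], np.exp(theta_hat[2]))   # beta-hat and sigma-hat
```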
Comparing Tobit and censoring models
- similarities and differences
Similarities:
- both involve combining censored and uncensored observations and estimating with MLE
Differences:
- in censoring models, y* is the variable of interest, model is linear and Bj directly reflects the partial effect on y*
- in Tobit, y* is just a modelling tool, not the focus of analysis; the actual y is the one we care about, the corner solution is a meaningful outcome, and thus Bj does not directly give the marginal effect on y
Want to compute the expected value of y given x in the Tobit model
- y* = Bt.x + u, u is normally distributed
- y = max(0,y*)
E(y|x) = Pr(y>0|x).E(y|x,y>0) + Pr(y=0|x).0
= Pr(y>0|x).E(y|x,y>0)
- let z = BT.x/sigma
So E(y|x) = Phi(z).BT.x + sigma.phi(z), where Phi and phi are the standard normal CDF and PDF
- so Bj does not give the marginal effect on y, as it would in OLS
INVERSE MILLS RATIO:
- Equivalently, E(y|x) = Phi(z).(BT.x + sigma.k(z)), where k(z) = phi(z)/Phi(z) is the inverse Mills ratio
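A minimal numerical check (made-up values for BT.x and sigma) that both expressions agree with a brute-force simulation of E[max(0, BT.x + u)]:

```python
import numpy as np
from scipy.stats import norm

xb, sigma = 0.4, 1.0
z = xb / sigma

analytic = norm.cdf(z) * xb + sigma * norm.pdf(z)     # Phi(z)*BT.x + sigma*phi(z)

mills = norm.pdf(z) / norm.cdf(z)                     # inverse Mills ratio k(z)
analytic_mills = norm.cdf(z) * (xb + sigma * mills)   # Phi(z)*(BT.x + sigma*k(z))

u = np.random.default_rng(6).normal(scale=sigma, size=1_000_000)
simulated = np.mean(np.maximum(xb + u, 0.0))          # E[max(0, BT.x + u)]

print(analytic, analytic_mills, simulated)            # all three roughly agree
```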
Two types of partial effects in the Tobit model
Prove these:
- dE(y|x, y > 0)/dxj: the conditional partial effect; tells us how xj changes y among individuals for whom we actually observe y > 0
- dE(y|x)/dxj = Phi(BT.x/sigma).Bj: the unconditional partial effect, which scales the latent effect Bj by the probability of being uncensored
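A sketch of the unconditional-effect proof (a worked derivation consistent with the E(y|x) formula above; the conditional effect goes the same way but involves the derivative of the inverse Mills ratio):

$$ \frac{\partial E(y\mid x)}{\partial x_j} = \frac{\partial}{\partial x_j}\Bigl[\Phi(z)\,x'\beta + \sigma\,\phi(z)\Bigr], \qquad z = x'\beta/\sigma $$
$$ = \phi(z)\frac{\beta_j}{\sigma}\,x'\beta + \Phi(z)\,\beta_j + \sigma\,\phi'(z)\frac{\beta_j}{\sigma} $$
Using $\phi'(z) = -z\,\phi(z)$, the first and third terms cancel, leaving
$$ \frac{\partial E(y\mid x)}{\partial x_j} = \Phi\!\bigl(x'\beta/\sigma\bigr)\,\beta_j . $$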
Limitations of the Tobit model
- The Tobit model ties together the probability of being above the censoring threshold and the expected value conditional on being above it via the same parameters Bj, so any variable xj must affect both outcomes in the same direction