Lecture 14 Flashcards
LDV (limited dependent variable) models refer to those where
the dependent variable’s range is restricted:
- binary response, e.g. Probit, Logit
- Censored
- Tobit
- Truncated
- Sample Selection
Binary response models
Used when the dependent variable takes only 2 possible values
Censored models
Used when some values of Y are only partially observed
Tobit models
Used when the dependent variable is continuous, but there’s a threshold below which all values are reported as the same
- e.g. spending on luxury goods: either 0 or the actual positive amount spent.
Sample Selection Models
Used when the sample might not be randomly selected
- e.g. studying wages, but only having data for people who choose to work
- need to adjust for bias introduced by non-random selection
Main goals when studying binary response models like Probit and Logit
- Develop better models for binary outcomes: predict the probability that someone does something, Pr(y=1|x), using Logit or Probit
- Justify with economic theory, leading to a latent utility model
- Develop estimation methods: OLS only works for the linear probability model, so use NLS or MLE
- Interpret coefficients: how a one-unit change in xj affects the probability that y=1
- Generalise F-tests to this setting
Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:
Someone’s decision to work
- utility function and components.
- let y = 1 if they work, y = 0 if they don't
- person works if utility from working is greater than not working
U(y; x, ey) = ByT.x + ey - where x are observed characteristics which might affect one’s preference for work
- By are parameters which quantify how each characteristic affects utility
- ey represents unobserved taste shifters like motivation.
Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:
Someone’s decision to work
- making the decision
Person will only work (y=1) iff:
- U(1; x, e1) > U(0; x, e0)
=> B1T.x + e1 > B0T.x + e0
=> (B1-B0)T.x + (e1-e0) > 0
So y = 1 if BT.x + e > 0 (where B = B1 - B0 and e = e1 - e0); y = 0 otherwise
- since e is unobserved we make assumptions about its distribution to estimate the model.
Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:
Someone’s decision to work
- Logit and Probit, i.e. assuming a normal or logistic distribution for e
Probit: assume e ~ N(0,1),
then G(z) = Phi(z) = integral from -infinity to z of (2*pi)^(-1/2) * exp(-u^2/2) du
Logit: assume e ~ Logistic(0,1),
then G(z) = (e^z)/(1 + e^z)
Both are bell shaped and symmetric, so you get similar qualitative results, but the tails differ slightly
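A minimal numerical sketch (my own illustration, not from the lecture; assumes numpy and scipy are installed) comparing the two link functions:

```python
import numpy as np
from scipy.stats import norm

z = np.linspace(-4, 4, 9)
probit_cdf = norm.cdf(z)                   # Probit: G(z) = standard normal CDF
logit_cdf = np.exp(z) / (1 + np.exp(z))    # Logit: G(z) = e^z / (1 + e^z)

for zi, pr, lo in zip(z, probit_cdf, logit_cdf):
    print(f"z = {zi:+.1f}   Probit G(z) = {pr:.3f}   Logit G(z) = {lo:.3f}")
```

The printed values are close around z = 0 but diverge slightly in the tails, which is the qualitative point above.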
Compute the PDF of the Logit function
- you need this for computing marginal effects and performing MLE estimation
g(z) = dG(z)/dz = G(z)(1 - G(z))
- the PDF is symmetric around 0 and bell shaped
- the same properties hold for the Probit (standard normal) PDF
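A quick check of this formula by the quotient rule (a worked step, not part of the original card):

$$ g(z) = \frac{d}{dz}\,\frac{e^z}{1+e^z} = \frac{e^z(1+e^z) - e^z \cdot e^z}{(1+e^z)^2} = \frac{e^z}{(1+e^z)^2} = G(z)\bigl(1 - G(z)\bigr) $$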
Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:
Someone’s decision to work
- probability and estimation
- back to the general model
Using the symmetry of G(z), we get:
- P(y=1|x) = P(e > -BT.x) = 1 - G(-BT.x) = G(BT.x)
- can now estimate B using MLE; OLS won't work as the model is nonlinear in B
- this symmetry applies to both Logit and Probit
MLE using a simple binary example:
- y=1 with probability p, y=0 with probability 1-p
- let's say we observed: y1 = 0, y2 = 1, y3 = 0, independent draws from a Bernoulli distribution with unknown probability p
Use the likelihood function:
L(p) = ((1-p)^2)(p)
- maximise with respect to p to get p^ = 1/3
- equivalently, maximise the log-likelihood lnL(p) = 2ln(1-p) + ln(p) to get the same p^ = 1/3
Core idea of ML is to find p which makes the observed data most likely.
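A minimal numerical sketch (same three observations; scipy assumed) confirming the analytic answer:

```python
import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([0, 1, 0])  # the observed draws from the example

def neg_log_likelihood(p):
    # ln L(p) = sum_i [ y_i*ln(p) + (1 - y_i)*ln(1 - p) ]
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)  # ~0.3333, i.e. p-hat = 1/3
```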
Likelihood for Binary Response:
- we want to estimate parameters B for a binary model, y in {0,1}
The density of yi given xi is f(yi|xi) = G(BT.xi)^yi (1 - G(BT.xi))^(1-yi): if yi = 1 this becomes G(BT.xi) and if yi = 0 it becomes 1 - G(BT.xi)
- this works for either Logit or Probit (just choose G accordingly)
Estimate B by maximising the log-likelihood L(B) = SUM(li(B)), where li(B) = yi.lnG(BT.xi) + (1-yi).ln(1 - G(BT.xi)), with respect to B; this is MLE
- estimator is consistent, asymptotically normal and efficient.
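A minimal sketch (simulated data, Logit link; variable names and true coefficients are made up) of estimating B by maximising this log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
x = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta_true = np.array([0.5, -1.0])
p = 1 / (1 + np.exp(-x @ beta_true))                   # G(BT.x) for the Logit
y = rng.binomial(1, p)

def neg_log_likelihood(beta):
    g = 1 / (1 + np.exp(-x @ beta))
    # l_i(B) = y_i*ln G(BT.x_i) + (1 - y_i)*ln(1 - G(BT.x_i))
    return -np.sum(y * np.log(g) + (1 - y) * np.log(1 - g))

beta_hat = minimize(neg_log_likelihood, x0=np.zeros(2)).x
print(beta_hat)  # should be close to (0.5, -1.0) at this sample size
```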
3 standard methods for testing Multiple Exclusion Restrictions in models like Logit or Probit
- Lagrange Multiplier, or Score Test
- Wald Test
- Likelihood Ratio Test
Lagrange Multiplier, or Score Test
- what it does
- key feature
- why useful
- test stat distribution
- tests whether adding extra parameters would improve the fit of a restricted model
- only estimate the restricted model, i.e. the one without excluded variables
- efficient when you're testing whether variables are needed before estimating the bigger model
- under H0 (the restrictions are valid), the test statistic follows a chi-squared distribution with degrees of freedom equal to the number of restrictions
Wald Test
- what it does
- key feature
- why useful
- test stat distribution
- tests whether the estimated parameters in the unrestricted model are jointly statistically equal to 0
- only estimate the unrestricted model
- common in regression packages, quick and based on SEs from full model
- under H0 the statistic is also chi-squared distributed; may perform poorly if the null is near the boundary of the parameter space or the sample is small
LR test
- what it does
- key feature
- why useful
- test stat distribution
- compares the fit of the restricted vs unrestricted models
- have to estimate both models
- LR = 2(lnLur - lnLr) follows a chi-squared distribution under H0
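A minimal sketch (simulated data; assumes statsmodels is available, which reports the log-likelihood directly) of an LR test for excluding one regressor from a Logit model:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
p = 1 / (1 + np.exp(-(0.5 + 1.0 * x1)))            # x2 is truly excluded
y = rng.binomial(1, p)

X_ur = sm.add_constant(np.column_stack([x1, x2]))  # unrestricted model
X_r = sm.add_constant(x1)                          # restricted model (drops x2)

ll_ur = sm.Logit(y, X_ur).fit(disp=0).llf
ll_r = sm.Logit(y, X_r).fit(disp=0).llf

LR = 2 * (ll_ur - ll_r)          # LR statistic
p_value = chi2.sf(LR, df=1)      # one restriction => one degree of freedom
print(LR, p_value)
```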
Coefficient Bj in Logit or Probit issue
What we model is:
- Pr(y=1|x) = G(BT.x)
- but BT.x is actually the expected value of a latent variable y*, which is itself not observed
- this means the coefficients Bj do not directly give the marginal effects of xj on the probability that y = 1
Coefficient Bj in Logit or Probit
- discrete vs continuous
If discrete: can't take a derivative as the variable jumps
- so instead compute the difference in predicted probabilities when the variable switches from 0 to 1
- Effect = G(BT.x with xj=1) - G(BT.x with xj=0), holding the other variables fixed, where G is the CDF
If continuous:
- dP(x)/dxj = g(BT.x).Bj
- g is the PDF - the slope of the CDF, so this tells you how much does the probability of y=1 change when you increase xj slightly
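A minimal sketch (Logit link, made-up coefficients and characteristics) of both kinds of effect:

```python
import numpy as np

def G(z):              # logistic CDF
    return 1 / (1 + np.exp(-z))

def g(z):              # logistic PDF = G(z)(1 - G(z))
    return G(z) * (1 - G(z))

beta = np.array([0.2, 0.8, -0.5])   # (intercept, continuous x1, dummy x2)
x = np.array([1.0, 1.5, 0.0])       # one individual's characteristics

# Continuous regressor: dP/dx1 = g(BT.x) * B1
effect_x1 = g(beta @ x) * beta[1]

# Dummy regressor: difference in predicted probabilities at x2 = 1 vs x2 = 0
x_on, x_off = x.copy(), x.copy()
x_on[2], x_off[2] = 1.0, 0.0
effect_x2 = G(beta @ x_on) - G(beta @ x_off)

print(effect_x1, effect_x2)
```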
Key remarks on partial effects
- Generalisation to other variables, like nonlinear transformations
- Can be generalised to more general functional forms, including nonlinearities
- Elasticities can be calculated
- Careful with interactions of variables.
PEA - Partial Effect at the Average
dP(x_bar)/dxj = g(B^T.x_bar).Bj^, where x_bar is the sample average of x
- plug in the average value of x into the marginal effect formula
- simple, easy to interpret
- what does it even mean to be 48% female?
- for nonlinear models, E[f(x)] is not equal to f(E[x]), so it may not reflect reality well
APE - Average Partial Effect
E[dP(x)/dxj] = Bj^ . (1/n).SUM(g(B^T.xi))
- compute the marginal effect for each individual and then average across your sample
- more representative of the sample
PEA or APE?
Use PEA if you want simplicity, use APE for more accurate average marginal effects.
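A minimal sketch (Logit link; the data and the "estimated" coefficients are made up for illustration) contrasting the two:

```python
import numpy as np

def g(z):                                  # logistic PDF
    G = 1 / (1 + np.exp(-z))
    return G * (1 - G)

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(200), rng.normal(size=200)])  # intercept + x1
beta_hat = np.array([0.3, 1.2])                            # pretend MLE estimates
j = 1                                                      # effect of x1

pea = g(beta_hat @ X.mean(axis=0)) * beta_hat[j]   # plug in the average x
ape = np.mean(g(X @ beta_hat)) * beta_hat[j]       # average the individual effects

print(pea, ape)  # they differ because E[g(BT.x)] != g(BT.E[x]) in nonlinear models
```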
In binary models, regular R^2 isn’t meaningful, few alternatives
- percent correctly predicted
- fraction of successes in sample
- pseudo R^2
- Predict y^ = 1 if G(B^T.x) >= 0.5, otherwise 0; then compare predicted y to actual y and calculate the proportion of correct predictions
- Sometimes the sample has very few 1s; can adjust the threshold (instead of 0.5) to match the fraction of successes in the sample
- McFadden's pseudo R^2 = 1 - (lnLur/lnL0), where lnLur is the log-likelihood from the model with predictors and lnL0 is the log-likelihood from the null (intercept-only) model
Digging deeper into Pseudo R^2
- Efron's alternative
- the standard version makes sense as log-likelihoods are negative and a better model means a higher (less negative) log-likelihood; since lnL0 <= lnLur < 0, the ratio lnLur/lnL0 is below 1, so the pseudo R^2 lies between 0 and 1
- alternative (Efron's):
1 - (SUM(pi^ - yi)^2)/(SUM(yi - y_bar)^2), where pi^ is the estimated probability that y = 1 for observation i.
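A minimal sketch (simulated data; statsmodels assumed) computing all three goodness-of-fit measures for a Logit fit:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 1.0 * x))))
X = sm.add_constant(x)

fit = sm.Logit(y, X).fit(disp=0)
p_hat = fit.predict(X)                                   # estimated Pr(y = 1 | x)

pct_correct = np.mean((p_hat >= 0.5).astype(int) == y)   # percent correctly predicted
mcfadden = 1 - fit.llf / sm.Logit(y, np.ones(n)).fit(disp=0).llf  # 1 - lnLur/lnL0
efron = 1 - np.sum((y - p_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(pct_correct, mcfadden, efron)
```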
What is censoring?
- when may it occur?
Censoring refers to situations in regression models where the dependent variable is only partially observed
- top coding: income reported as 100k+
- duration models: we might only know that someone hasn't yet experienced the event (e.g. death) - right-censoring
- attrition: a survey respondent drops out, you know they were still employed last time but not what happened next
Particular model with censoring is thus:
- y* = BT.x + u, with u | x, c ~ N(0, sigma^2)
- y = min(c, y*)
If y* >= c, we only observe c: this is right-censoring
If y* < c, we observe the actual y*
- we know whether the true outcome was above or below the threshold, but not the exact value if it's censored
- there is also an analogous left-censoring version with y = max(c, y*), e.g. when time can't be below 0
Particular model with censoring is thus:
- y* = BT.x + u, with u | x, c ~ N(0, sigma^2)
- y = min(c, y*)
How to construct the likelihood function for this censored model
Pr(y = c|x) = Pr(y* >= c|x) = Pr(u >= c - BT.x|x)
= 1 - Phi((c - BT.x)/sigma), where Phi is the standard normal CDF
Likelihood contributions, two cases:
1. Uncensored observations, so if y < c: use the normal density, f(y|x) = (1/sigma).phi((y - BT.x)/sigma)
2. Censored observations, so if y = c: use the complement of the CDF, 1 - Phi((c - BT.x)/sigma)
Take logs of both contributions, sum over observations, and maximise this log-likelihood to estimate the coefficients.
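A minimal sketch (simulated data with right-censoring at a known threshold c; scipy assumed) of this likelihood construction:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, c = 500, 2.0
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y_star = x @ np.array([1.0, 1.0]) + rng.normal(scale=0.8, size=n)
y = np.minimum(y_star, c)          # y = min(c, y*): right-censored
censored = (y == c)                # in real data we only see y and the censoring flag

def neg_log_likelihood(theta):
    beta, sigma = theta[:-1], np.exp(theta[-1])   # log-sigma keeps sigma > 0
    z = (y - x @ beta) / sigma
    ll_unc = norm.logpdf(z) - np.log(sigma)       # uncensored: normal density
    ll_cen = norm.logsf((c - x @ beta) / sigma)   # censored: log Pr(y* >= c | x)
    return -np.sum(np.where(censored, ll_cen, ll_unc))

theta_hat = minimize(neg_log_likelihood, x0=np.zeros(3)).x
print(theta_hat[:2], np.exp(theta_hat[2]))        # beta-hat and sigma-hat
```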
Tobit model example:
- we assume there’s an unobserved variable y* = BT.x + u that reflects someone’s true inclination to work, but we only observe:
- y = max(c, y*)
So if y* > c, we observe y* itself; otherwise we observe c, usually c = 0
- if someone wants to work a positive amount, we observe that
- if not we observe a corner solution, y = 0
A left censoring problem
Tobit model example:
- we assume there’s an unobserved variable y* = BT.x + u that reflects someone’s true inclination to work, but we only observe:
- y = max(c, y*)
LIKELIHOOD CONSTRUCTION
- Probability of being censored (i.e., y = 0)
- Pr(y = 0|x) = Pr(y* <= 0|x) = Pr(u <= -BT.x) = 1 - Phi(BT.x/sigma)
- Density when y > 0 (uncensored): (1/sigma).phi((y - BT.x)/sigma), a normal density shifted by BT.x and scaled by 1/sigma
- For the likelihood function: when y = 0 use 1 - Phi(BT.x/sigma), when y > 0 use the normal density; then estimate by MLE
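A minimal sketch (corner at 0, simulated data; scipy assumed) of the Tobit log-likelihood, mirroring the censored-regression code above but with the corner at zero:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n = 500
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y_star = x @ np.array([-0.5, 1.0]) + rng.normal(size=n)
y = np.maximum(y_star, 0.0)                  # y = max(0, y*)

def neg_log_likelihood(theta):
    beta, sigma = theta[:-1], np.exp(theta[-1])
    xb = x @ beta
    ll_zero = norm.logcdf(-xb / sigma)       # Pr(y = 0 | x) = 1 - Phi(BT.x / sigma)
    ll_pos = norm.logpdf((y - xb) / sigma) - np.log(sigma)   # density for y > 0
    return -np.sum(np.where(y > 0, ll_pos, ll_zero))

theta_hat = minimize(neg_log_likelihood, x0=np.zeros(3)).x
print(theta_hat[:2], np.exp(theta_hat[2]))   # beta-hat and sigma-hat
```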
Comparing Tobit and censoring models
- similarities and differences
Similarities:
- both involve combining censored and uncensored observations and estimating with MLE
Differences:
- in censoring models, y* is the variable of interest, model is linear and Bj directly reflects the partial effect on y*
- in Tobit, y* is just a modelling tool, not the focus of analysis; the actual y is the one we care about, the corner solution is a meaningful outcome, and thus Bj does not directly give the marginal effect on y
Want to compute the expected value of y given x in the Tobit model
- y* = Bt.x + u, u is normally distributed
- y = max(0,y*)
E(y|x) = Pr(y>0|x).E(y|x,y>0) + Pr(y=0|x).0
= Pr(y>0|x).E(y|x,y>0)
- let z = BT.x/sigma
So E(y|x) = Phi(z).BT.x + sigma.phi(z), where Phi and phi are the standard normal CDF and PDF
- so Bj does not give the marginal effect on y, as it would in OLS
INVERSE MILLS RATIO:
- Equivalently, E(y|x) = Phi(z).(BT.x + sigma.k(z)), where k(z) = phi(z)/Phi(z) is the inverse Mills ratio
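A minimal numerical check (made-up values for BT.x and sigma) that both expressions agree with a brute-force simulation of E[max(0, BT.x + u)]:

```python
import numpy as np
from scipy.stats import norm

xb, sigma = 0.4, 1.0
z = xb / sigma

analytic = norm.cdf(z) * xb + sigma * norm.pdf(z)     # Phi(z)*BT.x + sigma*phi(z)

mills = norm.pdf(z) / norm.cdf(z)                     # inverse Mills ratio k(z)
analytic_mills = norm.cdf(z) * (xb + sigma * mills)   # Phi(z)*(BT.x + sigma*k(z))

u = np.random.default_rng(6).normal(scale=sigma, size=1_000_000)
simulated = np.mean(np.maximum(xb + u, 0.0))          # E[max(0, BT.x + u)]

print(analytic, analytic_mills, simulated)            # all three roughly agree
```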
Two types of partial effects in the Tobit model
Prove these:
- dE(y|x, y > 0)/dxj: the conditional partial effect; tells us how xj changes y among individuals for whom we actually observe y > 0
- dE(y|x)/dxj = Phi(BT.x/sigma).Bj: the unconditional partial effect, which scales the latent effect Bj by the probability of being uncensored
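A sketch of the unconditional-effect proof (a worked derivation consistent with the E(y|x) formula above; the conditional effect goes the same way but involves the derivative of the inverse Mills ratio):

$$ \frac{\partial E(y\mid x)}{\partial x_j} = \frac{\partial}{\partial x_j}\Bigl[\Phi(z)\,x'\beta + \sigma\,\phi(z)\Bigr], \qquad z = x'\beta/\sigma $$
$$ = \phi(z)\frac{\beta_j}{\sigma}\,x'\beta + \Phi(z)\,\beta_j + \sigma\,\phi'(z)\frac{\beta_j}{\sigma} $$
Using $\phi'(z) = -z\,\phi(z)$, the first and third terms cancel, leaving
$$ \frac{\partial E(y\mid x)}{\partial x_j} = \Phi\!\bigl(x'\beta/\sigma\bigr)\,\beta_j . $$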
Limitations of the Tobit model
- The Tobit model ties together the probability of being above the censoring threshold and the expected value conditional on being above it via the same parameters Bj, so any variable xj must affect both outcomes in the same direction