Lecture 14 Flashcards

1
Q

LDV (limited dependent variable) models refer to those where

A

the dependent variable’s range is restricted:
- binary response, e.g. Probit, Logit
- Censored
- Tobit
- Truncated
- Sample Selection

2
Q

Binary response models

A

Used when the dependent variable takes only 2 possible values

3
Q

Censored models

A

Used when some values of Y are only partially observed

4
Q

Tobit models

A

Used when the dependent variable is continuous but there's a threshold below which all values are recorded as the same number (a corner solution)
- e.g. spending on luxury goods: either 0 or the actual positive amount spent.

5
Q

Sample Selection Models

A

Used when the sample might not be randomly selected
- e.g. studying wages, but only having data for people who choose to work
- need to adjust for bias introduced by non-random selection

6
Q

Main goals when studying binary response models like Probit and Logit

A
  1. Develop better models for binary outcomes: predict the probability that someone does something, Pr(y=1|x), using Logit or Probit
  2. Justify the model with economic theory, leading to a latent utility model
  3. Develop estimation methods: OLS only works for the linear probability model, so use NLS or MLE
  4. Interpret coefficients: how a one-unit change in xj affects the probability that y=1
  5. Generalise F-tests to this setting
7
Q

Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:

Someone’s decision to work
- utility function and components.

A
  • let y = 1 if they work, y = 0 if they don't
  • the person works if the utility from working exceeds the utility from not working:
    U(y; x, e_y) = β_y'x + e_y
  • x are observed characteristics which might affect one's preference for work
  • β_y are parameters which quantify how each characteristic affects utility
  • e_y represents unobserved taste shifters, like motivation.
8
Q

Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:

Someone’s decision to work
- making the decision

A

The person will only work (y=1) iff:
- U(1; x, e1) > U(0; x, e0)
⇔ β1'x + e1 > β0'x + e0
⇔ (β1 − β0)'x + (e1 − e0) > 0
So y = 1 if β'x + e > 0 and y = 0 otherwise, where β = β1 − β0 and e = e1 − e0
- since e is unobserved, we make assumptions about its distribution in order to estimate the model.

9
Q

Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:

Someone’s decision to work
- Logit and Probit, i.e. MLR, i.e. normal or logistic distribution

A

Probit: assume e ~ N(0,1), then
G(z) = Φ(z) = ∫_{−∞}^{z} (1/√(2π)) e^{−u²/2} du
Logit: assume e ~ Logistic(0,1), then
G(z) = Λ(z) = e^z / (1 + e^z)

Both are bell shaped and symmetric, so you get similar qualitative results, but the tails differ slightly
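
A minimal numerical sketch of the two link functions (the use of Python/scipy here is an assumption, not something the lecture prescribes):

```python
# Compare the Probit link (standard normal CDF) with the Logit link
# (logistic CDF) on a grid of z values.
import numpy as np
from scipy.stats import norm

z = np.linspace(-4, 4, 9)
probit_G = norm.cdf(z)                  # Phi(z)
logit_G = np.exp(z) / (1 + np.exp(z))   # Lambda(z) = e^z / (1 + e^z)

for zi, p, l in zip(z, probit_G, logit_G):
    print(f"z = {zi:5.1f}   Probit G(z) = {p:.3f}   Logit G(z) = {l:.3f}")
```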

10
Q

Compute the PDF of the Logit function
- you need this for computing marginal effects and for MLE estimation

A

g(z) = dG(z)/dz = e^z / (1 + e^z)² = G(z)(1 − G(z))
- this PDF is symmetric around 0 and bell-shaped
- the same properties (symmetric, bell-shaped PDF) hold for the Probit model

11
Q

Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:

Someone’s decision to work
- probability and estimation
- back to the general model

A

Using the symmetry of G(z), we get:
- P(y=1|x) = P(e > −β'x) = 1 − G(−β'x) = G(β'x)
- we can now estimate β by MLE; OLS won't work because the model is nonlinear in β
- this symmetry argument applies to both Logit and Probit

12
Q

MLE using a simple binary example:
- y=1 with probability p, y=0 with probability 1-p
- let's say we observe: y1 = 0, y2 = 1, y3 = 0, independent draws from a Bernoulli distribution with unknown probability p

A

Use the likelihood function:
L(p) = p(1 − p)²
- maximise with respect to p: the FOC gives p̂ = 1/3
- equivalently, maximise the log-likelihood ln L(p) = ln p + 2 ln(1 − p), which also gives p̂ = 1/3
The core idea of ML is to find the p which makes the observed data most likely.
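
A small numerical check of the same idea (Python and scipy assumed; the data are just the three draws above):

```python
# Recover p-hat = 1/3 for the observed draws y = (0, 1, 0) by maximising
# the Bernoulli log-likelihood numerically.
import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([0, 1, 0])

def neg_loglik(p):
    # log L(p) = sum_i [ y_i ln p + (1 - y_i) ln(1 - p) ]
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)  # approximately 1/3, i.e. the sample mean of y
```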

13
Q

Likelihood for Binary Response:
- we want to estimate the parameters β of a binary model, y ∈ {0,1}

A

The density is f(yi|xi) = G(β'xi)^yi · [1 − G(β'xi)]^(1−yi): if yi = 1 this is G(β'xi), and if yi = 0 it is 1 − G(β'xi)
- this works for either Logit or Probit (choice of G)
Estimate β by maximising the log-likelihood L(β) = Σ ℓi(β), where ℓi(β) = yi·ln G(β'xi) + (1 − yi)·ln[1 − G(β'xi)]; this is MLE
- the estimator is consistent, asymptotically normal and efficient.
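
A sketch of the whole MLE step for the Logit case, on simulated data (the data-generating values and the use of scipy are illustrative assumptions):

```python
# Logit MLE: maximise the binary-response log-likelihood directly.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one regressor
beta_true = np.array([0.5, 1.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

def neg_loglik(beta):
    G = 1 / (1 + np.exp(-X @ beta))      # Logit CDF at x'beta
    G = np.clip(G, 1e-12, 1 - 1e-12)     # guard the logs
    return -np.sum(y * np.log(G) + (1 - y) * np.log(1 - G))

beta_hat = minimize(neg_loglik, x0=np.zeros(2), method="BFGS").x
print(beta_hat)  # close to beta_true in large samples
```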

14
Q

3 standard methods for testing Multiple Exclusion Restrictions in models like Logit or Probit

A
  1. Lagrange Multiplier, or Score Test
  2. Wald Test
  3. Likelihood Ratio Test
15
Q

Lagrange Multiplier, or Score Test
- what it does
- key feature
- why useful
- test stat distribution

A
  • tests whether adding the excluded variables would improve the fit of the restricted model
  • only requires estimating the restricted model, i.e. the one with the variables under test dropped
  • efficient when you're testing whether variables are needed before estimating the bigger model
  • under H0 (the restrictions are valid) the test statistic follows a chi-squared distribution
16
Q

Wald Test
- what it does
- key feature
- why useful
- test stat distribution

A
  • tests whether the estimated parameters in the unrestricted model are statistically indistinguishable from 0
  • only requires estimating the unrestricted model
  • common in regression packages; quick, and based on the SEs from the full model
  • under H0 the statistic follows a chi-squared distribution; it may perform poorly if the null is near the boundary of the parameter space or the sample is small
17
Q

LR test
- what it does
- key feature
- why useful
- test stat distribution

A
  • compares the fit of the restricted vs the unrestricted model
  • have to estimate both models
  • LR = 2(ln L_ur − ln L_r); under H0 it follows a chi-squared distribution.
18
Q

The interpretation issue with coefficient βj in Logit or Probit

A

What we model is:
- Pr(y=1|x) = G(β'x)
- but β'x is the expected value of a latent variable y*, which is itself not observed
- this means the coefficients βj do not directly tell you the marginal effect of xj on the probability that y = 1

19
Q

Coefficient βj in Logit or Probit
- discrete vs continuous

A

If discrete: can't take a derivative, as the variable jumps
- so instead compute the difference in predicted probabilities when the variable switches from 0 to 1
- Effect = G(β'x with xj = 1) − G(β'x with xj = 0), where G is the CDF
If continuous:
- ∂p(x)/∂xj = g(β'x)·βj
- g is the PDF (the slope of the CDF), so this tells you how much the probability of y = 1 changes when you increase xj slightly
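
A minimal sketch of both calculations for one individual (beta_hat and the x values are hypothetical placeholders, and a Logit link is assumed):

```python
# Partial effects in a Logit model: derivative for a continuous regressor,
# difference in predicted probabilities for a 0/1 dummy.
import numpy as np

def G(z):            # Logit CDF
    return 1 / (1 + np.exp(-z))

def g(z):            # Logit PDF = G(z)(1 - G(z))
    return G(z) * (1 - G(z))

beta_hat = np.array([0.2, 0.8, -0.5])   # [intercept, x1 (continuous), d (dummy)]
x = np.array([1.0, 1.5, 1.0])           # one individual's characteristics

# Continuous regressor x1: slope of the probability
pe_continuous = g(x @ beta_hat) * beta_hat[1]

# Dummy d: difference in predicted probabilities at d = 1 vs d = 0
x1, x0 = x.copy(), x.copy()
x1[2], x0[2] = 1.0, 0.0
pe_discrete = G(x1 @ beta_hat) - G(x0 @ beta_hat)

print(pe_continuous, pe_discrete)
```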

20
Q

Key remarks on partial effects

A
  1. Partial effects generalise to other variables, e.g. nonlinear transformations
  2. They can be generalised to more general functional forms
  3. Elasticities can be calculated
  4. Be careful with interactions of variables.
21
Q

PEA - Partial Effect at the Average

A

∂p(E[x])/∂xj = g(β̂'x̄)·β̂j
- plug the sample average of x (x̄) into the marginal-effect formula
- simple, easy to interpret
- but: what does it even mean to be 48% female?
- for nonlinear models E[f(x)] ≠ f(E[x]), so the PEA may not reflect reality well

22
Q

APE - Average Partial Effect

A

E[∂p(x)/∂xj] is estimated by β̂j·(1/n)·Σi g(β̂'xi)
- compute the marginal effect for each individual, then average across your sample
- more representative of the sample
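
A sketch contrasting the PEA and APE formulas from this card and the previous one (simulated X and a hypothetical beta_hat, Logit link assumed):

```python
# PEA vs APE for the partial effect of x1 in a Logit model.
import numpy as np

def G(z):
    return 1 / (1 + np.exp(-z))

def g(z):
    return G(z) * (1 - G(z))

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=200)])   # intercept + x1
beta_hat = np.array([0.3, 0.9])
j = 1                                                        # effect of x1

pea = g(X.mean(axis=0) @ beta_hat) * beta_hat[j]   # evaluate at the average x
ape = beta_hat[j] * g(X @ beta_hat).mean()         # average over individuals

print(pea, ape)   # they generally differ because E[g(x)] != g(E[x])
```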

23
Q

PEA or APE?

A

Use PEA if you want simplicity, use APE for more accurate average marginal effects.

24
Q

In binary models the regular R² isn't meaningful; a few alternatives:
- percent correctly predicted
- fraction of successes in sample
- pseudo R^2

A
  1. Predict ŷ = 1 if G(β̂'x) ≥ 0.5, otherwise ŷ = 0; then compare predicted y to actual y and calculate the proportion of correct predictions
  2. Sometimes the sample has very few 1s; instead of 0.5 you can set the threshold equal to the fraction of successes in the sample
  3. Pseudo R² = 1 − (ln L_ur / ln L_0), where ln L_ur is the log-likelihood from the model with predictors and ln L_0 is the log-likelihood from the null model
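
A sketch of measures 1 and 3 given fitted probabilities (the y and p̂ values below are illustrative placeholders, not lecture results):

```python
# Percent correctly predicted (0.5 threshold) and McFadden pseudo R^2.
import numpy as np

y = np.array([0, 1, 0, 1, 1, 0, 1, 0])
p_hat = np.array([0.2, 0.7, 0.4, 0.6, 0.8, 0.3, 0.55, 0.45])

y_pred = (p_hat >= 0.5).astype(int)
pct_correct = (y_pred == y).mean()

lnL_ur = np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
p_bar = y.mean()                    # null model: constant probability
lnL_0 = np.sum(y * np.log(p_bar) + (1 - y) * np.log(1 - p_bar))
pseudo_r2 = 1 - lnL_ur / lnL_0

print(pct_correct, pseudo_r2)
```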
25
Q

Digging deeper into Pseudo R^2
- Efron

A
  • this makes sense because log-likelihoods are negative and a better model means a higher log-likelihood; since ln L_ur ≥ ln L_0, the value lies between 0 and 1
  • Efron's alternative:
    1 − [Σ(p̂i − yi)² / Σ(yi − ȳ)²], where p̂i is the estimated probability that y = 1 for observation i.
26
Q

What is censoring?
- when may it occur?

A

Censoring refers to situations in regression models where the dependent variable is only partially observed
- top coding: income reported as 100k+
- duration models: we might only know that someone hasn't yet experienced the event (like death) - right-censoring
- attrition: a survey respondent drops out; you know they were still employed last time, but not what happened next

27
Q

A particular model with censoring is thus:
- y* = β'x + u,  u|x,c ~ N(0, σ²)
- y = min(c, y*)

A

If y* ≥ c, we only observe y = c: this is right-censoring
If y* < c, we observe the actual y*
- we know whether the true outcome was above or below c, but not its exact value if it's censored
- there is an analogous left-censoring version with y = max(c, y*), e.g. when time can't be below 0

28
Q

A particular model with censoring is thus:
- y* = β'x + u,  u|x,c ~ N(0, σ²)
- y = min(c, y*)

How to construct the likelihood function for this censored model

A

Pr(y = c|x) = Pr(y* ≥ c|x) = Pr(u ≥ c − β'x | x) = 1 − Φ((c − β'x)/σ)
Likelihood contributions, two cases:
1. Uncensored data (y < c): use the normal density, f(y|x) = (1/σ)·φ((y − β'x)/σ)
2. Censored data (y = c): use the complement of the CDF, 1 − Φ((c − β'x)/σ)
Take logs in both cases and maximise the resulting log-likelihood to estimate the coefficients.
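
A sketch of the corresponding log-likelihood function (scipy assumed; X, y and the censoring point c would come from the data, and sigma is log-parameterised to keep it positive):

```python
# Log-likelihood of the right-censored normal model y = min(c, y*),
# y* = x'beta + u, u ~ N(0, sigma^2).
import numpy as np
from scipy.stats import norm

def censored_loglik(theta, X, y, c):
    beta, sigma = theta[:-1], np.exp(theta[-1])
    xb = X @ beta
    cens = (y >= c)                                            # recorded at the cap
    ll_unc = norm.logpdf(y[~cens], loc=xb[~cens], scale=sigma)  # (1/sigma) phi((y - x'b)/sigma)
    ll_cen = norm.logsf(c - xb[cens], loc=0, scale=sigma)       # ln[1 - Phi((c - x'b)/sigma)]
    return ll_unc.sum() + ll_cen.sum()

# Maximise this (e.g. minimise its negative with scipy.optimize.minimize)
# to obtain the MLE of beta and sigma.
```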

29
Q

Tobit model example:
- we assume there's an unobserved variable y* = β'x + u that reflects someone's true inclination to work, but we only observe:
- y = max(c, y*)

A

So if y* > c we observe y*, otherwise we observe c (usually c = 0)
- if someone wants to work a positive amount, we observe that amount
- if not, we observe a corner solution, y = 0
This is a left-censoring problem.

30
Q

Tobit model example:
- we assume there's an unobserved variable y* = β'x + u that reflects someone's true inclination to work, but we only observe:
- y = max(c, y*)
LIKELIHOOD CONSTRUCTION

A
  1. Probability of being censored (i.e. y = 0):
    - Pr(y = 0|x) = Pr(y* ≤ 0|x) = Pr(u ≤ −β'x) = 1 − Φ(β'x/σ)
  2. Density when y > 0 (uncensored): a normal density shifted by β'x and scaled by 1/σ, i.e. (1/σ)·φ((y − β'x)/σ)
  3. For the likelihood function: when y = 0, use the censoring probability from 1.; when y > 0, use the normal density from 2.; then estimate by MLE
31
Q

Comparing Tobit and censoring models
- similarities and differences

A

Similarities:
- both combine censored and uncensored observations and estimate by MLE
Differences:
- in censoring models, y* is the variable of interest; the model is linear and βj directly reflects the partial effect on y*
- in Tobit, y* is just a modelling tool, not the focus of the analysis; the actual y is what we care about, the corner solution is a meaningful economic outcome, and thus βj does not directly give the marginal effect on y

32
Q

Want to compute the expected value of y given x in the Tobit model
- y* = β'x + u, u is normally distributed
- y = max(0, y*)

A

E(y|x) = Pr(y>0|x)·E(y|x, y>0) + Pr(y=0|x)·0
= Pr(y>0|x)·E(y|x, y>0)
Let z = β'x/σ. Then
E(y|x) = Φ(z)·β'x + σ·φ(z)
- so βj does not give the marginal effect on y, unlike in OLS
INVERSE MILLS RATIO: λ(z) = φ(z)/Φ(z)
- E(y|x, y>0) = β'x + σ·λ(z), so E(y|x) = Φ(z)·[β'x + σ·λ(z)]
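
A sketch computing E(y|x) both ways for one x vector (beta_hat, sigma_hat and x are hypothetical placeholders):

```python
# E(y|x) in the Tobit model: direct formula vs inverse-Mills-ratio form.
import numpy as np
from scipy.stats import norm

beta_hat = np.array([0.5, 1.2])
sigma_hat = 2.0
x = np.array([1.0, 0.8])

xb = x @ beta_hat
z = xb / sigma_hat
mills = norm.pdf(z) / norm.cdf(z)                    # inverse Mills ratio lambda(z)

ey_direct = norm.cdf(z) * xb + sigma_hat * norm.pdf(z)
ey_mills = norm.cdf(z) * (xb + sigma_hat * mills)    # Phi(z) * E(y | x, y > 0)

print(ey_direct, ey_mills)   # identical up to rounding
```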

33
Q

Two types of partial effects in the Tobit model

A

Prove these:
- ∂E(y|x, y>0)/∂xj - the conditional partial effect: tells us how xj changes y among individuals for whom we actually observe y > 0
- ∂E(y|x)/∂xj - the unconditional partial effect: adjusts the latent effect βj by the probability of being uncensored, giving Φ(β'x/σ)·βj (see the derivation sketch below)
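
A sketch of the derivation for the unconditional partial effect, starting from the E(y|x) expression on the previous card:

```latex
% With z = x'\beta/\sigma and E(y \mid x) = \Phi(z)\,x'\beta + \sigma\,\phi(z),
% differentiate with respect to x_j and use \phi'(z) = -z\,\phi(z):
\begin{align*}
\frac{\partial E(y \mid x)}{\partial x_j}
  &= \phi(z)\,\frac{\beta_j}{\sigma}\,x'\beta
   + \Phi(z)\,\beta_j
   + \sigma\,\phi'(z)\,\frac{\beta_j}{\sigma} \\
  &= \Phi(z)\,\beta_j
   + \frac{x'\beta}{\sigma}\,\phi(z)\,\beta_j
   - \frac{x'\beta}{\sigma}\,\phi(z)\,\beta_j
   = \Phi\!\left(\frac{x'\beta}{\sigma}\right)\beta_j
\end{align*}
```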

34
Q

Limitations of the Tobit model

A
  • The Tobit model ties together the probability of being above the censoring threshold and the expected value conditional on being above it via the same parameter βj, so any variable xj must affect both outcomes in the same direction