L6 - Binary Choice Models Flashcards

1
Q

What is the linear equation for a Binary Choice Model?

A
  • y* = α + βx + ε, with y = 1 if y* > 0 and y = 0 otherwise
  • The continuous latent variable y* is not observed by the researcher
    • We only observe the binary outcome y
      • The decision-maker may observe (and act on) this continuum y*
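A minimal Python sketch of the latent-variable set-up above. It assumes, for illustration only, a standard normal error and made-up values for α and β. y* is generated inside the function, but only the binary y is returned, mirroring what the researcher observes. The function name `simulate_binary_choice` is hypothetical.

```python
import random

def simulate_binary_choice(alpha, beta, xs, seed=0):
    """Latent-variable sketch: y* = alpha + beta*x + eps, observed y = 1 if y* > 0.

    eps ~ N(0, 1) is an illustrative assumption; the caller only ever sees y.
    """
    rng = random.Random(seed)
    ys = []
    for x in xs:
        y_star = alpha + beta * x + rng.gauss(0.0, 1.0)  # latent, unobserved
        ys.append(1 if y_star > 0 else 0)                # the binary outcome we observe
    return ys

xs = [i / 10 for i in range(-30, 31)]          # x from -3.0 to 3.0
ys = simulate_binary_choice(alpha=0.5, beta=1.2, xs=xs)
print(ys[:10])
```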
2
Q

What does the Binary choice model look like on a graph?

A
  • If y* were observed, we would have a linear regression model
  • This graph shows the relationship between y* and the probability that y = 1 –> for a specific value of x
    • shaded –> Pr(y = 1 | x)
    • unshaded –> Pr(y = 0 | x)
  • This can be interpreted as
    • Pr(y = 1 | x) = Pr(y* > 0 | x)
    • y* = α + βx + ε > 0
      • Moving α + βx to the right-hand side and substituting in, we get
    • Pr(y = 1 | x) = Pr(ε > -(α + βx) | x)
      • The probability depends on the distribution of the error ε
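The step from Pr(y* > 0 | x) to Pr(ε > -(α + βx) | x) can be checked numerically. This sketch assumes ε ~ N(0, 1) (the Probit case) and hypothetical parameter values. It compares the closed form 1 - F(-(α + βx)) with the Monte Carlo share of error draws that exceed -(α + βx); the helper names are illustrative.

```python
import math
import random

def normal_cdf(z):
    # standard normal CDF built from the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_y1_closed_form(alpha, beta, x):
    # Pr(eps > -(alpha + beta*x)) = 1 - F(-(alpha + beta*x)), eps ~ N(0, 1) assumed
    return 1.0 - normal_cdf(-(alpha + beta * x))

def prob_y1_monte_carlo(alpha, beta, x, draws=100_000, seed=0):
    # share of simulated errors that push y* = alpha + beta*x + eps above 0
    rng = random.Random(seed)
    threshold = -(alpha + beta * x)
    hits = sum(1 for _ in range(draws) if rng.gauss(0.0, 1.0) > threshold)
    return hits / draws

print(prob_y1_closed_form(0.5, 1.2, 0.3), prob_y1_monte_carlo(0.5, 1.2, 0.3))
```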
3
Q

How is the linear (latent-variable) Binary Choice model linked to the nonlinear probability model?

A
  • Pr(y = 1 | x) = F(α + βx)
    • F is the cumulative distribution function of the error term ε (for a symmetric error distribution, Pr(ε > -(α + βx)) = 1 - F(-(α + βx)) = F(α + βx))
  • The S-shaped (sigmoid) curve traces the shaded areas of the error distributions of the linear model on the right –> one for each value of x
    • That is because y* is not observed, so we only have a distribution of error terms for each level of the explanatory variable
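A short Python sketch of Pr(y = 1 | x) = F(α + βx) that tabulates the S-shaped curve for the two CDFs used in these cards: standard normal for Probit and logistic for Logit. The parameter values are hypothetical, and the normal CDF is built from `math.erf`.

```python
import math

def probit_cdf(z):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z):
    # standard logistic CDF
    return 1.0 / (1.0 + math.exp(-z))

alpha, beta = -1.0, 0.8  # hypothetical parameter values
for x in range(-4, 7, 2):
    index = alpha + beta * x
    print(f"x={x:>2}  probit={probit_cdf(index):.3f}  logit={logit_cdf(index):.3f}")
```

Both curves rise monotonically from 0 to 1 in x, which is the sigmoid shape described above; they differ in steepness because the two error distributions have different variances.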
4
Q

What is one important distinction between the linear regression model and nonlinear models with regard to the variance of the error term?

A
  • One important distinction between linear regression models and nonlinear models is that under OLS we can estimate the variance of the error term (from the residuals)
    • For discrete choice models, because the models reveal less information (you do not observe the continuum of y*), you have to assume the variance of the error term –> this assumption will affect the shape (scale) of the distribution
      • e.g. fatter/thinner tails –> but it won't affect the proportion of the shaded area, because the coefficients are rescaled by the same factor
  • Therefore we assume the variance of the error terms –> we will only consider two types of models (Binary Probit and Binary Logit models)
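The claim that the assumed variance changes the scale of the error distribution but not the shaded probability can be sketched as follows. The sketch assumes a normal error with standard deviation σ and uses made-up values. Scaling α, β and σ by the same factor leaves Pr(y = 1 | x) = Φ((α + βx)/σ) unchanged, so binary data can only pin down the ratios α/σ and β/σ.

```python
import math

def prob_y1(alpha, beta, sigma, x):
    # y* = alpha + beta*x + eps, eps ~ N(0, sigma^2)  =>  Pr(y=1|x) = Phi((alpha + beta*x)/sigma)
    z = (alpha + beta * x) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

x = 1.5
for scale in (1.0, 2.0, 10.0):
    # alpha, beta and sigma all multiplied by the same factor: same probability every time
    print(scale, prob_y1(0.5 * scale, 1.2 * scale, 1.0 * scale, x))
```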
5
Q

How does the normalisation of the error term affect the parameters of our Binary Choice equation?

A
  • Normalising the error term is the same as setting (fixing) the variance of the error term
  • So we have an original utility equation U0nj
    • Say we multiply this equation by a constant λ
      • λU0nj becomes U1nj
      • The variance of the error term is now multiplied by λ²
  • Given that Var(ε0nj) = σ² is not observed, how can I make this equal to 1 (in the case of the Probit model)?
    • We must divide the variance by σ²; since Var(ε0nj/σ) = σ²/σ² = 1, we divide the original utility equation by σ, so that λ = 1/σ
    • This gives us a Binary Probit model, but when we estimate the equation we do not estimate β but β/σ –> therefore we cannot directly interpret the magnitude of the estimated coefficients in this model, as σ is still unknown
  • Along the same lines for Logit models, where the variance of the error term is normalised to π²/6 ≈ 1.6: we divide the variance by σ² and then multiply it by π²/6 ≈ 1.6. Therefore λ in the original equation equals sqrt(1.6)/σ
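A Python sketch of this normalisation, assuming a hypothetical σ = 2.5. Normal errors multiplied by λ = 1/σ should have a sample variance close to 1. Extreme value (Gumbel) errors with variance σ², multiplied by λ = sqrt(π²/6)/σ, should have a sample variance close to π²/6 ≈ 1.6. The Gumbel draws use the inverse-CDF transform -log(-log(U)), and all helper names are illustrative.

```python
import math
import random

def normalising_lambda(sigma, model):
    """Scale factor lambda applied to the utility equation (hypothetical helper)."""
    if model == "probit":
        return 1.0 / sigma                           # target error variance 1
    if model == "logit":
        return math.sqrt(math.pi ** 2 / 6) / sigma   # target error variance pi^2/6 ≈ 1.6
    raise ValueError("model must be 'probit' or 'logit'")

def sample_variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def gumbel_draw(rng, scale):
    # inverse-CDF draw from an extreme value (Gumbel) distribution with the given scale
    u = rng.random()
    while u == 0.0:  # log(0) is undefined; redraw
        u = rng.random()
    return -scale * math.log(-math.log(u))

rng = random.Random(1)
sigma = 2.5   # hypothetical error standard deviation (unobserved in practice)
n = 100_000

# Probit: eps ~ N(0, sigma^2), rescaled by lambda = 1/sigma
lam_probit = normalising_lambda(sigma, "probit")
var_probit = sample_variance([lam_probit * rng.gauss(0.0, sigma) for _ in range(n)])

# Logit: Gumbel errors with variance sigma^2 have scale sigma * sqrt(6) / pi
gumbel_scale = sigma * math.sqrt(6) / math.pi
lam_logit = normalising_lambda(sigma, "logit")
var_logit = sample_variance([lam_logit * gumbel_draw(rng, gumbel_scale) for _ in range(n)])

print(var_probit, var_logit)  # close to 1 and to pi^2/6 ≈ 1.645 respectively
```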
6
Q

How do we test how well a discrete choice model fits?

A
  • LL ratio test
    • Constructed by estimating the model with no explanatory variables (just a constant term), extracting its LL, LL(0), then subtracting the LL of our full model, LL(β̂), and multiplying by -2: LR = -2[LL(0) - LL(β̂)]
      • If LR is greater than the critical value of the chi-squared distribution (degrees of freedom = number of restrictions), we reject the null that our parameters are jointly statistically insignificant
  • Pseudo R-squared
    • Constructed as 1 minus the ratio of the LL of our full model to the LL of a model with just a constant term: pseudo R² = 1 - LL(β̂)/LL(0)
    • This cannot be interpreted on its own; we need to compare different model specifications (e.g. by adding different variables), and then the higher the pseudo R-squared, the better the model
      • It has no intuitive interpretation on its own
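Both measures only need two log-likelihood values. Below is a minimal sketch with illustrative helper names. The constant-only LL uses the fact that its MLE for Pr(y = 1) is the sample share of ones. The full-model LL is a made-up number standing in for estimation output, and 3.841 is the 5% critical value of χ² with 1 degree of freedom.

```python
import math

def ll_constant_only(ys):
    """LL of a constant-only binary model; the MLE of Pr(y=1) is the sample share of ones."""
    n, n1 = len(ys), sum(ys)
    n0 = n - n1
    p = n1 / n
    # 0 * log(0) is treated as 0 (covers all-0 or all-1 samples)
    return (n1 * math.log(p) if n1 else 0.0) + (n0 * math.log(1 - p) if n0 else 0.0)

def lr_statistic(ll_null, ll_full):
    # LR = -2 [LL(0) - LL(beta_hat)]
    return -2.0 * (ll_null - ll_full)

def pseudo_r2(ll_null, ll_full):
    # 1 - LL(beta_hat) / LL(0)
    return 1.0 - ll_full / ll_null

ys = [1, 0, 0, 1, 1, 0, 1, 1]      # made-up outcomes
ll_null = ll_constant_only(ys)     # about -5.29
ll_full = -3.9                     # hypothetical LL of a model with one regressor
lr = lr_statistic(ll_null, ll_full)
print(lr, lr > 3.841, pseudo_r2(ll_null, ll_full))
```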
7
Q

What are the density and cumulative distribution functions of the error term in the Binary Logit Model?

A
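In the binary logit, the error ε in y* = α + βx + ε is taken to follow the standard logistic distribution, which arises as the difference of two iid extreme value errors, each with variance π²/6. Its density is f(ε) = e^(-ε)/(1 + e^(-ε))² and its CDF is F(ε) = 1/(1 + e^(-ε)). A minimal Python sketch evaluating both, with illustrative function names:

```python
import math

def logistic_pdf(e):
    # f(e) = exp(-e) / (1 + exp(-e))^2
    t = math.exp(-e)
    return t / (1.0 + t) ** 2

def logistic_cdf(e):
    # F(e) = 1 / (1 + exp(-e))
    return 1.0 / (1.0 + math.exp(-e))

# the density equals F(e) * (1 - F(e)), i.e. the derivative of the CDF
for e in (-2.0, 0.0, 2.0):
    print(e, logistic_pdf(e), logistic_cdf(e))
```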
8
Q

What is the Maximum Likelihood Estimation formula?

A
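For a binary choice model with Pr(y = 1 | x) = F(α + βx), the log-likelihood over observations n = 1..N is LL(α, β) = Σₙ [yₙ ln F(α + βxₙ) + (1 - yₙ) ln(1 - F(α + βxₙ))], and the MLE is the (α, β) that maximises it. The sketch below covers the logit case (F logistic) and maximises LL with Newton-Raphson, which is one common choice of optimiser. The data are simulated from hypothetical true parameters, and the helper names are illustrative.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(alpha, beta, xs, ys):
    # LL = sum of y*ln F(index) + (1 - y)*ln(1 - F(index)), F logistic
    ll = 0.0
    for x, y in zip(xs, ys):
        p = logistic(alpha + beta * x)
        ll += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return ll

def fit_logit(xs, ys, iters=25):
    """Newton-Raphson on the (globally concave) logit log-likelihood."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        # gradient (g0, g1) and negative Hessian [[h00, h01], [h01, h11]] of LL
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(a + b * x)
            r, w = y - p, p * (1.0 - p)
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: (a, b) += H^{-1} g
        a += (h11 * g0 - h01 * g1) / det
        b += (-h01 * g0 + h00 * g1) / det
    return a, b

rng = random.Random(42)
true_a, true_b = -0.5, 1.0  # hypothetical true parameters
xs = [rng.uniform(-3, 3) for _ in range(2000)]
ys = [1 if rng.random() < logistic(true_a + true_b * x) else 0 for x in xs]

a_hat, b_hat = fit_logit(xs, ys)
print(a_hat, b_hat, log_likelihood(a_hat, b_hat, xs, ys))
```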
9
Q

What does Maximum Likelihood Estimation look like on a graph?

A
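On a graph, the log-likelihood is plotted against a parameter. For logit and probit it is a hill-shaped (concave) curve, and the MLE is the parameter value at its peak, where the slope is zero. The minimal sketch below traces this curve for a logit, assuming the intercept is fixed at 0 to keep a single parameter. The small dataset is made up and deliberately not perfectly separable, so the peak is finite.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(beta, xs, ys):
    # LL(beta) for Pr(y=1|x) = logistic(beta * x); intercept fixed at 0 for simplicity
    return sum(
        y * math.log(logistic(beta * x)) + (1 - y) * math.log(1.0 - logistic(beta * x))
        for x, y in zip(xs, ys)
    )

# made-up data (not perfectly separable)
xs = [-2, -1, -1, 0, 0, 1, 1, 2]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

grid = [i / 100 for i in range(-200, 301)]  # beta from -2.00 to 3.00
curve = [(b, log_likelihood(b, xs, ys)) for b in grid]
beta_hat, ll_max = max(curve, key=lambda pair: pair[1])

for b, ll in curve[::50]:
    print(f"beta={b:5.2f}  LL={ll:8.4f}")
print("peak at beta =", beta_hat)
```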
10
Q

What is interesting to note about the difference between the Probit and Logit model coefficient estimates?

A
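  • Due to the normalisation of the variance (1 for Probit, π²/6 ≈ 1.6 for Logit), the Logit model's coefficient estimates will be roughly sqrt(1.6) ≈ 1.3 times larger than the Probit's estimates; the factor is only approximate because the two error distributions have different shapes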
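Under the normalisations used here, each utility error is N(0, 1) in the Probit and extreme value with variance π²/6 ≈ 1.6 in the Logit. The error in the utility difference therefore has variance 2 in the Probit and π²/3 in the Logit, and the ratio of standard deviations is sqrt(π²/3)/sqrt(2) = sqrt(π²/6) ≈ 1.28. The sketch below checks that a Probit index v and a Logit index scaled up by this factor give close, but not identical, probabilities. The function names are illustrative.

```python
import math

RATIO = math.sqrt(math.pi ** 2 / 6)  # ≈ 1.28, i.e. sqrt(1.6)

def probit_prob(v):
    # Pr(y=1) when the utility-difference error is N(0, 2): Phi(v / sqrt(2))
    z = v / math.sqrt(2.0)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_prob(v):
    # Pr(y=1) when the utility-difference error is standard logistic (variance pi^2/3)
    return 1.0 / (1.0 + math.exp(-v))

# same underlying index v, with the logit index scaled up by RATIO
indices = [i / 10 for i in range(-50, 51)]
gaps = [abs(probit_prob(v) - logit_prob(RATIO * v)) for v in indices]
print(round(RATIO, 4), max(gaps))
```

The remaining gap comes from the different tail shapes of the normal and logistic distributions, which is why the coefficient ratio is only approximately sqrt(1.6).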
11
Q

What do the partial effects of the explanatory variables in the Logit and Probit models tell us?

A
  • The change in the probability of the binary choice outcome (y = 1) for a one-unit change in the explanatory variable
  • Although the coefficient estimates of the two models differ, the partial effects are nearly identical (the probability given by the shaded areas doesn't change between models)
    • e.g. the probability of going to the doctor against income has an estimated partial effect of -0.09844
    • So when income increases by £1, the probability of going to see the doctor falls by 0.09844, i.e. 9.844 percentage points
  • For binary (dummy) explanatory variables, the partial effect is measured relative to the base category
    • Say the ‘gender’ explanatory variable = 1 for females and = 0 for males, and has a partial effect of 0.14026
    • We say that females are 14.026 percentage points more likely to go to the doctor than males
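A sketch of how these partial effects are computed, using hypothetical coefficients rather than the -0.09844 and 0.14026 estimates quoted above. For a continuous regressor, the partial effect is f(index)·β_k, the error density at the index times the coefficient, so it depends on where it is evaluated (for example at the sample means). For a dummy, it is the difference in Pr(y = 1) between the two categories with everything else held fixed. The function names are illustrative.

```python
import math

def logit_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_pdf(z):
    p = logit_cdf(z)
    return p * (1.0 - p)

def probit_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def marginal_effect(pdf, index, beta_k):
    # continuous regressor: dPr(y=1|x)/dx_k = f(index) * beta_k
    return pdf(index) * beta_k

def dummy_effect(cdf, index_without_dummy, beta_dummy):
    # dummy regressor: Pr(y=1 | d=1) - Pr(y=1 | d=0), other regressors held fixed
    return cdf(index_without_dummy + beta_dummy) - cdf(index_without_dummy)

# hypothetical coefficients and evaluation point (not the estimates quoted above)
b_income, b_female, index0 = -0.4, 0.6, 0.2
print(marginal_effect(logit_pdf, index0, b_income), marginal_effect(probit_pdf, index0, b_income))
print(dummy_effect(logit_cdf, index0, b_female), dummy_effect(probit_cdf, index0, b_female))
```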
12
Q

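Where do you find the measures of fit in Stata?

A
  • In the header of the logit/probit estimation output: the Log likelihood, the LR chi2(k) statistic with its Prob > chi2 p-value, and the Pseudo R2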