L6 - Binary Choice Models Flashcards
What is the linear equation for a Binary Choice Model?
- the continuous scale of y* is not observed by the researcher
- the decision-maker may see this continuum y*
- we only observe the binary outcome
![](https://s3.amazonaws.com/brainscape-prod/system/cm/392/391/775/a_image_thumb.png?1645198111)
What does the Binary choice model look like on a graph?
- If y* were observed, we would have a linear regression model
- This graph shows the relationship between y* and the probability that y=1 –> for a specific value of x
- shaded –> Pr(y=1|X)
- unshaded –> Pr(y=0|X)
- This can be interpreted as
- Pr(y = 1 | x) = Pr(y* > 0 | x)
- y* = α+βx+ε > 0
- Moving α+βx to the right hand side and subbing in we get
- Pr(y=1│x)=Pr(ε > -[α+βx] | x)
- The probability depends on the distribution of the error ε
![](https://s3.amazonaws.com/brainscape-prod/system/cm/392/391/793/a_image_thumb.png?1645198284)
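The latent-variable setup above can be sketched in a short simulation. The parameter values (α = −1, β = 0.5) are hypothetical choices for illustration, and the error is taken as standard normal (the Probit case):

```python
import math
import random

random.seed(0)

alpha, beta = -1.0, 0.5  # hypothetical parameters for illustration

def draw_observation(x):
    """Latent y* = alpha + beta*x + eps; the researcher sees only y = 1{y* > 0}."""
    eps = random.gauss(0.0, 1.0)       # standard normal error (Probit case)
    y_star = alpha + beta * x + eps    # continuum seen by the decision-maker
    return 1 if y_star > 0 else 0      # binary outcome seen by the researcher

# Pr(y = 1 | x) = Pr(eps > -[alpha + beta*x] | x) -- estimate it by simulation
x = 2.0
draws = [draw_observation(x) for _ in range(100_000)]
simulated = sum(draws) / len(draws)

# For a symmetric error distribution this equals Phi(alpha + beta*x),
# the standard normal CDF
phi = 0.5 * (1 + math.erf((alpha + beta * x) / math.sqrt(2)))
print(round(simulated, 3), round(phi, 3))
```

The simulated share of y=1 outcomes matches the shaded area Pr(ε > −[α+βx] | x) from the card.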
How is the relationship between the linear Binary Choice model linked to the nonlinear probability model?
- Pr(y = 1 | x) = F(α + βx)
- F is the cumulative distribution function of the error term
- The S-shaped sigmoid curve relates to the shaded distribution functions of the linear model on the right –> for each individual value of x
- That is because y* is not observed so we only have a distribution of error terms for each level of the explanatory variable
![](https://s3.amazonaws.com/brainscape-prod/system/cm/392/391/804/a_image_thumb.png?1645198722)
What is one important distinction between linear regression model and non-linear model to do with the variance of the error term?
- One important distinction between linear regression models and non-linear models is that under OLS we can estimate the variance of the error term
- For discrete choice models, because the models reveal less information (you do not observe the continuum of y*), you have to assume the variance of the error term –> this assumption will affect the shape of the distribution
- e.g. fatter/thinner tails –> but won't affect the proportion of the shaded area
- Therefore we assume the variance of the error term –> we will only consider two types of models (Binary Probit and Binary Logit models)
![](https://s3.amazonaws.com/brainscape-prod/system/cm/392/391/819/a_image_thumb.png?1645199331)
How does the normalisation of the error term affect the parameters of our Binary Choice equation?
- Normalising the error term is the same as setting its variance
- So we have an original equation of utility, U⁰ₙⱼ
- Say we multiply this equation by a constant λ, so λU⁰ₙⱼ becomes U¹ₙⱼ
- The variance of the error term is now multiplied by λ²
- Given that Var(ε⁰ₙⱼ) = σ² is not observed, how can we make it equal to 1 (in the case of the Probit model)?
- To divide the variance by σ² we must divide the original utility equation by σ, so that λ = 1/σ (and λ²σ² = 1)
- This gives us a Binary Probit model, but when we estimate the equation we estimate not β but β/σ –> therefore we cannot directly interpret the magnitude of the estimated coefficients in this model –> as σ is still unknown
- Along the same lines for Logit models –> the variance of the error term is normalised to π²/6 ≈ 1.6, so we require λ²σ² = 1.6, giving λ = sqrt(1.6)/σ in the original equation
![](https://s3.amazonaws.com/brainscape-prod/system/cm/392/391/830/a_image_thumb.png?1645199681)
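The point of the normalisation is that scaling utility leaves choice probabilities unchanged, so only β/σ is identified. A minimal numeric check, with β and σ chosen arbitrarily for illustration:

```python
import math

beta, sigma = 2.0, 4.0  # hypothetical true coefficient and error s.d.

def prob_y1(x, b, s):
    """Pr(b*x + eps > 0) with eps ~ N(0, s^2), i.e. Phi(b*x / s)."""
    return 0.5 * (1.0 + math.erf((b * x) / (s * math.sqrt(2))))

x = 1.5
# Original model vs the same model scaled by lambda = 1/sigma:
# the choice probability is identical, so only beta/sigma is identified
p_original   = prob_y1(x, beta, sigma)
p_normalised = prob_y1(x, beta / sigma, 1.0)  # variance normalised to 1 (Probit)
print(round(p_original, 6), round(p_normalised, 6))
```

Any (β, σ) pair with the same ratio β/σ generates exactly the same data, which is why the estimated coefficient is β/σ rather than β.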
How do we test how well a model fits the data in discrete choice models?
- LL Ratio index
- Constructed by estimating the model with no explanatory variables, just a constant term, taking its log-likelihood, subtracting the log-likelihood of our full model, and multiplying by -2
- If this statistic is greater than the chi-squared critical value, we reject the null that our parameters are jointly statistically insignificant
- Pseudo R-squared
- Constructed as 1 minus the ratio of the LL of our full model to the LL of a model with no explanatory variables, just a constant term
- This cannot be interpreted on its own; we need to compare different model specifications (by adding different variables if needed) –> the higher the pseudo R-squared, the better the model
- Has no explanatory power on its own
![](https://s3.amazonaws.com/brainscape-prod/system/cm/392/391/841/a_image_thumb.png?1645201054)
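Both fit measures are simple arithmetic on two log-likelihoods. The values below are made up for illustration:

```python
# Hypothetical log-likelihoods for illustration
ll_constant_only = -350.0  # restricted model: constant term only
ll_full          = -310.0  # full model with the explanatory variables

# LL Ratio statistic: -2 * (LL_restricted - LL_full), compared against
# a chi-squared critical value with df = number of restricted parameters
lr_stat = -2.0 * (ll_constant_only - ll_full)

# McFadden pseudo R-squared: 1 - LL_full / LL_constant_only
pseudo_r2 = 1.0 - ll_full / ll_constant_only

print(lr_stat, round(pseudo_r2, 4))
```

Here the statistic is 80, which would comfortably exceed typical chi-squared critical values, while the pseudo R-squared of about 0.11 only becomes meaningful when compared across specifications.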
What are the density and Cumulative density functions of the error term in the Binary Logit Model?
![](https://s3.amazonaws.com/brainscape-prod/system/cm/392/391/856/a_image_thumb.png?1645201527)
What is the Maximum Likelihood Estimation formula?
![](https://s3.amazonaws.com/brainscape-prod/system/cm/392/391/872/a_image_thumb.png?1645201649)
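The maximum likelihood idea for a binary choice model can be sketched as maximising the log-likelihood LL(α, β) = Σ [y·log F(α+βx) + (1−y)·log(1−F(α+βx))]. The data and the crude grid-search maximiser below are purely illustrative (in practice Stata uses Newton-type iterations):

```python
import math

def logit_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(params, xs, ys):
    """Binary-choice LL: sum of y*log F(a+b*x) + (1-y)*log(1-F(a+b*x))."""
    a, b = params
    ll = 0.0
    for x, y in zip(xs, ys):
        p = logit_cdf(a + b * x)
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

# Tiny made-up data set; maximise LL by grid search for illustration only
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 0, 1, 1]
best = max(
    ((a / 10, b / 10) for a in range(-50, 51) for b in range(0, 51)),
    key=lambda p: log_likelihood(p, xs, ys),
)
print(best, round(log_likelihood(best, xs, ys), 3))
```

The maximiser picks the (α, β) pair at which the observed pattern of 0s and 1s is most probable, which is the peak of the curve shown on the graph card below.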
What does maximum likelihood Estimation look like on a graph?
![](https://s3.amazonaws.com/brainscape-prod/system/cm/392/391/883/a_image_thumb.png?1645201699)
What is interesting to note about the difference between the Probit and Logit model coefficient estimates?
- Due to the normalisation of the variance, the Logit model’s coefficient estimates will always be roughly sqrt(1.6) times the Probit estimates
What does the partial effect of the Logit and Probit explanatory variables tell us?
- The change in the probability of the binary choice outcome for a one-unit change in the explanatory variable
- Although the coefficient estimates of the two models are different, the partial effects are nearly identical (the probability of the shaded areas doesn’t change between models)
- e.g. the probability of going to the hospital against income has an estimated partial effect of -0.09844
- So when income increases by £1, the probability of going to see the doctor falls by 9.844 percentage points
- For binary explanatory variables, the partial effect refers to the difference relative to the base category
- Say the ‘gender’ explanatory variable = 1 for females and = 0 for males and has a partial effect of 0.14026
- We say that females are 14.026 percentage points more likely to go to the doctor than males
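For the Logit model these partial effects have a closed form: dPr/dx = f(α+βx)·β for a continuous variable, with the discrete difference F(α+β) − F(α) for a dummy. The coefficient values below are hypothetical, not the estimates from the card:

```python
import math

def logit_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_pdf(z):
    """Logistic density: f(z) = F(z) * (1 - F(z))."""
    p = logit_cdf(z)
    return p * (1.0 - p)

# Hypothetical logit estimates for illustration
alpha, beta_income = 0.3, -0.45

# Partial effect of a continuous x, evaluated at a given point:
# dPr(y=1)/dx = f(alpha + beta*x) * beta
x_income = 1.0
pe = logit_pdf(alpha + beta_income * x_income) * beta_income
print(round(pe, 4))  # change in Pr(y=1), in probability units

# For a dummy variable, use the discrete change in probability instead
beta_dummy = 0.6  # hypothetical dummy coefficient
pe_dummy = logit_cdf(alpha + beta_dummy) - logit_cdf(alpha)
print(round(pe_dummy, 4))
```

Note the partial effect depends on where it is evaluated (f varies along the sigmoid), which is why software reports effects at the means or averaged over the sample.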
Where do you find the measures of fit in Stata?
![](https://s3.amazonaws.com/brainscape-prod/system/cm/392/391/906/a_image_thumb.png?1645203472)