L6 - Binary Choice Models Flashcards
What is the linear equation for a Binary Choice Model?
- The latent (linear) equation is y* = α + βx + ε
- The continuous scale of y* is not observed by the researcher
- The decision-maker may see this continuum y*
- We only observe the binary outcome: y = 1 if y* > 0, and y = 0 otherwise
What does the Binary choice model look like on a graph?
- If y* were observed we would have a linear regression model
- The graph shows the relationship between y* and the probability that y = 1, for a specific value of x
- shaded –> Pr(y=1|X)
- unshaded –> Pr(y=0|X)
- This can be interpreted as Pr(y = 1 | x) = Pr(y* > 0 | x)
- Substituting the latent equation gives Pr(α + βx + ε > 0 | x)
- Moving α + βx to the right-hand side we get Pr(y = 1 | x) = Pr(ε > -(α + βx) | x)
- The probability depends on the distribution of the error ε (simulated in the sketch below)
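A minimal simulation sketch (not from the original notes) can make the threshold-crossing idea concrete. The values of α, β and x, and the standard-normal error, are illustrative assumptions:

```python
# Hypothetical illustration: simulate the latent model y* = alpha + beta*x + eps
# and check that Pr(y = 1 | x) equals Pr(eps > -(alpha + beta*x)).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
alpha, beta, x = -1.0, 0.5, 3.0            # made-up parameters and a single value of x

eps = rng.standard_normal(1_000_000)       # error term, assumed standard normal here
y_star = alpha + beta * x + eps            # latent variable (not observed in practice)
y = (y_star > 0).astype(int)               # we only observe the binary outcome

print(y.mean())                            # simulated Pr(y = 1 | x)
print(norm.cdf(alpha + beta * x))          # Pr(eps > -(alpha+beta*x)) = F(alpha+beta*x) by symmetry
```

Both printed numbers should agree closely, which is exactly the link Pr(y = 1 | x) = F(α + βx) developed in the next card.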
How is the linear Binary Choice model linked to the nonlinear probability model?
- Pr(y = 1 | x) = F(α + βx)
- F is the cumulative distribution function of the error term ε
- The S-shaped sigmoid curve corresponds to the shaded distributions of the linear model on the right, one for each individual value of x
- That is because y* is not observed, so we only have a distribution of error terms at each level of the explanatory variable (the curve is evaluated for two choices of F in the sketch below)
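As a hedged illustration (with made-up α and β), the same index α + βx can be passed through two common choices of F, the standard normal CDF (Probit) and the standard logistic CDF (Logit); both trace out an S-shaped curve between 0 and 1:

```python
# Evaluate the nonlinear probability model Pr(y = 1 | x) = F(alpha + beta*x)
# under two choices of F; alpha and beta are illustrative assumptions.
import numpy as np
from scipy.stats import norm, logistic

alpha, beta = -1.0, 0.5
x = np.linspace(-10, 14, 7)                # a grid of values for the explanatory variable

index = alpha + beta * x
p_probit = norm.cdf(index)                 # F = standard normal CDF
p_logit = logistic.cdf(index)              # F = standard logistic CDF

for xi, pp, pl in zip(x, p_probit, p_logit):
    print(f"x = {xi:6.1f}   Probit: {pp:.3f}   Logit: {pl:.3f}")
```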
What is one important distinction between the linear regression model and non-linear models, to do with the variance of the error term?
- One important distinction is that under OLS we can estimate the variance of the error term
- For discrete choice models, because the data reveal less information (you do not observe the continuum of y*), you have to assume the variance of the error term; this assumption will affect the shape of the distribution
- e.g. fatter/thinner tails, but it won't affect the proportion of the shaded area
- Therefore we normalise the variance of the error term; we will only consider two types of models (Binary Probit and Binary Logit models)
How does the normalisation of the error term affect the parameters of our Binary Choice equation?
- Normalising is the same as fixing the variance of the error term at a chosen value
- So we have an original utility equation U⁰_nj
- Say we multiply this equation by a constant λ, so that λU⁰_nj becomes U¹_nj
- The variance of the error term is now multiplied by λ²
- Given that Var(ε⁰_nj) = σ² is not observed, how can we make the error variance equal to 1 (as in the Probit model)?
- We need λ²σ² = 1, so we must divide the original utility equation by σ, i.e. λ = 1/σ
- This gives us a Binary Probit model, but when we estimate the equation we do not estimate β, we actually estimate β/σ; therefore we cannot directly interpret the magnitude of the estimated coefficients in this model, as σ is still unknown
- Along the same lines for Logit models, the variance of the error term is normalised to roughly 1.6 (π^2/6 ≈ 1.6), so we need λ²σ² = 1.6 and therefore λ = √1.6/σ (see the simulation sketch below)
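A minimal simulation sketch of the Probit normalisation, assuming a latent error with σ = 2 and made-up parameter values (not the lecture's data): the Probit, which fixes the error variance at 1, recovers β/σ rather than β.

```python
# Hypothetical illustration of scale normalisation: the latent error has sd sigma,
# so a Probit (error variance fixed at 1) estimates alpha/sigma and beta/sigma.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, alpha, beta, sigma = 200_000, 0.5, 1.0, 2.0   # made-up values

x = rng.normal(size=n)
eps = sigma * rng.standard_normal(n)             # error with variance sigma^2 (unobserved)
y = (alpha + beta * x + eps > 0).astype(int)     # only the binary outcome is observed

res = sm.Probit(y, sm.add_constant(x)).fit(disp=False)
print(res.params)    # roughly [alpha/sigma, beta/sigma] = [0.25, 0.5], not [0.5, 1.0]
```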
How do we test how well a model fits the data in a discrete choice setting?
- LL (likelihood) ratio test
- Constructed by estimating the model with no slope parameters, just a constant term, extracting its log-likelihood, subtracting the log-likelihood of our full model, and multiplying by -2: LR = -2(LL₀ - LL_full)
- If this is greater than the chi-squared critical value, we reject the null that our parameters are jointly statistically insignificant
- Pseudo R-squared
- Constructed as 1 minus the ratio of the LL of our full model to the LL of a model with no slope parameters and just a constant term: pseudo-R² = 1 - LL_full/LL₀
- This cannot be interpreted on its own; we need to compare different model specifications (adding different variables as needed), and the higher the pseudo-R-squared the better the model
- It has no explanatory power on its own (both measures are computed in the sketch below)
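A hedged sketch of both fit measures on simulated data (the variables and values are illustrative, not the course dataset), using statsmodels:

```python
# Compute the LR statistic -2*(LL_0 - LL_full) and McFadden's pseudo R-squared
# 1 - LL_full/LL_0 for a Logit fitted to simulated data.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(2)
x = rng.normal(size=5_000)
y = (0.2 + 0.8 * x + rng.logistic(size=5_000) > 0).astype(int)   # made-up data

full = sm.Logit(y, sm.add_constant(x)).fit(disp=False)           # constant + slope
null = sm.Logit(y, np.ones((len(y), 1))).fit(disp=False)         # constant only

lr_stat = -2 * (null.llf - full.llf)
pseudo_r2 = 1 - full.llf / null.llf

print(lr_stat, chi2.ppf(0.95, df=1))   # reject joint insignificance if LR > critical value
print(pseudo_r2, full.prsquared)       # matches the pseudo R-squared statsmodels reports
```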
What are the density and cumulative distribution functions of the error term in the Binary Logit Model?
- The error term is assumed to follow the standard logistic distribution
- CDF: F(ε) = e^ε / (1 + e^ε)
- Density: f(ε) = e^ε / (1 + e^ε)^2
What is the Maximum Likelihood Estimation formula?
- LL(α, β) = Σ_i [ y_i ln F(α + βx_i) + (1 - y_i) ln(1 - F(α + βx_i)) ]
- The estimates are the values of α and β that maximise this log-likelihood (see the sketch below)
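A minimal sketch of that formula in code, assuming a logistic F and simulated data with made-up true parameters; the log-likelihood is maximised numerically:

```python
# Maximise LL(a, b) = sum_i [ y_i*ln F(a + b*x_i) + (1 - y_i)*ln(1 - F(a + b*x_i)) ]
# for a binary Logit, where F is the logistic CDF; the data are simulated.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit                  # logistic CDF 1 / (1 + exp(-z))

rng = np.random.default_rng(4)
x = rng.normal(size=10_000)
y = (rng.random(10_000) < expit(-0.5 + 1.2 * x)).astype(int)   # true (a, b) = (-0.5, 1.2)

def neg_log_likelihood(theta):
    a, b = theta
    p = np.clip(expit(a + b * x), 1e-12, 1 - 1e-12)   # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(res.x)   # should be close to (-0.5, 1.2)
```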
What does Maximum Likelihood Estimation look like on a graph?
What is interesting to note about the difference between the Probit and Logit model coefficient estimates?
- Due to the normalisation of the variance, the Logit model's coefficient estimates will be roughly √1.6 ≈ 1.27 times larger than the Probit estimates
What does the partial effect of the Logit and Probit explanatory variables tell us?
- The change in the probability that the binary outcome occurs (y = 1) when the explanatory variable changes by one unit
- Although the coefficient estimates of the two models are different, the partial effects are nearly identical (the probability of the shaded areas doesn't change between models)
- e.g. the probability of going to the doctor against income has an estimated partial effect of -0.09844
- So when income increases by £1, the probability of going to see the doctor falls by 9.844 percentage points
- For binary (dummy) explanatory variables, the partial effect refers to the change relative to the base category
- Say the 'gender' explanatory variable = 1 for females and = 0 for males and has a partial effect of 0.14026
- We say that females are 14.026 percentage points more likely to go to the doctor than males (see the sketch below)
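A hedged sketch of partial (marginal) effects for Logit and Probit fitted on the same simulated data; the variable names (income, female) and all values are illustrative assumptions, not the course's doctor-visit dataset:

```python
# Fit Logit and Probit to the same simulated data and compare average partial
# effects; coefficients differ across the models, but the partial effects should
# be nearly identical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 20_000
income = rng.normal(size=n)                      # hypothetical continuous regressor
female = rng.integers(0, 2, size=n)              # hypothetical dummy regressor
X = sm.add_constant(np.column_stack([income, female]))

y = (0.3 - 0.6 * income + 0.4 * female + rng.logistic(size=n) > 0).astype(int)

logit = sm.Logit(y, X).fit(disp=False)
probit = sm.Probit(y, X).fit(disp=False)

# dummy=True reports the discrete change in Pr(y = 1) for the 0/1 regressor
print(logit.get_margeff(dummy=True).summary())
print(probit.get_margeff(dummy=True).summary())
```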
Where do you find the measures of fit in Stata?
- The LR chi-squared statistic and the pseudo-R-squared are reported in the header of the logit / probit estimation output