Week 3 Part 1 Flashcards
Why might a model with dummy variables need to be extended?
When we used only the dummy variable d, the predicted y differed between the two categories by a fixed amount across all values of x. Sometimes we need an interaction variable (xd, the product of x and d) that allows the predicted y to differ between the two categories by an amount that varies with x.
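A minimal sketch of this idea, using made-up numbers and a hypothetical helper `fit_line`. In the model y = b0 + b1*x + b2*d + b3*xd, the predicted gap between the two categories at a given x is b2 + b3*x, so it changes with x whenever b3 is not zero. Because this model has its own intercept and slope for each category, its OLS fit coincides with fitting one line per category, which is the shortcut the code takes.

```python
def fit_line(xs, ys):
    """Simple OLS fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Made-up data: the same x values observed in both categories.
x = [1, 2, 3, 4, 5]
y_d0 = [30, 32, 34, 36, 38]   # category d = 0
y_d1 = [31, 35, 39, 43, 47]   # category d = 1

a0, s0 = fit_line(x, y_d0)
a1, s1 = fit_line(x, y_d1)

# Coefficients of y = b0 + b1*x + b2*d + b3*x*d
b0, b1 = a0, s0
b2 = a1 - a0   # shift in intercept when d = 1
b3 = s1 - s0   # shift in slope when d = 1 (the interaction effect)

def gap(x_value):
    """Predicted difference between d = 1 and d = 0 at a given x."""
    return b2 + b3 * x_value

print("b0, b1, b2, b3:", b0, b1, b2, b3)
print("gap at x = 1:", gap(1))
print("gap at x = 5:", gap(5))
```

With a dummy variable alone (b3 = 0), `gap` would return the same value for every x.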
How can you assess the significance of the interaction variable?
- A t-test for the individual significance of the dummy variable d and of the interaction variable xd.
- A partial F-test to evaluate the joint significance of d and xd.
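The partial F-test can be sketched as follows, assuming the usual statistic F = [(SSE_R - SSE_U) / q] / [SSE_U / (n - k - 1)], where the restricted model is y = b0 + b1*x, the unrestricted model adds d and xd, q = 2 is the number of restrictions and k = 3 is the number of explanatory variables in the unrestricted model. The data and the helper names `fit_line` and `sse` are made up for illustration. The unrestricted SSE is obtained by fitting one line per category, which gives the same fit as the model with d and xd.

```python
def fit_line(xs, ys):
    """Simple OLS fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def sse(xs, ys):
    """Sum of squared residuals of the fitted simple regression line."""
    a, b = fit_line(xs, ys)
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data for the two categories.
x0 = [1, 2, 3, 4, 5, 6]
y0 = [30.5, 31.6, 34.3, 35.8, 38.4, 39.6]   # d = 0
x1 = [1, 2, 3, 4, 5, 6]
y1 = [31.2, 34.7, 39.4, 42.6, 47.3, 50.9]   # d = 1

sse_r = sse(x0 + x1, y0 + y1)        # restricted: one common line
sse_u = sse(x0, y0) + sse(x1, y1)    # unrestricted: includes d and xd

n = len(x0) + len(x1)
q = 2   # restrictions tested: coefficients of d and xd are both zero
k = 3   # explanatory variables in the unrestricted model: x, d, xd

f_stat = ((sse_r - sse_u) / q) / (sse_u / (n - k - 1))
print("F statistic:", round(f_stat, 2))
```

The statistic is then compared with a critical value from the F distribution with q and n - k - 1 degrees of freedom. The individual t-tests additionally need the standard errors of the coefficients, which this sketch does not compute.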
Besides models where the explanatory variables are dummy variables, what other types of classification models can we build?
Classification models where the response variable is binary (e.g., yes/no, success/failure).
What are the possible outcomes of y in a linear regression model where y is a binary variable?
y is a discrete random variable with only two possible outcomes (0 or 1).
What is this linear regression model applied to a binary response variable called?
The linear probability model (LPM).
What are the pros and cons of LPM?
- Pros: It is simple to estimate and interpret.
- Cons: The model can predict probabilities greater than 1 or less than 0, which is impossible for a probability.
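A small illustration of the drawback, using made-up data (x could be hours studied, y = 1 for a pass) and a hypothetical helper `fit_line`. The LPM is an ordinary least squares line fitted to a 0/1 response, so nothing keeps its predictions inside the interval from 0 to 1.

```python
def fit_line(xs, ys):
    """Simple OLS fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Made-up data: binary response y against one explanatory variable x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_line(x, y)

def predict(x_value):
    """LPM prediction, read as an estimate of P(y = 1)."""
    return a + b * x_value

print("prediction at x = 0: ", predict(0))    # below 0
print("prediction at x = 12:", predict(12))   # above 1
```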
What is a more suitable model for binary response variables than the LPM?
The logistic regression model.
Why is the Logistic Regression Model better than the LPM for binary response variables?
The logistic regression model ensures that the predicted probabilities lie between 0 and 1 for all values of the explanatory variables.
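A brief sketch of why this holds, assuming the standard logistic form P = exp(b0 + b1*x) / (1 + exp(b0 + b1*x)). The coefficients `b0` and `b1` below are hypothetical values chosen for illustration, not estimates from any data.

```python
import math

def logistic(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1 + ez)

# Hypothetical coefficients (assumed for illustration only).
b0, b1 = -4.0, 1.0

def predicted_probability(x_value):
    """Predicted P(y = 1) from the logistic model."""
    return logistic(b0 + b1 * x_value)

# Predictions stay between 0 and 1 even for x values far from the centre.
for x_value in [-10, 0, 4, 8, 20]:
    print(x_value, predicted_probability(x_value))
```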
What is used to estimate logistic regression?
Maximum likelihood estimation (MLE), instead of OLS.
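A rough sketch of what MLE does here: it searches for the coefficients that maximise the log-likelihood, the sum of y*ln(P) + (1 - y)*ln(1 - P) over the observations. The data are made up, and plain gradient ascent with a hand-picked step size is used only to keep the example short; statistical software typically relies on faster Newton-type iterations.

```python
import math

def logistic(z):
    return 1 / (1 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Log-likelihood of a logistic model with one explanatory variable."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Made-up data: binary response y against one explanatory variable x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = 0.0, 0.0          # starting values
step = 0.05                # step size (an assumption, chosen by hand)
start_ll = log_likelihood(b0, b1, x, y)

for _ in range(20000):
    # Gradient of the log-likelihood with respect to b0 and b1.
    g0 = sum(yi - logistic(b0 + b1 * xi) for xi, yi in zip(x, y))
    g1 = sum((yi - logistic(b0 + b1 * xi)) * xi for xi, yi in zip(x, y))
    b0 += step * g0 / len(x)
    b1 += step * g1 / len(x)

final_ll = log_likelihood(b0, b1, x, y)
print("estimates:", b0, b1)
print("log-likelihood at start and end:", start_ll, final_ll)
```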
Is the interpretation the same in a logistic regression as in the linear models?
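No, the coefficients have a different interpretation: a coefficient no longer gives the change in the predicted probability per unit change in x, because that change depends on the value of x.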
What is sometimes used to interpret the logistic model?
The odds ratio.
How do you calculate accuracy?
- First, convert the predicted y(hat) values to binary predictions: 1 if y(hat) ≥ 0.5 and 0 if y(hat) < 0.5.
- Then, compare the binary values of the response variable with the binary predictions.
- The accuracy, expressed as a percentage, is calculated as:
Accuracy = (number of correct predictions) / (total number of predictions) * 100
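The steps above can be sketched as a short function. The name `accuracy`, the `cutoff` parameter and the example values are assumptions made for illustration.

```python
def accuracy(y_true, y_hat, cutoff=0.5):
    """Percentage of observations whose binary prediction matches y."""
    # Step 1: convert predicted values to binary predictions.
    predictions = [1 if p >= cutoff else 0 for p in y_hat]
    # Step 2: count predictions that agree with the observed response.
    correct = sum(1 for t, p in zip(y_true, predictions) if t == p)
    # Step 3: correct predictions as a percentage of all predictions.
    return 100 * correct / len(y_true)

# Made-up observed responses and predicted values.
y = [1, 0, 1, 1, 0]
y_hat = [0.9, 0.4, 0.3, 0.5, 0.6]

print(accuracy(y, y_hat))  # 60.0 (3 of 5 predictions are correct)
```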
What is the odds ratio?
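The ratio between the probability of success, P(y=1), and the probability of failure, P(y=0): P(hat)/(1 - P(hat)). Strictly speaking, this quantity is the odds of success; an odds ratio compares the odds at two different values of x.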
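A short sketch of the quantity P(hat)/(1 - P(hat)) and of how it relates to a logistic coefficient. The coefficients `b0` and `b1` are hypothetical values, not estimates. In a logistic model, raising x by one unit multiplies P(hat)/(1 - P(hat)) by exp(b1), which is why this ratio is convenient for interpretation.

```python
import math

def odds(p):
    """Probability of success divided by probability of failure."""
    return p / (1 - p)

# Hypothetical logistic coefficients (assumed for illustration only).
b0, b1 = -4.0, 1.0

def predicted_probability(x_value):
    return 1 / (1 + math.exp(-(b0 + b1 * x_value)))

odds_at_4 = odds(predicted_probability(4))
odds_at_5 = odds(predicted_probability(5))

# Ratio of the odds after a one-unit increase in x, compared with exp(b1).
ratio = odds_at_5 / odds_at_4
print("odds at x = 4:", odds_at_4)
print("odds at x = 5:", odds_at_5)
print("ratio:", ratio, "exp(b1):", math.exp(b1))
```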
What is often used as a measure of goodness of fit for binary choice models?
Accuracy.
When can accuracy be misleading?
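In cases where there are many 0s and few 1s, or vice versa. A model that always predicts the more common outcome then achieves a high accuracy without identifying any of the rare cases.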
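A made-up example of the problem: with 95 zeros and 5 ones, a rule that always predicts 0 scores 95% accuracy while finding none of the 1s. All numbers and variable names are assumptions chosen for illustration.

```python
# Made-up, heavily imbalanced response: 95 zeros and 5 ones.
y = [0] * 95 + [1] * 5

# A "model" that ignores the data and always predicts 0.
always_zero = [0] * len(y)

correct = sum(1 for t, p in zip(y, always_zero) if t == p)
acc = 100 * correct / len(y)

# How many of the actual 1s were predicted as 1?
ones_found = sum(1 for t, p in zip(y, always_zero) if t == 1 and p == 1)

print("accuracy:", acc)                   # 95.0
print("1s correctly found:", ones_found)  # 0
```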