week 3 part 1 Flashcards

Question 1

Q

Why might we need to extend a dummy-variable model with an interaction variable?

Answer

A

When we used dummy variables, the predicted y differed between the two categories by a fixed amount across all values of x. Sometimes we need an interaction variable (xd), which allows the predicted y to differ between the two categories by an amount that varies with x.
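
A minimal sketch of the difference. The coefficients B0 to B3 are hypothetical values chosen for illustration, not estimates from any data set.

```python
# Hypothetical coefficients, chosen only for illustration.
B0, B1, B2, B3 = 10.0, 2.0, 5.0, 1.5

def predict_dummy_only(x, d):
    """y_hat = b0 + b1*x + b2*d: the dummy shifts the intercept only."""
    return B0 + B1 * x + B2 * d

def predict_with_interaction(x, d):
    """y_hat = b0 + b1*x + b2*d + b3*x*d: the dummy shifts intercept and slope."""
    return B0 + B1 * x + B2 * d + B3 * x * d

# Gap in predicted y between the two categories (d = 1 vs d = 0).
for x in (0, 2, 4):
    gap_dummy = predict_dummy_only(x, 1) - predict_dummy_only(x, 0)
    gap_inter = predict_with_interaction(x, 1) - predict_with_interaction(x, 0)
    print(x, gap_dummy, gap_inter)
```

With the dummy alone the gap stays at b2 for every x; with the interaction term the gap is b2 + b3*x, so it changes as x changes.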

Question 2

Q

How can you assess the significance of the interaction variable?

Answer

A

A t-test for the individual significance of the dummy variable d and of the interaction variable xd.
A partial F-test for the joint significance of d and xd.
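
A sketch of the partial F statistic, assuming the usual form based on the sums of squared errors (SSE) of a restricted model (x only) and an unrestricted model (x, d and xd). The SSE values and sample size are hypothetical.

```python
def partial_f_statistic(sse_restricted, sse_unrestricted, num_restrictions, n, k):
    """F = ((SSE_R - SSE_U) / q) / (SSE_U / (n - k - 1)).

    q is the number of coefficients set to zero under H0 and
    k is the number of explanatory variables in the unrestricted model.
    """
    numerator = (sse_restricted - sse_unrestricted) / num_restrictions
    denominator = sse_unrestricted / (n - k - 1)
    return numerator / denominator

# Hypothetical values, for illustration only.
f_stat = partial_f_statistic(
    sse_restricted=500.0,    # model with x only
    sse_unrestricted=400.0,  # model with x, d and xd
    num_restrictions=2,      # H0: the coefficients on d and xd are both zero
    n=54,
    k=3,
)
print(f_stat)
```

The statistic is compared with an F distribution with q and n - k - 1 degrees of freedom; a large value is evidence that d and xd are jointly significant.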

Question 3

Q

Besides models where the explanatory variables are dummy variables, what other types of classification models can we build?

Answer

A

Classification models where the response variable is binary (e.g., yes/no, success/failure).

Question 4

Q

What are the outcomes in a linear regression model where y is a binary variable?

Answer

A

y is a discrete stochastic variable with only two possible outcomes: 0 or 1.

Question 5

Q

What is a linear regression model applied to a binary response variable called?

Answer

A

The linear probability model (LPM).

Question 6

Q

What are the pros and cons of the LPM?

Answer

A

Pros: It is simple to estimate and interpret.
Cons: The model can predict probabilities greater than 1 or less than 0, which are not valid probabilities.
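
A small sketch of the drawback, using hypothetical LPM estimates (B0 and B1 are made up for illustration).

```python
# Hypothetical LPM estimates: P_hat(y = 1) = b0 + b1 * x
B0, B1 = -0.2, 0.05

def lpm_predict(x):
    """Predicted probability from the linear probability model."""
    return B0 + B1 * x

# Print x, the predicted "probability", and whether it lies in [0, 1].
for x in (0, 10, 30):
    p_hat = lpm_predict(x)
    print(x, p_hat, 0.0 <= p_hat <= 1.0)
```

Because the fitted line is unbounded, small or large enough values of x push the prediction below 0 or above 1.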

Question 7

Q

What is a more suitable model for binary response variables than the LPM?

Answer

A

The logistic regression model.

Question 8

Q

Why is the logistic regression model better than the LPM for binary response variables?

Answer

A

The logistic regression model ensures that the predicted probabilities lie between 0 and 1 for all values of the explanatory variables.
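
A sketch of why the predictions stay in range, assuming the standard logistic form P(y = 1) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x)). The coefficients are hypothetical.

```python
import math

def logistic_probability(x, b0, b1):
    """P(y = 1) = exp(z) / (1 + exp(z)) with z = b0 + b1*x, written as 1 / (1 + exp(-z))."""
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, for illustration only.
b0, b1 = -0.2, 0.05

probabilities = [logistic_probability(x, b0, b1) for x in (-100, 0, 30, 200)]
print(probabilities)
```

However small or large b0 + b1*x becomes, the logistic function maps it to a value strictly between 0 and 1.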

Question 9

Q

What is used to estimate logistic regression?

Answer

A

Maximum likelihood estimation (MLE), rather than OLS.
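
A rough sketch of the idea behind MLE: choose the coefficients that make the observed 0/1 outcomes most likely. The data set is made up, and the plain gradient ascent used here is only for illustration; statistical software typically uses more refined optimizers.

```python
import math

# Tiny hypothetical data set, made up for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]

def prob(x, b0, b1):
    """Logistic probability P(y = 1) for given coefficients."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_likelihood(b0, b1):
    """Sum over observations of y*log(p) + (1 - y)*log(1 - p)."""
    return sum(
        y * math.log(prob(x, b0, b1)) + (1 - y) * math.log(1.0 - prob(x, b0, b1))
        for x, y in zip(xs, ys)
    )

def fit_by_gradient_ascent(step=0.01, iterations=5000):
    """Move the coefficients uphill on the log-likelihood in small steps."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        residuals = [y - prob(x, b0, b1) for x, y in zip(xs, ys)]
        b0 += step * sum(residuals)                              # d logL / d b0
        b1 += step * sum(r * x for r, x in zip(residuals, xs))   # d logL / d b1
    return b0, b1

b0_hat, b1_hat = fit_by_gradient_ascent()
print(log_likelihood(0.0, 0.0), log_likelihood(b0_hat, b1_hat))
```

The fitted coefficients give a higher log-likelihood than the starting values of zero, which is the sense in which they fit the observed outcomes better.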

Question 10

Q

Is the interpretation of the coefficients the same in a logistic regression as in a linear model?

Answer

A

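No, the coefficients have a different interpretation. A coefficient does not give the change in the predicted probability for a one-unit change in x; it gives the change in the log-odds of y = 1.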
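
A sketch of the difference in interpretation, with hypothetical logistic coefficients. A one-unit increase in x changes the predicted probability by an amount that depends on x, while it multiplies the odds P / (1 - P) by the same factor, exp(b1), everywhere.

```python
import math

# Hypothetical logistic coefficients, for illustration only.
B0, B1 = -3.0, 1.0

def prob(x):
    """Predicted P(y = 1) from the logistic model."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * x)))

def odds(x):
    """Odds of success, P / (1 - P)."""
    p = prob(x)
    return p / (1.0 - p)

for x in (0, 3, 6):
    change_in_prob = prob(x + 1) - prob(x)      # varies with x
    odds_multiplier = odds(x + 1) / odds(x)     # equals exp(B1) for every x
    print(x, round(change_in_prob, 4), round(odds_multiplier, 4))
```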

Question 11

Q

What is sometimes used to interpret the logistic model?

Answer

A

The odds ratio.

Question 12

Q

How do you calculate accuracy?

Answer

A

First, convert the predicted y(hat) values to binary predictions:
1 if y(hat) ≥ 0.5 and 0 if y(hat) < 0.5.
Then, compare the binary predictions with the actual binary values of the response variable.
The accuracy (in percent) is calculated as:
Accuracy = (number of correct predictions) / (total number of predictions) * 100
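
The steps above can be sketched as follows. The outcomes and predicted probabilities are hypothetical, and the function name `accuracy` is an illustrative choice.

```python
def accuracy(y_true, y_hat, cutoff=0.5):
    """Percentage of observations whose binary prediction matches the actual y."""
    # Step 1: convert predicted probabilities to binary predictions.
    predictions = [1 if p >= cutoff else 0 for p in y_hat]
    # Step 2: count the predictions that match the actual outcomes.
    correct = sum(1 for actual, pred in zip(y_true, predictions) if actual == pred)
    # Step 3: express the share of correct predictions as a percentage.
    return 100 * correct / len(y_true)

# Hypothetical actual outcomes and predicted probabilities.
y_true = [1, 0, 1, 1, 0]
y_hat = [0.81, 0.35, 0.42, 0.50, 0.66]

print(accuracy(y_true, y_hat))
```

Here the binary predictions are 1, 0, 0, 1, 1, so three of the five match the actual outcomes and the accuracy is 60 percent.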

Question 13

Q

What is the odds ratio?

Answer

A

The ratio of the probability of success, P(y=1), to the probability of failure, P(y=0): P(hat) / (1 - P(hat)).
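
A short worked example of this ratio, which is often simply called the odds. The probabilities are hypothetical.

```python
def odds(p_hat):
    """Ratio of the probability of success to the probability of failure."""
    return p_hat / (1.0 - p_hat)

def probability_from_odds(odds_value):
    """Inverse relationship: P = odds / (1 + odds)."""
    return odds_value / (1.0 + odds_value)

for p_hat in (0.2, 0.5, 0.8):
    print(p_hat, round(odds(p_hat), 4))
```

A probability of 0.5 gives a ratio of 1 (success and failure equally likely); a probability of 0.8 gives a ratio of about 4, meaning success is about four times as likely as failure.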

Question 14

Q

What is often used as a measure of goodness of fit for binary choice models?

Question 15

Q

When can accuracy be misleading?

Answer

A

In cases where there are many 0s and few 1s, or vice versa.
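
A sketch of the problem with a hypothetical imbalanced sample: a "model" that always predicts 0 scores a high accuracy without identifying a single 1.

```python
# Hypothetical imbalanced sample: 95 zeros and 5 ones.
y_true = [0] * 95 + [1] * 5

# A "model" that ignores the data and always predicts 0.
always_zero = [0] * len(y_true)

correct = sum(1 for actual, pred in zip(y_true, always_zero) if actual == pred)
accuracy_pct = 100 * correct / len(y_true)

# Number of actual 1s that were predicted as 1.
ones_found = sum(
    1 for actual, pred in zip(y_true, always_zero) if actual == 1 and pred == 1
)

print(accuracy_pct, ones_found)
```

The accuracy is 95 percent even though none of the five 1s is predicted correctly.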

(15 cards)