Week 6 Flashcards
How do we describe binary qualitative information?
- e.g. a person is either male or female
Can be captured by defining a binary variable
- e.g. 1 if female, 0 if male
SLR with dummy variable as regressor
wage = b0 + y0*female + u
- assuming SLR.4 holds: E[u|female] = 0
E[wage|female] = b0 + y0*female
= b0 if female = 0
= b0 + y0 if female = 1
What does the coefficient of the dummy variable mean in the OLS of SLR with dummy variable?
y0 = E[wage|female = 1] - E[wage|female = 0]
- the difference in average wage between women and men
- difference in average outcomes between the two groups
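A quick numerical check of this identity, using made-up wage numbers (illustrative only, not course data): OLS of wage on the female dummy recovers the base-group mean as the intercept and the difference in group means as the dummy coefficient.

```python
# Hypothetical data: first four people are men (female = 0), last four women.
female = [0, 0, 0, 0, 1, 1, 1, 1]
wage = [10.0, 12.0, 11.0, 13.0, 8.0, 9.0, 10.0, 9.0]

n = len(wage)
xbar = sum(female) / n
ybar = sum(wage) / n

# Simple OLS: b1 = cov(x, y) / var(x), b0 = ybar - b1*xbar.
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(female, wage)) / \
     sum((x - xbar) ** 2 for x in female)
b0 = ybar - b1 * xbar

# Group means: men (female = 0) and women (female = 1).
mean_men = sum(y for x, y in zip(female, wage) if x == 0) / 4
mean_women = sum(y for x, y in zip(female, wage) if x == 1) / 4

print(b0)  # intercept = average wage of the base group (men): 11.5
print(b1)  # dummy coefficient = women's mean minus men's mean: -2.5
```

The intercept equals the men's average wage and the coefficient equals the female-minus-male gap, which is exactly the interpretation above.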
How does the choice of the base group work?
We get the same fitted regression if we flip the base group; in the example above the base group was male, whose average wage is the intercept b0
- because male = 1 - female, the coefficient on the dummy changes sign but keeps the same magnitude
- the intercept changes because the base group is now female
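A minimal sketch (toy numbers, not course data) showing what flipping the base group does to the estimates:

```python
# Check that flipping the base group flips the dummy coefficient's sign and
# moves the other group's mean into the intercept.
female = [0, 0, 1, 1, 1]
wage = [12.0, 14.0, 9.0, 10.0, 11.0]
male = [1 - f for f in female]

def slr(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    return ybar - b1 * xbar, b1

b0_f, g_f = slr(female, wage)  # base group: men  -> intercept 13.0, slope -3.0
b0_m, g_m = slr(male, wage)    # base group: women -> intercept 10.0, slope +3.0

print(g_f, g_m)    # same magnitude, opposite sign
print(b0_f, b0_m)  # men's mean vs women's mean
```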
What happens if we put female AND male both in the equation?
It is redundant; this is the simplest case of the dummy variable trap, an example of perfect collinearity (female + male = 1 for every observation, identical to the intercept's column of ones).
Dummy variables for multiple categories
- female and married
Single male is the base group
marriedMale = married*(1 - female)
marriedFemale = married*female
singleMale = (1 - married)*(1 - female)
singleFemale = (1 - married)*female
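The constructions above can be sketched directly; the (married, female) tuples below are illustrative people, not data from the course:

```python
# One person per combination of the two base dummies.
people = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (married, female)

rows = []
for married, female in people:
    marriedMale = married * (1 - female)
    marriedFemale = married * female
    singleMale = (1 - married) * (1 - female)
    singleFemale = (1 - married) * female
    rows.append((marriedMale, marriedFemale, singleMale, singleFemale))

# Exactly one group dummy is 1 for each person, so the four dummies always
# sum to 1 -- including all four plus an intercept would be perfect collinearity.
for row in rows:
    print(row, sum(row))
```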
If we have, let's say, 4 groups as before, how many dummies go in the regression?
Only 3; the base group is not included
- the base group is captured by the intercept
What's the point of interaction terms among dummy variables?
Used to model conditional effects, i.e. the effect of one variable depending on another, e.g. the effect of being married on wages can differ by gender.
Chow test is for what?
To test whether two groups have the same regression functions
How to compute the chow test statistic?
- Pool the data and estimate a single regression; this is the restricted model, and it produces the restricted SSR, call it SSRp (the pooled SSR)
- Split the sample into the two groups and estimate the regression on each subsample; the unrestricted SSR is the sum of the two subsample SSRs, SSRur = SSR1 + SSR2
Fchow = ([SSRp - SSRur] / (k+1)) / (SSRur / (n - 2(k+1)))
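A minimal sketch of computing the Chow statistic for the simple-regression case (k = 1), with made-up data for the two groups; the helper name ssr_slr is my own, not from the course:

```python
def ssr_slr(x, y):
    """SSR from OLS of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    b0 = ybar - b1 * xbar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

x1, y1 = [1, 2, 3, 4], [2.0, 4.1, 5.9, 8.2]  # group 1 (steep slope)
x2, y2 = [1, 2, 3, 4], [1.0, 1.4, 2.1, 2.4]  # group 2 (flat slope)

k = 1
n = len(x1) + len(x2)
ssr_p = ssr_slr(x1 + x2, y1 + y2)            # pooled (restricted) SSR
ssr_ur = ssr_slr(x1, y1) + ssr_slr(x2, y2)   # unrestricted SSR = SSR1 + SSR2

f_chow = ((ssr_p - ssr_ur) / (k + 1)) / (ssr_ur / (n - 2 * (k + 1)))
print(f_chow)  # compare with the F(k+1, n-2(k+1)) critical value
```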
What is the linear probability model (LPM)?
LPM is a special case of regression analysis where the dependent variable y is binary:
- y = 1 if a young man is arrested for a crime, y = 0 if otherwise.
How do we interpret the LPM
y = b0 + b1x1 + b2x2 + … + bkxk + u
when y is binary?
E[y|x] = Pr(y=1|x)*1 + Pr(y=0|x)*0
= Pr(y=1|x), called the response probability
So if b1 were 0.035, one more unit of x1 would increase the probability that y = 1 by 0.035, i.e. 3.5 percentage points
Reformulated linear probability model once expected value calculations are done:
Pr(y=1|x) = b0 + b1x1 … + bkxk
Therefore, as said earlier, bj is the change in the estimated probability that y = 1 from one more unit of xj, other factors held fixed
First shortcoming of the LPM
1 - the fitted values from an OLS regression are not guaranteed to lie between 0 and 1, yet these fitted values are estimated probabilities; i.e. a fitted value can fall outside [0, 1], which invalidates that value as a probability but does not invalidate the LPM for estimating partial effects
Issue with LPM’s partial effects
The LPM assumes partial effects are constant over the whole range of the explanatory variables, but for the estimated model to truly represent a probability the effects must eventually diminish, e.g. each extra year of education should raise the probability by less and less as it approaches 1.
Second shortcoming - heteroskedasticity
Because y is binary it follows a Bernoulli distribution, so its variance is p(1 - p), where p = Pr(y=1|x) = p(x)
- var(y|x) = p(x)(1 - p(x))
- since u = y - p(x), var(u|x) = var(y|x) = p(x)(1 - p(x))
Therefore the variance of u is a function of x, implying heteroskedasticity
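A short numeric illustration of how the error variance moves with x under the LPM; the coefficient values below are made up:

```python
# Under the LPM, var(u|x) = p(x)(1 - p(x)) with p(x) = b0 + b1*x, so the
# conditional error variance changes with x -- heteroskedasticity.
b0, b1 = 0.2, 0.05  # hypothetical LPM coefficients

rows = []
for x in [0, 2, 4, 6]:
    p = b0 + b1 * x          # response probability at this x
    rows.append((x, p, p * (1 - p)))

# The variance rises toward its maximum of 0.25 as p approaches 0.5.
for x, p, var_u in rows:
    print(x, p, var_u)
```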
What happens now that the LPM violates MLR.5?
The usual t statistics rely on correct standard errors; with incorrect SEs, the t stats no longer follow their usual (asymptotically standard normal) distribution
- can't trust the p-values or CIs derived from these stats either
Goodness of fit in LPMs
- can still use R-squared and adjusted R-squared, but they are hard to interpret when y is binary
Use Percent Correctly Predicted
- let yi^ be the OLS fitted value - a probability estimate
- convert yi^ into a binary prediction yi_: yi_ = 1 if yi^ >= 0.5, and 0 otherwise, for example
- yi^ can take any real value, while yi_ is strictly binary, matching the structure of y
This assesses the classification accuracy of the LPMs
Four possible cases in the percent correctly predicted model:
(yi, yi_) = (1, 1): correct prediction
(yi, yi_) = (0, 0): correct prediction
(yi, yi_) = (1, 0): incorrect prediction
(yi, yi_) = (0, 1): incorrect prediction
Then compute the accuracy rate: the fraction of observations correctly predicted
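The whole procedure in a few lines, with hypothetical fitted probabilities yhat and outcomes y (not from any real model):

```python
# Percent correctly predicted for a fitted LPM.
y    = [1, 0, 1, 1, 0, 0, 1, 0]                  # actual binary outcomes
yhat = [0.8, 0.3, 0.6, 0.4, 0.2, 0.7, 0.9, 0.1]  # OLS fitted probabilities

# Convert each fitted probability into a binary prediction at the 0.5 cutoff.
ytilde = [1 if p >= 0.5 else 0 for p in yhat]

# Accuracy rate: fraction of observations where the prediction matches y.
correct = sum(1 for yi, yt in zip(y, ytilde) if yi == yt)
accuracy = correct / len(y)

print(ytilde)    # [1, 0, 1, 0, 0, 1, 1, 0]
print(accuracy)  # 0.75 -> 75% correctly predicted
```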