Week 6 Flashcards

1
Q

How do we describe binary qualitative information?
- e.g. a person is either male or female

A

Can be captured by defining a binary variable
- e.g. 1 if female, 0 if male

2
Q

SLR with dummy variable as regressor

A

wage = β0 + γ0·female + u
- assuming SLR.4 holds: E[u|female] = 0

E[wage|female] = β0 + γ0·female
= β0 if female = 0
= β0 + γ0 if female = 1
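As a minimal sketch (with made-up wage numbers), OLS of wage on a single dummy simply recovers the two group means: the intercept is the base-group average and intercept + slope is the other group's average.

```python
# Hypothetical wage data, purely for illustration
wages  = [10.0, 12.0, 14.0, 9.0, 11.0, 10.0]
female = [0, 0, 0, 1, 1, 1]

n = len(wages)
xbar = sum(female) / n
ybar = sum(wages) / n
# OLS slope and intercept for a single regressor
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(female, wages)) / \
     sum((x - xbar) ** 2 for x in female)
b0 = ybar - b1 * xbar

mean_male   = sum(y for y, f in zip(wages, female) if f == 0) / female.count(0)
mean_female = sum(y for y, f in zip(wages, female) if f == 1) / female.count(1)

print(b0)       # intercept = average wage of the base group (males)
print(b0 + b1)  # intercept + dummy coefficient = average female wage
```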

3
Q

What does the coefficient of the dummy variable mean in the OLS of SLR with dummy variable?

A

γ0 = E[wage|female = 1] - E[wage|female = 0]
- the difference in average wage between women and men
- difference in average outcomes between the two groups

4
Q

How does the choice of the base group work?

A

We get an equivalent fitted model if we flip the base group; in the earlier example the base group was male, captured by the intercept β0.

  • because male = 1 - female, the coefficient on the dummy changes sign, but keeps the same magnitude
  • the intercept changes because the base group is now female
5
Q

What happens if we put female AND male both in the equation?

A

It is redundant; this is the simplest case of the dummy variable trap, an example of perfect collinearity.

6
Q

Dummy variables for multiple categories
- female and married

Single male is the base group

A

marriedmale = married·(1 - female)
marriedfemale = married·female
singlemale = (1 - married)·(1 - female)
singlefemale = (1 - married)·female
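The construction above can be sketched in code (the four example individuals are hypothetical); note that the four group dummies always sum to 1, which is why only three of them can enter a regression with an intercept.

```python
# One hypothetical person per group, to show the dummy construction
people = [
    {"married": 1, "female": 0},  # married male
    {"married": 1, "female": 1},  # married female
    {"married": 0, "female": 0},  # single male (base group)
    {"married": 0, "female": 1},  # single female
]

for p in people:
    m, f = p["married"], p["female"]
    p["marrmale"] = m * (1 - f)
    p["marrfem"]  = m * f
    p["singmale"] = (1 - m) * (1 - f)
    p["singfem"]  = (1 - m) * f

# the four dummies are exhaustive and mutually exclusive: they sum to 1,
# so including all four plus an intercept is the dummy variable trap
for p in people:
    assert p["marrmale"] + p["marrfem"] + p["singmale"] + p["singfem"] == 1
```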

7
Q

If we have, let's say, 4 groups as before, how many dummy variables would be in the regression?

A

Only 3; the base group is not included.
- the base group is captured by the intercept

8
Q

What's the point of interaction terms among dummy variables?

A

Used to model conditional effects: the effect of one variable depending on another, e.g. the effect of being married on wages can differ based on gender.

9
Q

What is the Chow test for?

A

To test whether two groups have the same regression functions

10
Q

How to compute the Chow test statistic?

A
  1. Pool the data and estimate a single regression; this is the restricted model and produces the restricted SSR, call it SSR_P, the pooled SSR
  2. Split the sample into the two groups and estimate the regression on each subsample; the unrestricted SSR is the sum of the two group SSRs: SSR_UR = SSR_1 + SSR_2

F_Chow = [(SSR_P - SSR_UR)/(k+1)] / [SSR_UR/(n - 2(k+1))]
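The two steps can be sketched for the simple-regression case (k = 1); the data for the two groups below are made up purely to exercise the formula.

```python
def ols_ssr(x, y):
    """SSR from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    b0 = ybar - b1 * xbar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

x1, y1 = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]  # group 1 (hypothetical)
x2, y2 = [1, 2, 3, 4], [1.0, 1.5, 2.1, 2.4]  # group 2 (hypothetical)

k = 1
n = len(x1) + len(x2)
# step 1: pooled (restricted) regression
ssr_p = ols_ssr(x1 + x2, y1 + y2)
# step 2: one regression per group; unrestricted SSR is their sum
ssr_ur = ols_ssr(x1, y1) + ols_ssr(x2, y2)

f_chow = ((ssr_p - ssr_ur) / (k + 1)) / (ssr_ur / (n - 2 * (k + 1)))
print(f_chow)  # large F suggests the two groups have different regressions
```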

11
Q

What is the linear probability model (LPM)?

A

LPM is a special case of regression analysis where the dependent variable y is binary:
- e.g. y = 1 if a young man is arrested for a crime, y = 0 otherwise.

12
Q

How do we interpret the LPM:
Y = b0 + b1x1 + b2x2 … + bkxk + u

When y is binary?

A

E[y|x] = Pr(y=1|x) × 1 + Pr(y=0|x) × 0
= Pr(y=1|x), called the response probability

So if b1 were 0.035, one more unit of x1 increases the probability that y = 1 by 3.5 percentage points.
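A tiny sketch of that interpretation, with hypothetical coefficients: under the LPM the partial effect on the response probability is the same at every value of x1.

```python
# Hypothetical fitted LPM coefficients, for illustration only
b0, b1 = 0.10, 0.035

def prob_y1(x1):
    """Estimated Pr(y = 1 | x1) under the fitted LPM."""
    return b0 + b1 * x1

# the partial effect is constant: +3.5 percentage points per unit of x1
effect = prob_y1(6) - prob_y1(5)
print(effect)
```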

13
Q

Reformulated linear probability model once expected value calculations are done:

A

Pr(y=1|x) = b0 + b1x1 … + bkxk

Therefore, as said earlier, b1 is the change in the estimated probability that y = 1 from one more unit of x1, other factors held fixed.

14
Q

First shortcoming of the LPM

A

1 - the fitted values from an OLS regression are never guaranteed to lie between 0 and 1, yet these fitted values are estimated probabilities; i.e. ŷ can sometimes fall outside the range [0,1]. This invalidates those particular estimates, but not the LPM as a tool for estimating partial effects.

15
Q

Issue with LPM’s partial effects

A

The LPM assumes partial effects are constant throughout the range of the explanatory variables; but for the estimated model to truly represent a probability (bounded between 0 and 1), the effect of a variable such as education must eventually diminish.

16
Q

Second shortcoming - heteroskedasticity

A

Because y is binary, its variance follows the Bernoulli distribution: with p(x) = Pr(y=1|x),

  • Var(y|x) = p(x)(1 - p(x))
  • since u = y - p(x), Var(u|x) = Var(y|x) = p(x)(1 - p(x))

Therefore the variance of u is a function of x, implying heteroskedasticity
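A quick numerical check of the Bernoulli variance formula used above: for a binary variable with Pr(y = 1) = p, computing the variance directly from the definition gives p(1 - p).

```python
def var_binary(p):
    """Var(y) for binary y with Pr(y = 1) = p, from the definition E[(y - E[y])^2]."""
    mean = 0 * (1 - p) + 1 * p
    return (0 - mean) ** 2 * (1 - p) + (1 - mean) ** 2 * p

# matches p * (1 - p) at every p; since p depends on x in the LPM,
# Var(u|x) varies with x, which is exactly the heteroskedasticity above
for p in [0.1, 0.25, 0.5, 0.9]:
    assert abs(var_binary(p) - p * (1 - p)) < 1e-12
```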

17
Q

What happens given that the LPM violates MLR.5?

A

The usual OLS standard errors are incorrect under heteroskedasticity, so the t statistics built from them do not follow the standard normal distribution
- the p values and confidence intervals derived from these statistics cannot be trusted either

18
Q

Goodness of fit in LPMs

A
  • can still use R-squared and adjusted R-squared, but they are hard to interpret because y is binary

Use the percent correctly predicted:
- let ŷi be the OLS fitted value, a probability estimate
- convert ŷi into a binary ỹi, e.g. ỹi = 1 if ŷi ≥ 0.5 and 0 otherwise
- ŷi can take any real value, while ỹi is strictly binary, matching the structure of y

This assesses the classification accuracy of the LPM

19
Q

Four possible cases in the percent correctly predicted model:

A

(yi, ỹi) = (1,1) - correct prediction
= (0,0) - correct prediction
= (1,0) or (0,1) - incorrect prediction

Then compute the accuracy rate: the share of correct predictions
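A minimal sketch of the percent correctly predicted, using made-up fitted values: classify each fitted probability with a 0.5 cutoff, then count the (1,1) and (0,0) cases.

```python
# Hypothetical outcomes and LPM fitted values (probability estimates)
y     = [1, 0, 1, 0, 1]
y_hat = [0.8, 0.3, 0.4, 0.6, 0.9]

# convert fitted probabilities into binary predictions with a 0.5 cutoff
y_tilde = [1 if p >= 0.5 else 0 for p in y_hat]

# accuracy rate: share of (1,1) and (0,0) pairs, i.e. correct predictions
correct = sum(1 for a, b in zip(y, y_tilde) if a == b)
accuracy = correct / len(y)
print(accuracy)  # → 0.6 here: 3 of 5 predictions are correct
```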