Part 2 : Dummy Variables Flashcards
What are dummy variables?
Variables that encode qualitative information.
(So far we have only considered quantitative variables, e.g. wages, savings, etc.)
Suppose we have a model like this:
ln(wageᵢ) = β₀ + β₁Educᵢ + δ₀Femaleᵢ + εᵢ
Assume Femaleᵢ = 1 for females (so Femaleᵢ = 0 for males).
δ₀ is the parameter on the dummy variable (same concept as β).
Find the expected value of the log wage for
A) a female
B) a male
C) So what is the difference in expected log wage between the genders?
A) For Femaleᵢ = 1
Expected log wage = (β₀ + δ₀) + β₁Educᵢ
B) For Femaleᵢ = 0
Expected log wage = β₀ + β₁Educᵢ
C) Subtract B from A: the difference is δ₀.
δ₀ is the difference in expected log wage between women and men at a given level of education.
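A minimal sketch of the subtraction above, in Python. The parameter values are made up purely for illustration (they are not estimates from any data), and the function name expected_log_wage is hypothetical.

```python
def expected_log_wage(educ, female, beta0=1.0, beta1=0.08, delta0=-0.2):
    """Expected ln(wage) = beta0 + beta1*educ + delta0*female, with female in {0, 1}.

    The default parameter values are illustrative assumptions only.
    """
    return beta0 + beta1 * educ + delta0 * female


educ = 12  # any education level works; the gap does not depend on it
female_wage = expected_log_wage(educ, female=1)  # (beta0 + delta0) + beta1*educ
male_wage = expected_log_wage(educ, female=0)    # beta0 + beta1*educ
gap = female_wage - male_wage                    # equals delta0 up to float rounding
print(gap)
```

Changing educ leaves gap unchanged, which is what "at a given level of education" means here.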
Why can't we include indicators for both men and women, e.g. ln(wageᵢ) = β₀ + β₁Educᵢ + δ₀Femaleᵢ + δ′₀Maleᵢ + εᵢ?
Dummy variable trap: the two categories are mutually exclusive (cannot occur at the same time) and exhaustive, so there is perfect collinearity.
We can't have both Femaleᵢ = 1 and Maleᵢ = 1, and Femaleᵢ + Maleᵢ must equal 1 for every i, which duplicates the constant term multiplying β₀.
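The collinearity can be checked directly on a tiny made-up sample. This is only a sketch: the six observations below are hypothetical.

```python
# Hypothetical sample: each person is either female (1) or male (0).
female = [1, 0, 0, 1, 1, 0]
male = [1 - f for f in female]   # the male dummy is the mirror image of female
constant = [1] * len(female)     # the regressor that multiplies beta0

# female + male equals the constant for every observation, so
# constant - female - male is a column of zeros: perfect collinearity.
residual = [c - f - m for c, f, m in zip(constant, female, male)]
trap = all(r == 0 for r in residual)
print(trap)
```

Because one column is an exact linear combination of the others, the separate effects of the constant, Female and Male cannot be identified; dropping one dummy removes the problem.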
What if qualitative variables have multiple categories? What is the general rule of thumb?
For a qualitative variable with m categories, we need to include m-1 dummy variables.
E.g. gender is a qualitative variable with 2 categories, so m-1 = 1, which is why we only used one dummy variable.
Example of a qualitative variable with multiple categories:
The sector of the economy in which individual i works can be manufacturing, services or agriculture.
So m = 3. Using the rule of thumb, how can we create a model?
First, the rule of thumb m-1 means we need 2 dummy variables. So…
Manufᵢ = 1 if i is in manufacturing, 0 otherwise
Servᵢ = 1 if i is in services, 0 otherwise
This leaves agriculture as the baseline category.
Then estimate ln(wageᵢ) = β₀ + β₁Educᵢ + δ₀Manufᵢ + δ₁Servᵢ + εᵢ
where δ₀ and δ₁ capture the wage gaps of manufacturing and services, respectively, relative to agriculture (since it is the baseline).
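The m-1 encoding for the three sectors might be sketched as follows. The helper name sector_dummies and the lowercase sector labels are assumptions made for this illustration.

```python
def sector_dummies(sector, baseline="agriculture"):
    """Encode a sector label as m-1 dummies, omitting the baseline category."""
    categories = ["manufacturing", "services", "agriculture"]
    if sector not in categories:
        raise ValueError(f"unknown sector: {sector!r}")
    # One 0/1 indicator per non-baseline category.
    return {name: int(sector == name) for name in categories if name != baseline}


for label in ["manufacturing", "services", "agriculture"]:
    print(label, sector_dummies(label))
```

The baseline (agriculture) is the case where both dummies are 0, so its expected log wage is picked up by β₀ + β₁Educ alone.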
Assumptions: what two properties must the categories behind the m-1 dummy variables have?
Mutually exclusive (cannot occur at the same time)
Exhaustive (every individual falls into some category)
Recall the previous model
ln(wageᵢ) = β₀ + β₁Educᵢ + δ₀Femaleᵢ + εᵢ
What if returns to education differ by gender?
(That is, the two explanatory variables interact.)
Include a multiplicative dummy variable.
A) How can we capture this interaction via a multiplicative dummy variable?
(What would our expression become?)
B) Ordinary dummy variables, as before, only shift the intercept (e.g. if δ₀ is negative, men have a higher intercept, i.e. a higher starting wage, than women).
What do multiplicative dummy variables change?
A) ln(wageᵢ) = β₀ + β₁Educᵢ + δ₀Femaleᵢ + δ₁Femaleᵢ × Educᵢ + εᵢ
B) The multiplicative dummy variable changes the gradient. Together with the ordinary dummy, the model allows both the intercept and the gradient to differ.
So now we have the model
ln(wageᵢ) = β₀ + β₁Educᵢ + δ₀Femaleᵢ + δ₁Femaleᵢ × Educᵢ + εᵢ
Interpret δ₀ and δ₁ now (find the expected log wages for females and males separately again!)
For Femaleᵢ = 1
Expected log wage = (β₀ + δ₀) + (β₁ + δ₁)Educᵢ
For Femaleᵢ = 0
Expected log wage = β₀ + β₁Educᵢ
Subtracting one from the other leaves δ₀ + δ₁Educᵢ.
So there is a difference in intercepts when δ₀ ≠ 0.
But there is also a difference in gradients when δ₁ ≠ 0 (the return to education differs between the genders).
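One way to see the two deltas is to fit a separate line for each gender and difference the results: with both the ordinary and the multiplicative dummy included, the intercept gap is δ₀ and the slope gap is δ₁. The sketch below uses noise-free, made-up data and hypothetical parameter values, so the recovery is exact up to float rounding; simple_ols is a helper written for this example.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical true parameters (illustrative only).
beta0, beta1, delta0, delta1 = 1.0, 0.10, -0.30, 0.02

educ = [8, 10, 12, 14, 16]
male_y = [beta0 + beta1 * e for e in educ]                          # Female = 0
female_y = [(beta0 + delta0) + (beta1 + delta1) * e for e in educ]  # Female = 1

a_m, b_m = simple_ols(educ, male_y)
a_f, b_f = simple_ols(educ, female_y)

delta0_hat = a_f - a_m  # difference in intercepts
delta1_hat = b_f - b_m  # difference in gradients (returns to education)
print(delta0_hat, delta1_hat)
```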
Visualising multiplicative dummy variables: 4 scenarios
A) δ₀ = 0 and δ₁ = 0
B) δ₀ ≠ 0 and δ₁ = 0
C) δ₀ = 0 and δ₁ ≠ 0
D) δ₀ ≠ 0 and δ₁ ≠ 0
A) If both deltas = 0, men and women share the same line (same slope and intercept).
B) If δ₀ ≠ 0 and δ₁ = 0, we get parallel regressions (same gradient, different intercepts).
C) If δ₀ = 0 and δ₁ ≠ 0, we get concurrent regressions (same intercept, different slopes), so one gender has a higher return to education.
D) If δ₀ ≠ 0 and δ₁ ≠ 0, we get dissimilar regressions (different intercepts and slopes).
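The four scenarios can be summarised as a small lookup. This is only a sketch: the function name classify and the wording of the labels are choices made for this example.

```python
def classify(delta0, delta1):
    """Name the scenario implied by the intercept shift and the slope shift."""
    if delta0 == 0 and delta1 == 0:
        return "same line: same intercept, same slope"
    if delta1 == 0:
        return "parallel: different intercepts, same slope"
    if delta0 == 0:
        return "concurrent: same intercept, different slopes"
    return "dissimilar: different intercepts, different slopes"


# Illustrative values only.
for d0, d1 in [(0, 0), (-0.3, 0), (0, 0.02), (-0.3, 0.02)]:
    print(d0, d1, "->", classify(d0, d1))
```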
Just as we can multiply a dummy variable by a quantitative variable (gender × education), we can also interact two dummy variables!
Suppose we suspect a differential gender wage gap in services (the 2 dummies are services and gender).
Starting from ln(wageᵢ) = β₀ + β₁Educᵢ + δ₀Femaleᵢ + δ₁Servᵢ + εᵢ
What does our model become?
ln(wageᵢ) = β₀ + β₁Educᵢ + δ₀Femaleᵢ + δ₁Servᵢ + δ₂Femaleᵢ × Servᵢ + εᵢ
This lets the gender wage gap differ within the service sector.
δ₂Femaleᵢ × Servᵢ is the interactive dummy term.
So how do we interpret δ₀, δ₁ and δ₂?
ln(wageᵢ) = β₀ + β₁Educᵢ + δ₀Femaleᵢ + δ₁Servᵢ + δ₂Femaleᵢ × Servᵢ + εᵢ
Find the expected log wages of males and females in service/non-service!
Male in non-service (Femaleᵢ = 0, Servᵢ = 0) (simplest one)
Expected log wage = β₀ + β₁Educᵢ
Female in non-service (Femaleᵢ = 1, Servᵢ = 0) (2nd simplest)
Expected log wage = (β₀ + δ₀) + β₁Educᵢ
Male in service (Femaleᵢ = 0, Servᵢ = 1)
Expected log wage = (β₀ + δ₁) + β₁Educᵢ
Female in service (Femaleᵢ = 1, Servᵢ = 1)
Expected log wage = (β₀ + δ₀ + δ₁ + δ₂) + β₁Educᵢ
What is the difference between women and men in non-service?
Subtract one from the other: female in non-service minus male in non-service
= δ₀
What is the difference between men in service and men in non-service?
Expected log wage of men in service minus men in non-service
= δ₁
What is the difference between the service/non-service gap for women and the service/non-service gap for men?
[Expected log wage of women in service minus not in service], which is (δ₁ + δ₂), minus [expected log wage of men in service minus not in service], which is δ₁
So (δ₁ + δ₂) - δ₁
= δ₂
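The three subtractions can be reproduced numerically. The sketch below plugs made-up parameter values into the interacted model; the function name and every default value are assumptions for illustration, not estimates.

```python
def expected_log_wage(educ, female, serv,
                      b0=1.0, b1=0.08, d0=-0.20, d1=0.15, d2=-0.05):
    """Expected ln(wage) = b0 + b1*educ + d0*female + d1*serv + d2*female*serv.

    female and serv are 0/1 dummies; default parameters are illustrative only.
    """
    return b0 + b1 * educ + d0 * female + d1 * serv + d2 * female * serv


educ = 12  # held fixed across the four groups
male_non = expected_log_wage(educ, female=0, serv=0)
female_non = expected_log_wage(educ, female=1, serv=0)
male_serv = expected_log_wage(educ, female=0, serv=1)
female_serv = expected_log_wage(educ, female=1, serv=1)

gender_gap_non_service = female_non - male_non   # d0
service_gap_men = male_serv - male_non           # d1
service_gap_women = female_serv - female_non     # d1 + d2
diff_in_diff = service_gap_women - service_gap_men  # d2
print(gender_gap_non_service, service_gap_men, diff_in_diff)
```

Equivalently, δ₂ is how much the gender wage gap in services (δ₀ + δ₂) differs from the gap outside services (δ₀).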