More Regression Flashcards
What is required to make predictions using a regression model?
The Gauss-Markov assumptions should be met, so that the OLS estimates are the best linear unbiased estimates and the fitted line can be used for prediction
What is a dummy variable?
A variable which summarises a category with discrete options, e.g. D could be a dummy variable for sex with male corresponding to D = 0 and female to D = 1, in which case D indicates whether an observation is female
How could regression be used for a model where there is a mean male wage and a mean female wage?
If wm = αm + εm and likewise wf = αf + εf for females, then with a dummy variable D set to 0 for males and 1 for females, w = Dαf + (1 - D)αm + Dεf + (1 - D)εm = αm + D(αf - αm) + εm + D(εf - εm), which can be written w = α + Dβ + u with the intercept α = αm being the mean male wage and the slope β = αf - αm being the difference in means
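The dummy-variable regression above can be checked numerically: regressing w on a constant and D recovers the male mean as the intercept and the difference in means as the slope. A minimal sketch with made-up wage numbers (the data values are illustrative only):

```python
import numpy as np

# Hypothetical wages: D = 1 for female, 0 for male (values are illustrative)
D = np.array([0, 0, 0, 1, 1, 1])
w = np.array([10.0, 12.0, 11.0, 9.0, 8.0, 10.0])

# OLS of w on a constant and the dummy D
X = np.column_stack([np.ones_like(D, dtype=float), D])
alpha, beta = np.linalg.lstsq(X, w, rcond=None)[0]

print(alpha)         # intercept = male mean wage: 11.0
print(alpha + beta)  # intercept + slope = female mean wage: 9.0
```

The slope β = -2.0 here is exactly the female mean minus the male mean, matching the flashcard's interpretation.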
What is functional form?
The mathematical form of the relationship between Y and the regressors; choosing a functional form (e.g. taking logs) allows non-linear relationships to be estimated within the classical regression model (CRM)
What is given by the slope in the relationships Y = α + βX + ε, ln(Y) = α + βX + ε, and ln(Y) = α + βln(X) + ε?
The (constant) average change to Y from each unit of X
The % change in Y associated with each additional unit of X regardless of the initial level (X → X + 1 gives roughly a 100β% change in Y), based on the approximation Δln(Y) ≈ ΔY/Y for small changes; β is called the semi-elasticity of Y wrt X
The constant % change to Y from a 1% increase in x, called the constant elasticity of Y wrt X (so β = 1 means 1% increase in X increases Y by 1%)
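The constant-elasticity interpretation of the log-log model can be verified on data generated from an exact power relationship; a sketch under assumed toy values, where ln(Y) = α + βln(X) with β = 2 so a 1% rise in X raises Y by about 2%:

```python
import numpy as np

# Exact log-log relationship: Y = 5 * X^2, i.e. ln(Y) = ln(5) + 2*ln(X)
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = 5.0 * X**2

# Regress ln(Y) on ln(X); the slope is the elasticity
lnX, lnY = np.log(X), np.log(Y)
beta = np.polyfit(lnX, lnY, 1)[0]
print(beta)  # -> 2.0
```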
When is a log transformation used?
When it turns a non-linear relationship to a linear relationship
How can the total sum of squares of a regression model be broken down?
Σ(Y - Ȳ)² = Σ(Y - Ŷ)² + Σ(Ŷ - Ȳ)²
Total sum of squares TSS = Error sum of squares ESS + Regression sum of squares RSS
To prove this, use Y = a + bX + e and Ȳ = a + bX̄ to get an expression for TSS which will include the cross term Σ2b(X - X̄)e, which equals zero by the OLS first-order conditions (Σe = 0 and ΣXe = 0)
Using b(X - X̄) = Ŷ - Ȳ leaves the required relation
What is R²?
R² = RSS/TSS = 1 - ESS/TSS gives the % variation in Y that is explained by the model
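The decomposition TSS = ESS + RSS and the resulting R² can be checked directly on a toy bivariate regression; a minimal sketch, with made-up data and the document's naming (ESS = error sum of squares, RSS = regression sum of squares):

```python
import numpy as np

# Toy data; fit Y = a + bX by OLS
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

b = np.cov(X, Y, ddof=0)[0, 1] / np.var(X)
a = Y.mean() - b * X.mean()
Yhat = a + b * X

TSS = np.sum((Y - Y.mean())**2)     # total sum of squares
ESS = np.sum((Y - Yhat)**2)         # error sum of squares
RSS = np.sum((Yhat - Y.mean())**2)  # regression sum of squares

print(np.isclose(TSS, ESS + RSS))  # True: the decomposition holds
print(RSS / TSS)                   # R^2 = 0.81
```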
What is multiple regression?
When there is more than one random variable on the RHS of the regression (multiple regressors)
A systematic difference in residuals of two groups after bivariate regression would suggest there should be another regressor
What is the non-parametric method for predicting Y conditional on X1 and X2?
Calculate the mean of Y for every possible paired value of the Xs and use that as the CEF E(Y|X1, X2) for distinct X pairs
What is the parametric method for multiple regression?
Use the OLS condition to get FOCs for Ŷ = a + bX1 + cX2
∂SSR/∂a = -2Σe = 0
∂SSR/∂b = -2ΣeX1 = 0
∂SSR/∂c = -2ΣeX2 = 0
Substituting e = Y - Ŷ gives a = Ȳ - bX̄1 - cX̄2
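The three FOCs say the residuals are orthogonal to the constant and to each regressor, and the first FOC gives the intercept identity a = Ȳ - bX̄1 - cX̄2. A sketch with assumed toy data confirming both:

```python
import numpy as np

# Toy data for a two-regressor OLS fit (values are illustrative)
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
Y  = np.array([2.0, 5.0, 4.0, 7.0, 8.0])

X = np.column_stack([np.ones_like(X1), X1, X2])
coef = np.linalg.lstsq(X, Y, rcond=None)[0]
a, b, c = coef

# FOC check: residuals sum to zero and are orthogonal to X1 and X2
e = Y - X @ coef
print(np.allclose([e.sum(), e @ X1, e @ X2], 0.0))  # True

# Intercept identity from the first FOC
print(np.isclose(a, Y.mean() - b * X1.mean() - c * X2.mean()))  # True
```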
If using a dummy variable for X2 where 1 is true, what can be said about the CEF?
E(Y|X1, X2 = 0) = a + bX1 and E(Y|X1, X2 = 1) = a + bX1 + c so c is the difference between the latter and the former (the intercept of the CEF shifts but the slope doesn’t change)
What is the Frisch Waugh Lovell Theorem?
A method for obtaining the coefficient on one regressor in a multiple regression without running the full regression, useful when a model has two regressors but only one is of interest
Regress Y on X1 to leave residuals eY and regress X2 on X1 to get residuals eX, then regress the Y residuals on the X residuals; the coefficient on the latter will be the same as c from the multiple regression
How does the Frisch Waugh Lovell Theorem work?
Regressing Y on X1 and X2 on X1 partials out the effect of the other independent variable (X1), since residuals in a linear regression represent the left-over variation in the dependent variable after accounting for one independent variable
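The FWL theorem can be demonstrated numerically: the coefficient from the residual-on-residual regression matches the coefficient on X2 from the full multiple regression exactly. A sketch with simulated data (the data-generating process is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=n)
X2 = 0.5 * X1 + rng.normal(size=n)        # X2 correlated with X1
Y = 1.0 + 2.0 * X1 + 3.0 * X2 + rng.normal(size=n)

def resid(y, x):
    """Residuals from OLS of y on a constant and x."""
    Z = np.column_stack([np.ones_like(x), x])
    return y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]

# Coefficient on X2 from the full multiple regression
Xfull = np.column_stack([np.ones(n), X1, X2])
c_full = np.linalg.lstsq(Xfull, Y, rcond=None)[0][2]

# FWL: regress Y-residuals on X2-residuals (both already net of X1)
eY, eX = resid(Y, X1), resid(X2, X1)
c_fwl = (eX @ eY) / (eX @ eX)

print(np.isclose(c_full, c_fwl))  # True
```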