Midterm Flashcards
u
the variation in yi that is not captured or explained by xi → this could include both unsystematic predictors of yi (e.g., a job application randomly landing on the top or bottom of a stack of other applications) and systematic determinants of yi (e.g., years of prior experience) that are omitted from the model
PRF vs. SRF
PRF: E(yi | xi) = β0 + β1xi
SRF: ŷi = β̂0 + β̂1xi
Errors and residuals for SRF
Notice that the SRF contains no error term: ŷi is, by definition, the value on the fitted regression line (the same logic applies to the PRF, which gives the expected value of yi)
The estimates of the errors, which are called the residuals, are the differences between the observed values yi and the fitted values ŷi:
ûi = yi − ŷi = yi − β̂0 − β̂1xi
OLS
OLS is the most commonly used estimator in the social sciences (for estimating the β's)
OLS will be our workhorse estimator in this course
OLS obtains estimates of the “true” population parameters β0 and β1, which we typically do not observe
The logic of the OLS estimation procedure: choose β̂0 and β̂1 to minimize the sum of squared residuals, Σûi^2
Why minimize the sum of squared residuals, instead of the sum of residuals or the absolute value of residuals?
If we use the sum of residuals, then residuals with different signs but similar magnitudes will cancel each other out
Minimizing the sum of absolute values of the residuals is a viable alternative, but it does not yield closed-form formulas for the resulting estimators (minimizing squared residuals does; see below)
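For reference (standard results, not written out on the card above), minimizing Σûi^2 gives closed-form solutions in the bivariate case:
β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)^2
β̂0 = ȳ − β̂1x̄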
Relationship between PRF and SRF through residuals
yi = ŷi + ûi
SST
Total Sum of Squares (SST): Measure of sample variation in y
SSE
Explained Sum of Squares (SSE): Measure of the part of the variation in y that is explained by x
SSR
Residual Sum of Squares (SSR): Part of the variation in y unexplained by x
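For reference, the standard definitions behind the three cards above (not spelled out on the cards themselves):
SST = Σ(yi − ȳ)^2, SSE = Σ(ŷi − ȳ)^2, SSR = Σûi^2
SST = SSE + SSR, and R^2 = SSE/SST = 1 − SSR/SST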
R^2 and magnitude of relationship between y and x
As a measure of correlation, R^2 should not be confused with the magnitude of the relationship between a DV and an IV
You can have a bivariate relationship with a high R^2 (i.e., high correlation) whose slope is close to 0
You can also have a bivariate relationship with a low R^2 (i.e., low correlation) whose slope is large in magnitude
What changes when you transform a regressor?
Bottom line: if we transform a regressor, then only the slope coefficient for that regressor is transformed
What happens if the relationship between wage and education is non-linear?
These patterns can be nicely modeled by re-defining the dependent and/or independent variables as natural logarithms
The linear regression model must be linear in the parameters, but not necessarily linear in the variables, so logging y or x does not violate the requirement of a linear relationship between the dependent variable and its determinants
Linear regression models?
y = β0 + β1x + u; log(y) = β0 + β1x + u; log(y) = β0 + β1log(x) + u; y = log(β0 + β1x + u); e^y = β0 + β1√x + u; y = β0 + (β1x1)/(1 + β2x2) + u
y = β0 + β1x + u: Yes
log(y) = β0 + β1x + u: Yes
log(y) = β0 + β1log(x) + u: Yes
y = log(β0 + β1x + u): Yes
e^y = β0 + β1√x + u: Yes
y = β0 + (β1x1)/(1 + β2x2) + u: No (β2 enters non-linearly)
If we exponentiate both sides of y = log(β0 + β1x + u), we get e^y = β0 + β1x + u, which is linear in the parameters (similarly, e^y = β0 + β1√x + u is linear in the parameters, with e^y as the dependent variable and √x as the regressor)
wage = β0 + β1educ + u (interpretation of β1?)
1 additional year of education is associated with an increase in wages of β1 units
wage = β0 + β1log(educ) + u (interpretation of β1?)
A 1% increase in education is associated with an increase in wages of β1/100 units
decreasing returns to education
log(wage)= β0 + β1educ + u
1 additional year of education is associated with a (100 ∗ β1 )% increase in wages
increasing returns
log(wage)= β0 + β1log(educ) + u
1% increase in education is associated with a β1% increase in wages
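A quick numerical illustration of the log-level case (the coefficient value is hypothetical, not from the course): if β̂1 = 0.08 in log(wage) = β0 + β1educ + u, then one additional year of education is associated with roughly a 100 × 0.08 = 8% higher wage; the exact proportional change is e^0.08 − 1 ≈ 8.3%.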
Assumption 1
Linearity in the parameters
the population model can be non-linear in the variables but must be linear in the parameters
Assumption 2
Random Sampling
individual observations are identically and independently distributed (i.e., observations are randomly selected from a population such that each observation has the same probability of being selected, independent of which other observations were selected.)
Assumption 3
Sample variation in the explanatory variable
the sample standard deviation in xi must be greater than 0 (need some variance in order to get an estimate)
Assumption 4
Zero Conditional Mean
E(u | X) = 0
if it holds, then the error term u is uncorrelated with the regressor X
this assumption is usually the biggest area of concern in empirical analysis
Assumption 5
Homoskedasticity assumption
Var(u | X) = σ^2
the variance of the unobservable error term, conditional on x, is assumed to be constant
Var(u) is independent of x
If assumptions 1-4 hold….
the OLS estimator is unbiased, meaning that on average the estimates equal the true parameters: E(β̂) = β
If assumptions 1-5 hold
if 1-5 hold, then we can derive a formula for the variance of the coefficient estimates, Var(β̂1)
What drives the variance of the OLS slope estimate? What makes it more precise?
the lower the variation in the errors,
or the greater the variance in the independent variable,
or the greater the sample size (related, because sample variation in x increases with sample size),
the more precise the OLS estimates are, on average (see the formula below)
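These three drivers appear directly in the standard bivariate variance formula (under assumptions 1-5):
Var(β̂1) = σ^2 / Σ(xi − x̄)^2
A smaller error variance σ^2, or more variation in x (which also grows with the sample size), lowers Var(β̂1).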
Standard errors measure… relationship with precision?
the precision (efficiency) of the estimate β̂1
se-hat(β̂1) is lower (i.e., the estimate is more precise) when (see the formula below):
the residuals are small
the variation of the independent variable is large
the number of observations is large
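The corresponding formula (standard for the bivariate case) reflects the same three points:
se-hat(β̂1) = σ̂ / √(Σ(xi − x̄)^2), where σ̂^2 = SSR/(n − 2)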
Motivation for multiple regression analysis
1) controlling for other factors:
even if you are primarily interested in estimating one parameter, including other variables in the regression controls for potentially confounding factors, so the zero conditional mean assumption is more likely to hold
2) better predictions:
more independent variables can explain more of the variation in y, meaning a potentially higher R^2
3) Estimating non-linear relationships:
by including higher order terms of a variable, we can allow for a more flexible, non-linear functional form between the dependent variable and an independent variable of interest
4) Testing joint hypotheses on parameters:
can test whether multiple independent variables are jointly statistically significant
How does OLS make estimates?
OLS minimizes the sum of the squared residuals: it chooses the combination of all the β̂'s that gives the lowest sum of squared residuals (matrix form below)
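In matrix notation (a standard result, not shown on the card), the OLS solution for all the coefficients at once is:
β̂ = (X'X)^(-1) X'y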
How do you isolate the variation that is unique to x3?
regress x3 on all the other regressors and obtain the residuals; these residuals (ê) contain the variation in x3 not explained by the other regressors in the model, effectively holding all else constant
then conduct a bivariate regression of y on ê; its slope equals the multiple-regression coefficient on x3 (a numerical sketch follows)
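A minimal numerical sketch of this partialling-out logic (the simulated data, variable names, and coefficient values below are purely illustrative, not from the course):

import numpy as np

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)   # x3 is correlated with x1 and x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.7 * x3 + rng.normal(size=n)

# Full multiple regression: y on a constant, x1, x2, x3
X = np.column_stack([np.ones(n), x1, x2, x3])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 1: regress x3 on the other regressors and keep the residuals e_hat
Z = np.column_stack([np.ones(n), x1, x2])
gamma = np.linalg.lstsq(Z, x3, rcond=None)[0]
e_hat = x3 - Z @ gamma            # variation in x3 not explained by x1 and x2

# Step 2: bivariate regression of y on e_hat; the slope reproduces the
# multiple-regression coefficient on x3 (e_hat has mean zero, so no intercept is needed)
beta3_partial = (e_hat @ y) / (e_hat @ e_hat)

print(beta_full[3], beta3_partial)   # the two numbers coincide up to floating-point error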
β̂k represents what?
in terms of slope?
the partial association between y and xk holding x1, x2,…, xk-1 equal
β̂k would be the slope of the multi-dimensional regression plane along the xk direction (i.e., the expected change in y when xk increases by one unit, holding all other x's constant)
What does R^2 NOT tell you
a high R^2 does not mean that any of the regressors is a true cause of the dependent variable
it also does not mean that any of the coefficients are unbiased
Estimating non-linear relationships
by including higher order terms of an independent variable we can allow for a non-linear, or “more flexible functional form” between the dependent variable and an explanatory factor
include x^2 as a regressor and take the partial derivative with respect to x; this gives the total effect of x in two parts (a linear and a non-linear part)
if the coefficients on the first- and second-order terms are substantively and statistically different from 0, then the sign and magnitude of the effect on wages can vary as x changes, i.e., a non-linear relationship
this is the “marginal return to x”; there is no ceteris paribus interpretation of the individual parameters here, so we must choose a given level of x and then describe the trade-off at that point (see the general formula below)
e.g., “for an individual with ten years of experience, accumulating an additional year of experience is expected to increase his/her hourly wage by $0.18”
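The general form of the calculation behind this card (the specific coefficient values are not shown here): with wage = β0 + β1x + β2x^2 + u, the marginal return to x is
∂wage/∂x = β1 + 2β2x
which is then evaluated at a chosen level of x (e.g., x = 10 years of experience).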
When can you interpret coefficients as causal?
when assumptions 1-4 hold; in practice the zero conditional mean (ZCM) assumption often fails, but it is more likely to hold in a multiple regression that controls for confounding factors
R-j^2, what is it? What does a high value mean?
it is the R^2 from regressing xj on all of the other independent variables
a high Rj^2 is often the result of multicollinearity: high, but not perfect, correlation between two or more regressors
this leads to imprecise estimates (see the variance formula below)
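Rj^2 enters the multiple-regression variance formula directly (a standard result under assumptions 1-5):
Var(β̂j) = σ^2 / [SSTj(1 − Rj^2)], where SSTj = Σ(xij − x̄j)^2
As Rj^2 approaches 1, the denominator shrinks and Var(β̂j) blows up, which is why multicollinearity produces imprecise estimates.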
When adding x3 to the regression model, what happens to the estimated Var(β̂1)?
two countervailing channels:
adding x3 will almost certainly reduce σ̂^2 (the estimated error variance based on the squared residuals), which works toward a more precise estimate; the size of this reduction depends on the extent to which x3 predicts y
by adding x3 we also introduce some correlation, and perhaps multicollinearity, between x1 and x3 and/or x2 and x3, which works against a precise estimate of β̂1 (i.e., pushes the estimated Var(β̂1) up); this depends on how correlated the regressors are
Gauss-Markov Theorem holds when?
Assumptions 1-5 hold; this means that the β̂'s are the best linear unbiased estimators (BLUE)