Midterm Flashcards
Causal relationship
a change in one variable (action) CAUSES change in another variable (result)
Correlation
X and Y move together (are statistically associated), but the relationship may be partially or wholly explained by other factors; correlation does not imply causation
Error Term
- Deviation of the observed Y from the true line
- Represented by εi in structural equation
- A theoretical representation of unobserved factors that accounts for the variation in Y not explained by the model (omitted variables are absorbed by the error term)
Residual
- Deviation of the observed Y from the estimated line
- Calculated by e_i = Y_i − Ŷ_i
- **Observed − Estimated**
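Worked example: if the observed value is Y_i = 10 and the estimated line predicts Ŷ_i = 8, the residual is e_i = 10 − 8 = 2.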
R^2
Goodness of Fit
- Ranges from 0 – 1
- Closer to 1 = better fit
- Adjusted R^2: Includes “penalty” for adding additional regressors
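The standard formulas, where RSS is the residual sum of squares, TSS is the total sum of squares, n is the number of observations, and k is the number of regressors:
R^2 = 1 − RSS/TSS
Adjusted R^2 = 1 − (1 − R^2)(n − 1)/(n − k − 1)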
null hypothesis
The null hypothesis states “no difference” or “no effect”
alternative hypothesis
The alternative hypothesis states there is a difference/effect
T-test
- If the absolute value of the t-stat is greater than the critical value (e.g., 1.96), we reject the null hypothesis and accept the alternative that the true coefficient is not zero
- If so, the variable is statistically significant at the 5% level of significance
- This also means the p-value is smaller than 5% (0.05)
T-test formula
Divide the coefficient by the standard error to get the t-value
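In symbols, for H0: β = 0:
t = β̂ / SE(β̂)
Worked example: β̂ = 0.8 with SE(β̂) = 0.25 gives t = 3.2 > 1.96, so the coefficient is significant at the 5% level.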
F-test
Test a set of regression coefficients for joint significance
- H0: β1 = β2 = β3= 0 (ALL coefficients = 0)
- HA: β1 ≠ 0 OR β2 ≠ 0 OR β3 ≠ 0 (at least 1 coefficient NOT equal to 0)
F-stat > Critical Value = Reject the Null
(p-value of F lower than the level of significance)
F-test formula
You want the F-stat high & probability low
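For the overall significance test (H0: all slope coefficients are zero):
F = (ESS/k) / (RSS/(n − k − 1))
where ESS is the explained sum of squares, RSS the residual sum of squares, n the number of observations, and k the number of regressors.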
Interpreting Coefficients:
Level-Level
Y = β1 X1
on average a one-unit increase in X is associated with a β1-unit increase in Y, holding all else constant
Interpreting Coefficients:
Log-Level
lnY = β1 X1
on average a one-unit increase in X is associated with a (100 × β1)% increase in Y, holding all else constant
Interpreting Coefficients:
Level-Log
Y = β1 lnX1
on average a 1% increase in X is associated with a (β1/100)-unit increase in Y, holding all else constant
Interpreting Coefficients:
Log-Log
lnY= β1 lnX1
on average, a 1% increase in X is associated with a β1% increase in Y, holding all else constant
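A minimal sketch of estimating all four functional forms with statsmodels, using simulated data (all variable names here are illustrative, not from the course):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 200)  # strictly positive so logs are defined
y = np.exp(0.5 + 0.8 * np.log(x) + rng.normal(0, 0.1, 200))

X = sm.add_constant(x)            # level X with intercept
lnX = sm.add_constant(np.log(x))  # log X with intercept

level_level = sm.OLS(y, X).fit()          # b1 = unit change in Y per unit X
log_level = sm.OLS(np.log(y), X).fit()    # 100*b1 = % change in Y per unit X
level_log = sm.OLS(y, lnX).fit()          # b1/100 = unit change in Y per 1% X
log_log = sm.OLS(np.log(y), lnX).fit()    # b1 = elasticity (% change per 1% X)
print(log_log.params)
```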
Dummy/binary variable
Only has two possible values – e.g. X = 1 if female; X = 0 if male
Y = β0 + β1Female
Ex: On average, being female is associated with a β1 difference in Y compared to males, holding all else constant
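Why β1 is the difference: for males (Female = 0) the model predicts Ŷ = β0, and for females (Female = 1) it predicts Ŷ = β0 + β1, so β1 is the average difference in Y between females and males.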
Categorical Variable
A variable like “region” has multiple values (south, west, northeast, midwest) that should be transformed into individual dummy (0 or 1) variables
Y = β0 + β1South + β2West + β3Northeast
Ex: On average, living in the South is associated with a β1 change in Y compared to the Midwest (the omitted reference category), holding all else constant.
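A minimal sketch of building the dummies with pandas (the column name "region" is hypothetical); with drop_first=True, the alphabetically first category (midwest here) becomes the omitted reference group:

```python
import pandas as pd

df = pd.DataFrame({"region": ["south", "west", "northeast", "midwest", "south"]})
# one 0/1 column per region; drop_first=True omits "midwest" as the reference
dummies = pd.get_dummies(df["region"], drop_first=True)
print(dummies)
```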
Interaction term
An independent variable in a regression equation that is the product of two or more other independent variables. Each interaction term has its own regression coefficient
Does the effect of work experience on salary differ between males and females?
Y = β0 + β1Experience + β2Female + β3(Experience × Female) + ε
Ex: On average, the effect of a one-unit increase in experience on Y differs by β3 for females compared to males, holding all else constant
This allows the effect of experience on income to vary by gender
β3 measures the effect of an additional year of experience for females relative to males
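Derivation of the two slopes: for males (Female = 0), the effect of experience on Y is β1; for females (Female = 1), it is β1 + β3. The difference between the two slopes is β3.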
7 Classical Assumptions
- Regression model is linear (in the β’s), correctly specified, and has an additive error term
- The error term has a population mean of zero
- The explanatory variables are not correlated with the error term
- Observations of the error term are not correlated
- The error term has a constant variance
- No explanatory variable is a perfect linear function of any other explanatory variable(s) (no perfect multicollinearity)
- Error term is normally distributed
Omitted Variable Bias
Y = β0 + β1X1 + ε
where the error term absorbs an omitted variable X2; if X2 is correlated with X1, the estimate of β1 is biased
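The standard bias expression: if the true model is Y = β0 + β1X1 + β2X2 + ε but X2 is omitted, then E[β̂1] = β1 + β2·α1, where α1 is the slope from regressing X2 on X1. The bias disappears only if β2 = 0 or X1 and X2 are uncorrelated.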
Variable Inclusion Criteria
Theory: is there sound justification for including the variable?
Bias: do the coefficients for other variables change noticeably when the variable is included?
T-Test: is the variable’s estimated coefficient statistically significant?
R-square: has the R-square (adjusted R-square) improved?
First-order serial correlation
occurs when the value of the error term in one period is a function of its value in the previous period; the current error term is correlated with the previous error term.
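In equation form: ε_t = ρ·ε_(t−1) + u_t, where ρ measures the strength of the serial correlation and u_t is a classical (well-behaved) error term.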
DW Test
compare the DW statistic (d) to the critical values (d_L, d_U): d < d_L → reject H0 (positive serial correlation); d > d_U → fail to reject H0; d_L ≤ d ≤ d_U → inconclusive
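The statistic itself: d = Σ_(t=2)^T (e_t − e_(t−1))^2 / Σ_(t=1)^T e_t^2, which ranges from 0 to 4; d ≈ 2 suggests no first-order serial correlation, and d near 0 suggests positive serial correlation.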
Newey-West Standard Errors
- Designed to correct for the consequences of first-order serial correlation; they are technically still biased, but are more accurate than OLS standard errors, so they can be used for t-tests and other hypothesis tests
Typically, Newey-West SE > OLS SE
- Larger standard errors produce lower t-scores, so coefficients won't be as statistically significant
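A minimal sketch of requesting Newey-West (HAC) standard errors in statsmodels, using simulated data (names are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)

X = sm.add_constant(x)
# cov_type="HAC" with maxlags=1 applies the Newey-West correction
# for first-order serial correlation
results = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 1})
print(results.bse)  # Newey-West standard errors
```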
Heteroskedasticity
occurs when the variance of the error term is not constant across observations. With heteroskedasticity, the tell-tale sign upon visual inspection of the residuals is that they tend to fan out as the fitted values (or an explanatory variable, or time) increase.
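A minimal sketch of a formal check using the Breusch-Pagan test in statsmodels (simulated data, so everything here is illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 200)
y = 1.0 + 2.0 * x + rng.normal(0, x)  # error spread grows with x (heteroskedastic)

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, X)
print(lm_pvalue)  # a small p-value rejects the null of homoskedasticity
```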