Final COPY Flashcards
What is the stochastic error term?
A term added to the regression equation to account for any variation in Y that is not explained by X
What does TSS equal?
ESS+SSR
What is the SE?
Sq.Rt. of RSS/{(n-k-1) ∑(Xi-Xbar)2}
What are the four assumptions of the error term?
- The variation does not change as X changes 2. Its distribution is normal 3. Conditional mean=0 4. Independent for any two observations (i,j)
What can cause stochastic error?
- OVB 2. Measurement error 3. A misspecified function 4. Random occurrences
When are dummy variables useful?
When we want to quantify something that is inherently qualitative (race, gender, etc.)
What is Ŷ?
An estimated value of Y calculated from the regression at the i-th observation
What is a residual?
The difference between the estimated and actual values of the dependent variable
What changes between observations?
The values of Y, Xs, and error terms (but not the coefficients)
While adding a variable may not change TSS…
It will likely reduce SSR and, thus likely increase R-squared
OLS seeks to minimize….
The sum of squared residuals (or SSE)
K = ?
The # of independent variables
Why is a high degree of freedom desired?
It is likely that the errors will balance out
Yi - Ŷ is….
The residual (prediction mistake)
What are the three properties of estimators?
- Unbiasedness: The estimator is correct (on avg.) 2. Consistency: As observations increase, so does the probability that the estimator is close to the pop. parameter Efficiency: Estimator has smaller relative variance (converges to the pop. parameter more quickly)
How do we adjust for degrees of freedom?
Divide by (n-1)
What does ß0 equal? (Univariate)
Ybar - ß1(Xbar)
What does ß1 equal? (Univariate)
∑(Yi-Ybar)(Xi-Xbar)/∑(Xi-Xbar)2
What is the formula for sample variance?
1/(n-1) ∑(Xi-Xbar)2
What is the formula for sample covariance?
1/(n-1) ∑(Xi-Xbar)(Yi-Ybar)
What are the 7 OLS assumptions?
- The population regression function (DGP) is linear in parameters 2. Observations are randomly drawn from the population and i.i.d 3. X[vector] is fixed in repeated samples (no measurement error) 4. The error term has a conditional mean of 0 5. Homoskedasticity 6. Errors are independent (for every i, j) 7. Outliers are unlikely
If you specify a dummy variable for each possible outcome…..
You will induce perfect multicollinearity (nothing to compare dummy to)
If the omitted variable is correlated with a regressor and it has an effect on the dependent variable….
We have OVB
If the effect of the OV on Y and the correlation between OV and regressor are moving in the same direction…
Your estimator is too big
How do you know if you have OVB?
- Use economic theory/knowledge of the subject 2. Run a robustness check
What is ESS?
Sum of the squared differences between predicted values and the mean
What is TSS?
Sum of the squared differences between actual values and the mean
What is R2?
The % of variation in Y explained by the model
What are the formulas for R2?
- ESS/TSS 2. 1 - SSR/TSS
What is the formula for the adj. R2?
- 1 - [(n-1)/(n-k-1)] x SSR/TSS
What is RMSE?
- Another goodness of fit measurement 2. Sqrt of SSR/(n-k-1)
What happens if we have perfect multicollinearity?
OLS is impossible
Why is perfect (or high) multicollinearity an issue?
A high degree of multicolinearity may be problematic because it inflates the variance of the estimator
What can we use to quantify the severity of multicollinearity in our model?
We use the variance inflation factor (VIF)
What is the VIF(ß1 hat?)
1/(1-R2)
Formula for t-test?
(ß1 hat)-(H0: ß1) / SE(ß1 hat) p = 2(cdf)(-l t l)
What is the extensive formula for R2?
(ß1) ∑(Xi-Xbar)(Yi-Ybar)/∑(Yi-Ybar)2
What is the extensive formula for ESS?
(ß1)2 ∑(Xi-Xbar)2
How do you standardize a normal distribution?
Subtract the mean and divide by sigma
For a hypothesis test, what is the significance level and confidence level?
Significance = p Confidence = 1-p
What must “t” be greater than equal to for: 1. 90% confidence 2. 95% confidence 3. 99% confidence
1) 1.645 2) 1.96 3) 2.58 ^^ For two-tailed tests
If our causal effect depends on the level of another independent variable, what do we do?
Take the natural log of the variable
If our causal effect depends on another variable (but not the level) what do we do?
Use an interaction terms
What assumption do we make about the causal effect if we use a natural log?
That it is always positive or negative
What are the advantages of using a non-linear specification other than a log?
No assumptions about direction, allows for inflection points, and increasing/decreasing rates
What property about our OLS estimator is violated if we have homoskedasticity?
Efficiency
What type of GLS do we use when the form of heteroskedasticity is unknown?
Feasible GLS
How do we run feasible GLS?
- Estimate w/ OLS and calculate residuals 2. Run OLS w/ squared residuals on variance 3. Use predicted values from that to create weight (1/sq.rt(predicted values))
What is iteratively re-weighted least squares?
Feasible GLS repeated until weights converge to a value
Which hypothesis test do we use for the following situations?: 1. Single parameter (one restriction) 2. Multiple parameter (linear combination) 3. Multiple parameter (non-linear combination) 4. Multiple parameter (multiple restrictions)
- t-test (of one variable) 2. lincom (or t-test comparing two variables) 3. t-test/Taylor approximation (if one restriction) 4. F-test
What is the formula for an F-test?
(SSRr - SSRu/r) ÷ (SSRu/n-k-1)
What do you need to look at after running an F-test to determine if it is statistically significant?
The Chi-square critical values
What is the advantage of BIC over AIC?
BIC gives you consistent estimates
What should you do if you are asked about an effect? A change?
Effect = derivative Change = difference
What do you do if asked which level of x has a max effect on y?
Take the derivative and solve for x
What do you do if asked about the effect of x on y for a person w/ z years of x?
Plug z into x and solve **(Final answer * Ɛ)
What do you do if asked about the difference in y due to a difference in x?
Plug in given values and take the difference
What do you do if asked about the difference in the effect of x on y? (Someone w/ 10 years vs. someone w/ 20)
Take the derivative, plug in the numbers and take the difference