Research Skills Part 3 Flashcards
Important note on correlation
Zero correlation means that there is no linear relation between x and y. But it does not imply independence!!!
Correlation is not causation!!!
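A minimal numpy sketch of the first point: here y is fully determined by x, yet the linear correlation is (essentially) zero.

    import numpy as np

    x = np.linspace(-1, 1, 101)   # symmetric around 0
    y = x ** 2                    # y is a deterministic function of x
    r = np.corrcoef(x, y)[0, 1]
    print(round(r, 10))           # ~0: zero linear correlation, yet fully dependent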
Name 3 causal structures that can underlie a correlation
- x causes y
- y causes x
- x causes y and y causes x > self-reinforcement
In a univariate regression…
The correlation determines the sign of the regression coefficient, and CORR(x,y)^2 = R^2
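A quick numpy check of CORR^2 = R^2 on simulated data (R^2 computed as 1 – RSS/TSS):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = 2 * x + rng.normal(size=200)

    b, a = np.polyfit(x, y, 1)                    # slope, intercept
    resid = y - (a + b * x)
    r2 = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)
    corr = np.corrcoef(x, y)[0, 1]
    print(np.isclose(corr**2, r2))                # True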
What is RSS?
Residual Sum of Squares = sum of all squared residuals = sum (y – y-hat)^2
Give the formula for the beta coefficient
= cov(x,y) / var(x)
= (SD(y) / SD(x)) * CORR(x,y)
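Both expressions give the same slope; a numpy check on simulated data (numpy defaults to population moments, ddof=0, so the scalings cancel):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=500)
    y = 1 + 3 * x + rng.normal(size=500)

    beta_cov = np.cov(x, y, ddof=0)[0, 1] / np.var(x)              # cov(x,y) / var(x)
    beta_corr = (np.std(y) / np.std(x)) * np.corrcoef(x, y)[0, 1]  # (SD(y)/SD(x)) * corr
    print(np.isclose(beta_cov, beta_corr))                         # True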
What is TSS?
Total Sum of Squares = sum (y – y-bar)^2
What is ESS?
Explained Sum of Squares = sum (y-hat – y-bar)^2
Give the formula for R2
TSS = ESS + RSS (this decomposition holds when the regression includes an intercept)
R2 = 1 – RSS/TSS = ESS/TSS
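A sketch verifying the decomposition and both R^2 expressions on a simple simulated fit:

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(size=100)
    y = 0.5 + 2 * x + rng.normal(size=100)

    b, a = np.polyfit(x, y, 1)
    y_hat = a + b * x
    tss = np.sum((y - y.mean())**2)
    ess = np.sum((y_hat - y.mean())**2)
    rss = np.sum((y - y_hat)**2)
    print(np.isclose(tss, ess + rss))         # True
    print(np.isclose(1 - rss/tss, ess/tss))   # True: both equal R^2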
What are the drawbacks of R2?
- It depends on how the dep var is defined (changes versus levels, wages versus log wages, etc.). It is only comparable across models if the dep var is the same.
- It never decreases if you add more vars, even if they're useless > compute Adj-R2 (formula below)
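For reference, the standard adjustment (n = number of observations, k = number of regressors excluding the intercept):
Adj-R2 = 1 – (1 – R2) * (n – 1) / (n – k – 1)
Useless extra vars raise k without lowering RSS much, so Adj-R2 penalizes them.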
Note on (adj) R2
(adj) R2 is useful for comparing the relative performance of 2 models with the same dep var. However, it is not useful for evaluating absolute performance.
Name 3 factors reducing the accuracy of OLS estimate
- Large error variance (s^2) > large influence of other variables that are not in the model > risk of OMITTED VARIABLE BIAS!!!!!
- Small number of observations
- Little spread in the indep var > without variation in x one cannot explain variation in y, but too much variation (e.g., a few extreme outliers) is also bad (see the S.E. formula below)
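All three factors appear directly in the standard error of the slope in a univariate regression:
SE(beta-hat) = s / sqrt( sum (x – x-bar)^2 )
so a larger error variance raises the S.E., while more observations and more spread in x shrink it.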
What is the F-test?
The F-test of overall significance indicates whether your linear regression model provides a better fit to the data than a model that contains no independent variables.
- Multiple regression: the p-value of the F-test is the p-value for the null hypothesis that all slope coefficients are jointly equal to zero (see the sketch below)
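A minimal statsmodels sketch (simulated data) showing where the overall F-test lives in the output:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    X = rng.normal(size=(200, 2))                  # two independent variables
    y = 1 + 0.5 * X[:, 0] + rng.normal(size=200)   # only the first one matters

    res = sm.OLS(y, sm.add_constant(X)).fit()
    print(res.fvalue, res.f_pvalue)   # F-stat and p-value of H0: all slopes jointly zero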
When is the Omitted Variable Bias more severe, and what is a solution for this problem?
The problem is more severe when the x variable in the regression has a high correlation with the omitted variable z.
Solution: multivariate regression > include the omitted variable as an extra regressor (simulation below)
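A small simulation of the bias, assuming z causes y but is omitted while being correlated with x:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 100_000
    z = rng.normal(size=n)
    x = 0.8 * z + rng.normal(size=n)             # x is correlated with the omitted z
    y = 1.0 * x + 1.0 * z + rng.normal(size=n)   # true coefficient on x is 1.0

    b_short = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # regress y on x only
    print(b_short)   # ~1.49, not 1.0: bias = beta_z * cov(x,z) / var(x)

Including z as a second regressor (the multivariate solution) recovers a slope of roughly 1.0.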
What are the assumptions of the linear regression model?
- residuals have a mean of 0 and are independent of the independent variables
- residuals have a constant variance = homoskedasticity
- residuals are uncorrelated with each other = no autocorrelation
- there's no exact linear relation between the independent variables (no perfect multicollinearity)
Under these assumptions, the OLS estimators (betas) are BLUE = best linear unbiased estimator for the true beta.
Only then are the routinely computed S.E.s and t-stats correct.
For the t-stats and F-tests to be exact in small samples, one additional assumption is needed:
- residuals follow a normal distribution
If the errors are correlated with any of the independent vars, OLS is biased and inconsistent > wrong coefficient estimates!!
How do you test for non-linearity and how do you fix non-linearity issues?
Test: Ramsey's RESET test to examine the linearity of the regression (sketch below)
Solution: use a data transformation > take logs or add a squared term
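A sketch of RESET's core idea with statsmodels on simulated non-linear data: augment the regression with powers of the fitted values and F-test the added terms.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    x = rng.uniform(0, 3, size=300)
    y = np.exp(0.5 * x) + rng.normal(scale=0.2, size=300)   # true relation is non-linear

    X = sm.add_constant(x)
    restricted = sm.OLS(y, X).fit()

    # augment with powers of the fitted values and jointly test them
    yhat = restricted.fittedvalues
    X_aug = np.column_stack([X, yhat**2, yhat**3])
    unrestricted = sm.OLS(y, X_aug).fit()
    f_stat, p_val, _ = unrestricted.compare_f_test(restricted)
    print(p_val)   # small p-value > reject linearity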