Research Skills Part 3 Flashcards
Important note on correlation
Zero correlation means that there is no linear relation between x and y. But it does not imply independence!!! (e.g., y = x^2 with x symmetric around 0 gives CORR(x,y) = 0, yet y depends entirely on x)
Correlation is not causation!!!
Name 3 structures of correlation
- x causes y
- y causes x
- x causes y and y causes x > self-reinforcement
In a univariate regression…
Correlation determines the sign of the regression coefficient, and CORR(x,y)^2 = R2
What is RSS?
Residual Sum of Squares = sum (y – y-hat)^2 = the sum of all squared residuals
Give the formula for the beta coefficient
= cov(x,y) / var(x)
= (SD(y) / SD(x)) * CORR(x,y)
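A quick numerical check of both forms (a sketch on simulated data; numpy only, names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
y = 2.0 + 0.5 * x + rng.normal(size=1_000)

# Slope via cov(x, y) / var(x); use ddof=1 in both so the factors cancel.
beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Equivalent form: (SD(y) / SD(x)) * CORR(x, y).
beta_alt = (np.std(y, ddof=1) / np.std(x, ddof=1)) * np.corrcoef(x, y)[0, 1]

print(beta, beta_alt)  # both ~0.5 and identical up to rounding
```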
What is TSS?
Total Sum of Squares = sum (y – y-bar)^2
What is ESS?
Explained Sum of Squares = sum (y-hat – y-bar)^2
Give the formula for R2
TSS = ESS + RSS
R2 = 1 – RSS/TSS = ESS/TSS
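A minimal sketch verifying the decomposition, and that R2 = CORR^2 in the univariate case (simulated data, numpy only):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 1.0 + 0.8 * x + rng.normal(size=500)

# Univariate OLS fit by hand.
beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha = y.mean() - beta * x.mean()
y_hat = alpha + beta * x

tss = np.sum((y - y.mean()) ** 2)       # total
ess = np.sum((y_hat - y.mean()) ** 2)   # explained
rss = np.sum((y - y_hat) ** 2)          # residual

print(np.isclose(tss, ess + rss))                   # True: TSS = ESS + RSS
print(1 - rss / tss, np.corrcoef(x, y)[0, 1] ** 2)  # equal: R2 = CORR^2
```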
What are the drawbacks of R2?
- It depends on how dep var is defined (changes versus levels, wages versus log wages, etc.). It is only comparable if the dep var is the same.
- It never decreases if you add more vars, even if they’re useless > compute Adj-R2
Note on (adj) R2
(adj) R2 is useful for comparing the relative performance of 2 models with same dep var. However, it is not useful for evaluating absolute performance.
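A short illustration of the Adj-R2 point above (a sketch using statsmodels; `junk` is a deliberately useless regressor):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 60
x = rng.normal(size=n)
junk = rng.normal(size=n)            # pure noise, unrelated to y
y = 1.0 + 0.5 * x + rng.normal(size=n)

m1 = sm.OLS(y, sm.add_constant(x)).fit()
m2 = sm.OLS(y, sm.add_constant(np.column_stack([x, junk]))).fit()

print(m1.rsquared, m2.rsquared)          # R2 never falls when adding junk
print(m1.rsquared_adj, m2.rsquared_adj)  # Adj-R2 penalizes the extra var
```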
Name 3 factors reducing the accuracy of OLS estimate
- Large error variance (s^2) > large influence of other variables that are not in the model > OMITTED VARIABLE BIAS!!!!!
- Small number of observations
- Little spread in indep var > without variation in x one cannot explain variation in y, but too much variation is also bad
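These three factors all show up in the textbook variance of the univariate OLS slope: Var(beta-hat) = s^2 / sum (x – x-bar)^2 = s^2 / (n * var(x)) > a larger s^2, a smaller n, or less spread in x each inflate the S.E.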
What is the F-test?
The F-test of overall significance indicates whether your linear regression model provides a better fit to the data than a model that contains no independent variables.
- Multiple regression: the p-value of the F-test equals the p-value of the null hypothesis that all slope coefficients are jointly equal to zero
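A minimal example of reading off the overall F-test (a sketch using statsmodels; data simulated for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
y = 0.3 * X[:, 0] + rng.normal(size=200)

res = sm.OLS(y, sm.add_constant(X)).fit()
# F-test of the null that all slope coefficients are jointly zero.
print(res.fvalue, res.f_pvalue)
```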
When is the Omitted Variable Bias more severe and pose a solution for this problem?
The problem is more severe when the x variable in the regression has a high correlation with the omitted variable z
Solution: multivariate regression
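A small simulation of the bias and the multivariate fix (a sketch; the true slope on x is 1.0, and z is the omitted variable):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 5_000
z = rng.normal(size=n)                  # the omitted variable
x = 0.8 * z + rng.normal(size=n)        # x is correlated with z
y = 1.0 * x + 1.0 * z + rng.normal(size=n)

short = sm.OLS(y, sm.add_constant(x)).fit()                       # omits z
full = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()  # includes z

print(short.params[1])  # biased upward: well above the true slope of 1.0
print(full.params[1])   # ~1.0 once z is included
```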
What are the assumptions of the linear regression model?
- residuals have a mean of 0 and are independent of the independent variables
- residuals have a constant variance = homoskedasticity
- residuals are uncorrelated = no autocorrelation
- there’s no exact linear relation between the independent variables
Under these assumptions, the OLS estimators (betas) are BLUE = best linear unbiased estimator for the true beta.
Only then are the routinely computed S.E.s and t-stats correct.
For exact small-sample inference, one additional assumption is needed:
- residuals follow a normal distribution
If the errors are correlated with any of the independent vars, OLS is biased and inconsistent > wrong coefficient estimates!!
How do you test for non-linearity and how do you fix non-linearity issues?
Test: Ramsey’s RESET test to examine linearity of regression
Solution: use data transformation > take logs or add a squared term
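A sketch of the RESET test and the squared-term fix (assumes a recent statsmodels that ships `linear_reset`; data simulated for illustration):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import linear_reset

rng = np.random.default_rng(5)
x = rng.normal(size=300)
y = 1.0 + x + 0.5 * x**2 + rng.normal(size=300)  # true relation is quadratic

res = sm.OLS(y, sm.add_constant(x)).fit()        # misspecified linear fit
print(linear_reset(res, power=2, use_f=True).pvalue)  # small p: linearity rejected

# Fix: add a squared term, after which RESET no longer rejects.
res2 = sm.OLS(y, sm.add_constant(np.column_stack([x, x**2]))).fit()
print(linear_reset(res2, power=2, use_f=True).pvalue)
```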
Heteroskedasticity causes / consequences / testing / solutions
Causes:
- changing variance over time (time series)
- changing variance across firms (cross sectional)
Consequences:
- usual S.E. and t-stats not valid
- BUT, no impact on coefficients!!
Testing:
- visual inspection of residual plots
- statistical tests (e.g., Breusch-Pagan, White)
Solutions:
- use corrected S.E.s
- use log transform or scale variables by size
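A sketch of one common test (Breusch-Pagan) plus White-corrected S.E.s, using statsmodels on simulated data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(6)
x = rng.uniform(1, 10, size=400)
y = 2.0 + 0.5 * x + rng.normal(scale=x)   # error variance grows with x

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print(lm_pvalue)                          # small p-value: heteroskedasticity

# Fix the S.E.s (coefficients are unchanged): White/robust covariance.
robust = sm.OLS(y, X).fit(cov_type="HC1")
print(res.bse, robust.bse)                # same params, different S.E.s
```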
Autocorrelation causes / consequences / testing / solutions
Causes:
- seasonality effects
- lead/lag effects > over/underreaction to news
- model misspecification
Consequences:
- usual S.E.s and t-stats not valid
- positive autocorr: S.E. understated and t-stats too big
- negative autocorr: S.E. overstated and t-stats too small
Testing:
- visual inspection of residual plots
- statistical tests (e.g., Durbin-Watson, Ljung-Box)
Solutions:
- add lagged dep/indep vars
- include dummy variable
- use corrected S.E.s
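A sketch of a Durbin-Watson check plus Newey-West (HAC) corrected S.E.s, using statsmodels; the AR(1) errors are simulated:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(7)
n = 300
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                     # AR(1) errors => positive autocorr
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + e

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
print(durbin_watson(res.resid))           # well below 2 => positive autocorr

# Corrected S.E.s: Newey-West / HAC covariance.
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 5})
print(res.bse, hac.bse)
```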
Multicollinearity causes / consequences / checking / solutions
Causes:
- 2 or more indep vars are highly correlated
Consequences:
- low t-stats and high S.E.s for individual coefficients
- weird signs or magnitudes of coefficient estimates
Checking:
- compute CORR matrix
- compute Variance Inflation Factor. VIF > 10 is a problem
Solutions:
- drop one variable > can lead to omitted variable bias
- collect more data to increase accuracy
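A minimal VIF check (a sketch using statsmodels; x2 is built to be nearly collinear with x1):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(8)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)     # nearly a copy of x1
X = sm.add_constant(np.column_stack([x1, x2]))

# VIF per regressor (skip the constant in column 0).
for i in (1, 2):
    print(variance_inflation_factor(X, i))  # far above 10: multicollinearity
```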
Non-normality causes / consequences / testing / solutions
Causes:
- extreme observations
- bounded dep var
- binary dep var
- discrete dep var
Consequences:
- large sample > no problem (asymptotic theory applies)
- small sample > inference about coefficients wrong and t-stats invalid
- BUT, no impact on coefficients!!
Testing:
- JB (Jarque-Bera) statistic to test for a normal distribution
Solutions:
- winsorize / truncate
- use log transformation
- use other regression model > tobit, probit/logit
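A sketch of the JB test plus winsorizing (assumes statsmodels and scipy; the fat tails come from simulated t-distributed errors):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import jarque_bera
from scipy.stats.mstats import winsorize

rng = np.random.default_rng(9)
x = rng.normal(size=500)
y = 1.0 + 0.5 * x + rng.standard_t(df=3, size=500)  # fat-tailed errors

res = sm.OLS(y, sm.add_constant(x)).fit()
jb, jb_pvalue, skew, kurtosis = jarque_bera(res.resid)
print(jb_pvalue)  # small p-value: normality rejected

# One fix: winsorize the dep var at the 1st/99th percentiles, then refit.
y_w = winsorize(y, limits=(0.01, 0.01))
```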
If the error term in a linear regression model is not normally distributed…
A. … the OLS estimator is biased
B. … routinely calculated S.E.s are incorrect
C. … we need to rely on asymptotic theory to perform valid tests
D. … we need to take the log of the dependent variable
C
In a linear regression model, if the slope coefficient of X has a t-stat of 3.0…
A. we accept the hypothesis that X has an impact
B. we accept that X is significant
C. we reject the null hypothesis that X is insignificant
D. we reject the null hypothesis that X has no impact
D
What do endogeneity and simultaneity mean?
Endogeneity broadly refers to situations in which an explanatory variable is correlated with the error term.
Simultaneity is a common cause of endogeneity. It arises when one or more of the predictors (e.g., a treatment variable) is determined by the response variable (Y). In simple terms, X causes Y and Y causes X.
Which problem makes the OLS estimator biased?
A. simultaneity between x and y
B. heteroskedasticity
C. a small sample
D. all of these
A
Which statement(s) is/are correct?
A. R2 is the most important statistic of a regression
B. R2 tells us how well the model fits the data
C. a larger R2 is always better
D. if R2=0, we have a useless model
B & D