S7 OLS Regression & Assumptions (complete) Flashcards
Correlation vs. Regression
What is the difference?
Correlation:
tells us how strongly associated two variables are
Regression:
can tell us, on average, how much a one-unit increase in the IV (independent variable) increases/decreases the predicted value of the DV (dependent variable)
-> Regression gives us more precise information on the strength of a relationship
-> Bivariate regression finds the best-fitting line through the data; the line with the best fit is the one that minimizes the vertical (Y) distance from each observation to the line -> to find the best line, use OLS
Ordinary Least Squares
OLS minimizes the sum of the squared prediction errors (y_i − ŷ_i)² across all observations
b = Covar(x,y) / Var(x) (-> see formula sheet)
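A minimal sketch of this formula in Python (NumPy assumed; the toy data are invented for illustration):

```python
import numpy as np

# Toy data, invented for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# b = Covar(x, y) / Var(x)
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
# The intercept follows from the means: a = mean(y) - b * mean(x)
a = y.mean() - b * x.mean()

print(f"a = {a:.3f}, b = {b:.3f}")
```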
Standard error of the slope
Why does the slope have a standard error and how is it calculated? -> Hint: RMSE
How can we build a confidence interval around the slope?
coefficients (b) are also sample statistics -> random sampling error
the standard error of the slope b is given by the root mean square error (RMSE) over the spread of x: s.e.(b) = RMSE / √(Σ (x_i − x̄)²)
- RMSE is given by the root of the error sum of squares (ESS) over the adjusted sample size (the degrees of freedom): RMSE = √(ESS / (n − 2)); the RMSE is a useful measure of goodness of fit
95% confidence interval: beta = b ± 1.96 × s.e.(b) (see the sketch below)
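A hedged sketch putting the RMSE, the standard error of the slope, and the confidence interval together (NumPy assumed; same invented toy data as above):

```python
import numpy as np

# Toy data, invented for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
resid = y - (a + b * x)

# RMSE = sqrt(ESS / (n - 2)): root of the error sum of squares over the df
rmse = np.sqrt((resid ** 2).sum() / (n - 2))
# s.e.(b) = RMSE / sqrt(sum of squared deviations of x)
se_b = rmse / np.sqrt(((x - x.mean()) ** 2).sum())

low, high = b - 1.96 * se_b, b + 1.96 * se_b
print(f"b = {b:.3f}, s.e.(b) = {se_b:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```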
Hypothesis testing with regression
2 ways to do it
- Calculate degrees of freedom: df = n minus the # of estimated parameters (here 2: a & b)
- Form a null hypothesis: i.e. no effect -> beta = 0, the regression line is horizontal
- Evaluate: to reject the null, the confidence intervals around b should exclude zero
Alternatively: calculate a t-ratio
t = (b − beta(H0)) / s.e.(b)
with beta(H0) usually zero
-> If our t-ratio is (in absolute value) greater than roughly 2 (more precisely, 1.96), p is below .05 and we can reject the null hypothesis
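A sketch of the t-ratio route (SciPy assumed; b, s.e.(b), and n here are placeholder values, not results from real data):

```python
from scipy import stats

# Placeholder values standing in for estimates from a fitted regression
b, se_b, n = 1.97, 0.11, 5

t = (b - 0.0) / se_b            # H0: beta = 0
df = n - 2                      # n minus the 2 estimated parameters (a and b)
p = 2 * stats.t.sf(abs(t), df)  # two-sided p-value

print(f"t = {t:.2f}, df = {df}, p = {p:.4f}")
```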
What are the two general ways to measure the performance of an estimator?
Two general ways to measure the performance of an estimator:
> Bias:
- a systematic tendency to produce estimates that are too high or too low relative to the true value
- minimize the bias
> Efficiency:
- an efficient estimator yields standard errors that are as small as possible
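A small simulation sketch of both criteria (NumPy assumed; all numbers invented): for a normal population, the sample mean and the sample median are both unbiased estimators of the center, but the mean's estimates spread less, i.e. it is the more efficient estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, reps = 5.0, 10_000  # true parameter and number of simulated samples

samples = rng.normal(mu, 2.0, size=(reps, 50))
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

# Bias: systematic deviation of the estimates from the true value
print("bias(mean):  ", means.mean() - mu)    # ~0 -> unbiased
print("bias(median):", medians.mean() - mu)  # ~0 -> unbiased
# Efficiency: the spread of the estimates (smaller = more efficient)
print("s.e.(mean):  ", means.std(ddof=1))
print("s.e.(median):", medians.std(ddof=1))  # larger -> less efficient
```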
What are the 5 OLS assumptions?
- Linearity: The dependent variable y is a linear function of the x’s, plus a population error term.
- Mean independence: The mean value of the error does not depend on any of the x’s.
- Homoscedasticity (constant variance): The variance of the error does not depend on the x’s; it is constant.
- Uncorrelated disturbances: The value of the error for any observation is uncorrelated with the value of the error for any other observation.
- Normal disturbance: The disturbances/errors are distributed normally.
Which OLS assumptions guarantee what?
Assumptions (1) linearity & (2) mean independence -> linear and unbiased estimates
Assumptions (3) homoscedasticity and (4) uncorrelated disturbances -> efficient model -> “best”
Together: BLUE (Best Linear Unbiased Estimator)
Adding assumption (5) normality implies that a t- or z-table can be used to calculate p-values
Mean Independence
Mean independence is the most important assumption because violations
- can generate LARGE bias in the estimates and occur often
- cannot be tested for without additional data
-> If your x’s are related to something outside of the model, they might be picking up its effect on y as well as their own!
-> This is called omitted variable bias
Dangers of Violating Mean Independence
Omitted variable bias
- can generate LARGE bias in the estimates and often occur
-> if your x’s are related to something outside of the model, they might be picking up its effect on y as well as their own!
- cannot be tested for without additional data
Endogeneity bias
-> The explanatory variable is correlated with the error term
- often arises from reverse causation or selection effects
- If y has a causal effect on any of the x’s, then the error term will indirectly affect the x’s
Measurement Error
- If the x’s are measured with error, that error becomes part of the error term
- Because the measurement error affects the measured value of the x’s, the error term is related to the x’s
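A small simulation sketch of omitted variable bias (statsmodels assumed; all coefficients are made up): x is correlated with an omitted variable z that also affects y, so leaving z out of the model biases the coefficient on x.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000

z = rng.normal(size=n)                      # the omitted variable
x = 0.8 * z + rng.normal(size=n)            # x is related to z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)  # true effect of x on y is 1.0

full = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()
short = sm.OLS(y, sm.add_constant(x)).fit()  # omits z

print("b_x with z included:", round(full.params[1], 2))   # ~1.0
print("b_x with z omitted: ", round(short.params[1], 2))  # biased upward: x picks up z's effect
```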
Assumption (3) Homoscedasticity
Wanted: homoscedasticity;
its bad brother = heteroscedasticity
-> Non-constant variance (scatterplot that looks like a “joint”)
-> Biased standard errors (in either direction)
- easily fixed with “robust standard errors”
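A sketch of the “robust standard errors” fix (statsmodels assumed; the data-generating process is invented so that the error spread grows with x):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1_000
x = rng.uniform(1, 10, size=n)
# Heteroscedastic errors: the spread grows with x (the "joint"-shaped scatter)
y = 1.0 + 0.5 * x + rng.normal(scale=0.4 * x)

X = sm.add_constant(x)
classical = sm.OLS(y, X).fit()             # assumes constant error variance
robust = sm.OLS(y, X).fit(cov_type="HC1")  # heteroscedasticity-robust s.e.s

print("classical s.e.(b):", classical.bse[1])
print("robust s.e.(b):   ", robust.bse[1])
```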
(4) Uncorrelated Errors
The disturbances (errors) for any two observations must be uncorrelated.
Correlated errors can arise from connected observations (e.g. husbands and wives), causal effects (e.g. peer pressure), or serial correlation (measuring the same unit over time)
- Correlated errors do not bias the coefficient estimates
But they do
- shrink the standard errors
- observations are assumed to be more independent than they are
- DANGER: Type 1 error!! False positive
- solution depends on type of correlation in errors, e.g. “clustered standard errors”
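A sketch of “clustered standard errors” (statsmodels assumed; the cluster structure is invented): both x and the error share a within-cluster component, so the classical standard errors are too small.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n_clusters, per_cluster = 50, 10
groups = np.repeat(np.arange(n_clusters), per_cluster)
n = n_clusters * per_cluster

# Both x and the error contain a component shared within each cluster
x = rng.normal(size=n_clusters)[groups] + rng.normal(size=n)
e = rng.normal(size=n_clusters)[groups] + rng.normal(size=n)
y = 0.5 * x + e

X = sm.add_constant(x)
classical = sm.OLS(y, X).fit()  # treats all observations as independent
clustered = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": groups})

print("classical s.e.(b):", classical.bse[1])
print("clustered s.e.(b):", clustered.bse[1])  # larger: the honest uncertainty
```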
Normality
The population disturbance term must be normally distributed
Note that only disturbances, not the variables, must be normally distributed (Big misconception!!)
Normality is the least important assumption because OLS can be BLUE without it (unbiased and efficient)
Normally distributed disturbances simply enable the use of a z- or t-table for the p-values. Thus, in large samples (where the central limit theorem takes over) we don’t even care about normality of the disturbances
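If one does want to check this assumption in a small sample, a common sketch is to test the residuals, not the variables (SciPy and statsmodels assumed; the data are invented):

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

resid = sm.OLS(y, sm.add_constant(x)).fit().resid

# Test the residuals, not y or x themselves
stat, p = stats.shapiro(resid)
print(f"Shapiro-Wilk p = {p:.3f}")  # a large p gives no evidence against normality
```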
Which pitfalls can bias estimates and which can influence the standard errors? (assumptions and beyond)
Pitfalls that can bias estimates:
(1) Non-linearity (misspecification)
(2) Violation of mean independence -> omitted variable bias (misspecification)
- endogeneity (= the explanatory variable is correlated with the error term)
- measurement error
Pitfalls that can influence the standard errors:
- outliers (sometimes arising from skew)
- heteroskedasticity
- correlated errors
- multicollinearity
Consider a linear function, y = α + βx. What does the constant α signify? (Select ALL the answers that apply)
a. The value of x when the y-intercept is 0
b. The value of y when x is 0
c. The value of the residuals when x is 0
d. The Y-intercept
Correct: b & d
Which of these statements does not form part of the OLS assumptions?
Select one:
a. Mean independence. The mean value of ε does not depend on any of the x’s. Assume that E(ε) = 0.
b. Linearity. The dependent variable y is a linear function of the x’s, plus a population error term, ε.
y = α + β1 x1 + β2 x2 + ε
c. Normality. The dependent variable is approximately normally distributed around its mean.
d. Uncorrelated disturbances. The value of ε for any observation is uncorrelated with the value of ε for any other observation.
Correct: C