M9 - Simple Linear regression Flashcards
Population vs sample
What do we measure?
Why?
We measure the info that is contained in the data.
Measure only x and y
u is not observable
–> if it were, we could determine the coefficients exactly
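A small simulation makes this card concrete. It is a sketch under stated assumptions: the true coefficients (`TRUE_B0 = 2.0`, `TRUE_B1 = 0.5`) and the normally distributed disturbances are made up for illustration, and in real data `u` is never available. Because the simulation knows `u`, subtracting it puts every point exactly on the line, so any two points recover the coefficients. The raw `y` values give a distorted answer.

```python
import random

# Hypothetical true coefficients, chosen only for illustration.
TRUE_B0, TRUE_B1 = 2.0, 0.5

random.seed(1)
x = [random.uniform(0, 10) for _ in range(50)]
u = [random.gauss(0, 1) for _ in range(50)]          # disturbance term (unobservable in practice)
y = [TRUE_B0 + TRUE_B1 * xi + ui for xi, ui in zip(x, u)]

# If u were observable, y - u lies exactly on the line b0 + b1*x,
# so two points pin down both coefficients without error.
y_clean = [yi - ui for yi, ui in zip(y, u)]
b1_exact = (y_clean[1] - y_clean[0]) / (x[1] - x[0])
b0_exact = y_clean[0] - b1_exact * x[0]
print(b0_exact, b1_exact)   # approximately 2.0 and 0.5

# With only x and y observed, the same two-point calculation is thrown off by u.
b1_naive = (y[1] - y[0]) / (x[1] - x[0])
print(b1_naive)
```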
How do we measure b0 and b1?
We only have estimated values of b0 and b1
–> determine the estimators
What's OLS?
Ordinary Least Squares
–> choose both coefficients so that the sum of the squared residuals is minimized.
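To illustrate the least-squares criterion on its own, the sketch below uses no formula at all: it searches a grid of candidate (b0, b1) pairs (step 0.01, an arbitrary choice) for the pair with the smallest sum of squared residuals. The five data points are made up, and a grid search is only for illustration; in practice the closed-form OLS estimator is used.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up illustrative data

def ssr(b0, b1):
    """Sum of squared residuals for the candidate line y = b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Brute-force search: b0 in [-1, 1], b1 in [0, 4], both in steps of 0.01.
candidates = ((i / 100, j / 100) for i in range(-100, 101) for j in range(0, 401))
b0_hat, b1_hat = min(candidates, key=lambda b: ssr(*b))
print(b0_hat, b1_hat)   # 0.05 1.99 for this data
```

For this data the grid optimum coincides with the exact OLS solution (b0 = 0.05, b1 = 1.99), because that point happens to lie on the grid.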
Estimator of the slope coeff
Only correct if….
b1 = Cov(x,y) / Var(x) = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
Only valid if the denominator is positive, i.e. Var(x) > 0
OLS summary
- the … of the slope coeff …. = …/…
- if x and y are + …., the slope coeff is …
- if x and y are - …., the slope coeff is …
- x needs + ….. to determine an estimator
OLS summary
- the ESTIMATOR of the slope coeff B1 = COV(x,y) / VAR(x)
- if x and y are + CORRELATED the slope coeff is POSITIVE
- if x and y are - CORRELATED the slope coeff is NEGATIVE
- x needs + VARIANCE to determine an estimator
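The estimator on this card fits in a short function. This is a plain-Python sketch on made-up data: it uses sample moments with n - 1 in both the covariance and the variance (the factor cancels, so population moments give the same b1), and it raises an error when Var(x) = 0, the case in which no estimator can be determined.

```python
def slope_estimator(x, y):
    """b1 = Cov(x, y) / Var(x), using sample (n - 1) moments."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    if var_x == 0:
        # x has no variance: the slope cannot be estimated.
        raise ValueError("x needs positive variance to estimate b1")
    return cov_xy / var_x

x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(slope_estimator(x, [2, 4, 5, 7, 9]))   # positively correlated: positive slope (about 1.7)
print(slope_estimator(x, [9, 7, 5, 4, 2]))   # negatively correlated: negative slope (about -1.7)
```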
Is it better to have more of the structural or the stochastic term?
The more of the variation in y that is captured by the variables in the structural term (rather than left to the stochastic term u), the better
Variance decomposition
Total variation: Σ(actual yi - mean y)²
Explained variation: Σ(predicted yi - mean y)²
Residual variation: Σ(actual yi - predicted yi)²
Total = explained + residual
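A quick numerical check of the decomposition, using a made-up five-point data set and an OLS fit computed in plain Python. Note that Total = explained + residual holds for an OLS fit that includes an intercept; it need not hold for an arbitrary line.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 7.0, 9.0]   # made-up illustrative data

# OLS fit with intercept.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ss_total = sum((yi - y_bar) ** 2 for yi in y)                     # actual vs mean
ss_explained = sum((yh - y_bar) ** 2 for yh in y_hat)             # predicted vs mean
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))     # actual vs predicted
print(ss_total, ss_explained + ss_residual)   # both approximately 29.2
```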
R-squared?
How well does the empirically tested model fit the data?
R² = 1 - residual variation / total variation (= explained / total)
Lies in the interval [0, 1]; the higher, the better the fit
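A minimal R² function, as a sketch in plain Python. The fitted values below are the OLS predictions for x = 1..5 on made-up data (b0 = 0.3, b1 = 1.7); the two extra calls show the ends of the interval: a perfect fit gives 1, and predicting the mean of y for every observation gives 0.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - residual variation / total variation."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return 1 - ss_residual / ss_total

y = [2.0, 4.0, 5.0, 7.0, 9.0]
y_hat = [2.0, 3.7, 5.4, 7.1, 8.8]   # OLS fitted values for this made-up data

print(r_squared(y, y_hat))       # 1 - 0.3 / 29.2, about 0.99
print(r_squared(y, y))           # perfect fit: 1.0
print(r_squared(y, [5.4] * 5))   # always predicting the mean (5.4): 0.0
```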
Assumptions of the simple regression model?
Needed
1-6
What happens if they are not fulfilled?
- the linear model describes the relationship between x and y
- the sample is a random draw from the population
- the expected value of the disturbance term u is 0
- the IV (x) is not constant
- u is not correlated with x
- x values have been measured accurately
If not fulfilled, results are biased
- Homoscedasticity: u has constant variance, which is not correlated with x
- error terms ui are normally distributed
Assumptions of the simple regression model
Not needed
7-8
- homoscedasticity: disturbance term u has constant variance, which is not correlated with x
- error term u is normally distributed
Heteroscedasticity
effect
the variance of the disturbance term u differs across the values of an IV
e.g. age with income
- –> OLS is still unbiased, but inefficient
- –> SE are distorted
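A Monte Carlo sketch of what heteroscedasticity does (and does not do) to OLS. Assumptions for illustration only: a hypothetical true slope of 2.0, and an error standard deviation that grows with x (0.5·x). The slope is re-estimated on 2,000 simulated samples; its average stays close to 2.0, which is what "unbiased" means. Heteroscedasticity affects the precision of the estimates and their standard errors, not the centre of their distribution.

```python
import random

random.seed(42)
TRUE_B0, TRUE_B1 = 1.0, 2.0   # hypothetical true coefficients

def ols_slope(x, y):
    """Closed-form OLS slope for simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

slopes = []
for _ in range(2000):
    x = [random.uniform(1, 10) for _ in range(50)]
    # Error spread grows with x: heteroscedastic by construction, but E[u | x] = 0.
    y = [TRUE_B0 + TRUE_B1 * xi + random.gauss(0, 0.5 * xi) for xi in x]
    slopes.append(ols_slope(x, y))

mean_slope = sum(slopes) / len(slopes)
print(mean_slope)   # close to 2.0
```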
Homoscedasticity
the variance of u is the same across all values of the IV
Dealing with heteroscedasticity
-
Solution
+ OLS estimates are UNBIASED
- OLS estimates are INEFFICIENT
- SE of the coeff estimates by OLS are DISTORTED
Solution: use robust (heteroscedasticity-consistent) standard errors
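One common form of robust standard errors is White's heteroscedasticity-consistent estimator (HC0); the card does not say which variant is meant, so HC0 is an assumption here. For the slope in simple regression it replaces the single error variance with each observation's own squared residual: SE_robust(b1) = sqrt(Σ(xi - x̄)² ei²) / Σ(xi - x̄)². The plain-Python sketch below computes both the conventional and the robust SE on hypothetical heteroscedastic data. Statistical packages provide this directly; statsmodels, for example, accepts `cov_type="HC0"` (or HC1 to HC3) in `OLS(...).fit(...)`.

```python
import math
import random

def ols_with_se(x, y):
    """Simple OLS returning b0, b1, the conventional SE of b1 and the White (HC0) robust SE of b1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Conventional SE: assumes one common error variance (homoscedasticity).
    sigma2 = sum(e ** 2 for e in resid) / (n - 2)
    se_conventional = math.sqrt(sigma2 / sxx)
    # HC0: each squared residual is weighted by its own (xi - x_bar)^2.
    se_robust = math.sqrt(sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid))) / sxx
    return b0, b1, se_conventional, se_robust

# Hypothetical heteroscedastic data: the error spread grows with x.
random.seed(0)
x = [random.uniform(1, 10) for _ in range(200)]
y = [1.0 + 2.0 * xi + random.gauss(0, 0.5 * xi) for xi in x]
b0, b1, se_conv, se_rob = ols_with_se(x, y)
print(f"b1 = {b1:.3f}, conventional SE = {se_conv:.3f}, robust SE = {se_rob:.3f}")
```

The coefficient estimate is identical either way; only its standard error (and therefore any t-statistic or p-value built on it) changes.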
Testing for significance of the coeff?
Is there a …. between x and y?
H0: is it ….. 0 ?
Distribution of predicted b1:
In large samples: b1 …. to a …. distribution
In small samples: b1 is ….. distributed
Testing for significance of the coeff?
Is there a RELATIONSHIP between x and y?
H0: b1 = 0; we test whether b1 is UNEQUAL to 0 (H1: b1 ≠ 0)
Distribution of predicted b1:
In large samples: b1 CONVERGES to a NORMAL distribution
In small samples: b1 is NORMALLY distributed, provided the error terms are normally distributed
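A sketch of the significance test for b1 in plain Python. It computes the t-statistic t = b1 / SE(b1) for H0: b1 = 0 and, relying on the large-sample result above, a two-sided p-value from the standard normal distribution (`statistics.NormalDist`). The simulated data and the true slope of 0.3 are hypothetical. In small samples with normal errors, the t distribution with n - 2 degrees of freedom would be the correct reference instead.

```python
import math
import random
from statistics import NormalDist

def slope_t_test(x, y):
    """t statistic for H0: b1 = 0 with a large-sample (normal) two-sided p-value."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # Estimated error variance with n - 2 degrees of freedom.
    sigma2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b1 = math.sqrt(sigma2 / sxx)
    t = b1 / se_b1
    # Normal approximation, justified only in large samples.
    p_value = 2 * (1 - NormalDist().cdf(abs(t)))
    return b1, se_b1, t, p_value

random.seed(3)
x = [random.uniform(0, 10) for _ in range(200)]
y = [1.0 + 0.3 * xi + random.gauss(0, 1) for xi in x]   # hypothetical true b1 = 0.3
b1, se_b1, t, p = slope_t_test(x, y)
print(f"b1 = {b1:.3f}, SE = {se_b1:.3f}, t = {t:.2f}, p = {p:.4f}")
```

A small p-value (e.g. below 0.05) leads to rejecting H0 and concluding that there is a relationship between x and y.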
Multiple regression
- at least 2 … variables
- b0: measures the …
- b1:
- b2:
b1, b2 –> … …
“How … is the … size of x1, provided that x2 remains constant?”
Multiple regression
- at least TWO EXPLANATORY variables
- b0: measures the INTERCEPT
- b1: SLOPE of the linear relationship between x1 and y (x2 held constant)
- b2: SLOPE of the linear relationship between x2 and y (x1 held constant)
b1, b2 –> EFFECT SIZE
“How LARGE is the EFFECT size of x1, provided that x2 remains constant?”
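A plain-Python sketch of regression with two explanatory variables, solving the OLS normal equations in centred form. The data are hypothetical and noise-free (y = 1 + 2·x1 - 3·x2), so the fit recovers the coefficients exactly. For contrast, regressing y on x1 alone, which ignores x2, gives a very different (here even negative) slope; b1 in the multiple regression is the effect of x1 with x2 held constant.

```python
def ols_two_regressors(x1, x2, y):
    """OLS for y = b0 + b1*x1 + b2*x2 via the centred normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2   # zero if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a - 3 * b for a, b in zip(x1, x2)]   # hypothetical, noise-free
b0, b1, b2 = ols_two_regressors(x1, x2, y)
print(b0, b1, b2)   # 1.0 2.0 -3.0

# For contrast: a simple regression of y on x1 alone ignores x2.
m1, my = sum(x1) / len(x1), sum(y) / len(y)
simple_b1 = (sum((a - m1) * (c - my) for a, c in zip(x1, y))
             / sum((a - m1) ** 2 for a in x1))
print(simple_b1)    # about -0.49
```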