PE_L2_(without Appendix) Flashcards
What is the Simple Linear Regression (SLR) population model?
- Model form: y = β0 + β1x + u
- y: dependent (outcome) variable, consequence
- x: independent (explanatory) variable, cause
- u: error (disturbance) term
- This equation is called the “population model”
- Goal: identify causal relations (“x causes y”), not (mere) correlations!
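A minimal simulation sketch of the population model; the parameter values and sample size below are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters (illustrative values only)
beta0, beta1 = 1.0, 0.5

n = 1_000
x = rng.normal(size=n)        # explanatory variable
u = rng.normal(size=n)        # error term; E(u|x) = 0 holds by construction here
y = beta0 + beta1 * x + u     # population model: y = beta0 + beta1*x + u
```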
Which assumptions define the SLR framework?
- SLR.1: Model is linear in parameters (beta)
- SLR.2: Random sampling
- SLR.3: Sample variation in x
- SLR.4: Zero conditional mean (E(u|x) = E(u) = 0), where E denotes the expected value
- SLR.5: Homoskedasticity (Var(u|x) = σ²)
What does ‘zero conditional mean’ imply?
- E(u|x) = 0
- The error term u does not systematically depend on x
- Rules out omitted-variable bias: the factors collected in u must not be systematically related to x
Why is sample variation in x important?
- Prevents the denominator in slope formulas from being zero
- Ensures x has enough variability to estimate β1
- Without it, the slope cannot be computed
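The denominator in the slope formula is Σ(xi − x̄)²; a minimal check with made-up numbers shows why variation in x is needed:

```python
import numpy as np

x = np.array([2.0, 2.0, 2.0])          # no sample variation in x
denom = np.sum((x - x.mean()) ** 2)    # denominator of the OLS slope formula
print(denom)                           # 0.0 -> the slope estimate is undefined
```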
How do we interpret the OLS estimates?
- β1 (slope): Estimated change in y for a one-unit change in x
- β0 (intercept): Estimated value of y when x = 0
- Both are unbiased if assumptions hold
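A hypothetical worked example (numbers invented for illustration):

```latex
\hat{y} = 2 + 0.5\,x
\quad\Rightarrow\quad
\text{a one-unit increase in } x \text{ raises predicted } y \text{ by } 0.5;\;
\text{at } x = 0,\ \hat{y} = 2.
```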
What is the idea of ‘least squares’?
- Minimise the sum of squared residuals: ∑(yi − ŷi)²
- Ensures the fitted line is as close as possible to the data points (in a vertical sense)
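A minimal numpy sketch of the least-squares computation on simulated data (closed-form solution; values illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200)

# Closed-form OLS estimates
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x
ssr = np.sum((y - y_hat) ** 2)   # the quantity that least squares minimises
```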
What is R²?
- Coefficient of determination
- R² = SSE / SST = 1 − SSR / SST (SSE: explained sum of squares, SSR: residual sum of squares, SST: total sum of squares)
- Measures the proportion of the total variation in y explained by the model
- Goodness of fit
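A sketch computing R² both ways on simulated data (illustrative only); with an intercept, SST = SSE + SSR, so the two definitions coincide:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
ssr = np.sum((y - y_hat) ** 2)          # residual sum of squares

r2 = sse / sst                          # equals 1 - ssr / sst
```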
What does homoskedasticity mean?
- Var(u|x) = σ² (constant)
- The spread of the error term does not depend on x
- A key assumption for deriving simple variance formulas
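A tiny simulation contrasting homoskedastic and heteroskedastic errors (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=1_000)

u_homo = rng.normal(scale=1.0, size=1_000)   # Var(u|x) = 1 for every x (SLR.5 holds)
u_hetero = rng.normal(scale=0.2 * x)         # spread grows with x -> SLR.5 violated
```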
How is the error variance estimated?
- Using the residuals: ûi = yi − ŷi
- σ̂² = SSR / (n − 2)
- (n − 2) because two parameters (β0, β1) are estimated from the data
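A sketch of this estimate on simulated data (same kind of setup as in the earlier sketches):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)                  # residuals u_hat_i

sigma2_hat = np.sum(resid ** 2) / (n - 2)  # SSR / (n - 2); df correction for the two estimated parameters
```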
How can we handle non-linear relationships?
- ‘Linear in parameters’ does not require a straight-line relationship between y and x
- Can use log forms: ln(y), ln(x)
- Interpretation changes: e.g., log-log model implies elasticity interpretation
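A log-log sketch on simulated data; the slope is then read as an elasticity (roughly, a 1% change in x goes with a β1% change in y). Numbers are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, size=500)
y = np.exp(0.3 + 0.8 * np.log(x) + rng.normal(scale=0.1, size=500))

lx, ly = np.log(x), np.log(y)
b1 = np.sum((lx - lx.mean()) * (ly - ly.mean())) / np.sum((lx - lx.mean()) ** 2)
# b1 estimates the elasticity (about 0.8 here): a 1% increase in x
# is associated with roughly a 0.8% increase in y
```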
What is the unobserved error term?
- Strictly unpredictable random behaviour that may be unique to that observation
- Unspecified (unobserved) factors / explanatory variables, which are not in the model
- An approximation error if the relationship between y and x is not exactly linear
OLS
Ordinary Least Squares
Goal: estimate β0 and β1
Step 1: Use the population moment conditions implied by SLR.4:
- E(u) = 0
- E(u|x) = 0, which also implies E(xu) = 0
- In terms of observables: E(y − β0 − β1x) = 0 and E[x(y − β0 − β1x)] = 0
Step 2: Replace the population moments by their sample analogues and solve the two equations for β̂0 and β̂1 (see the formulas below)
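Solving the two sample moment conditions yields the standard closed-form OLS estimators (textbook method-of-moments result, sketched in LaTeX):

```latex
% Sample analogues of the two moment conditions:
%   (1/n) \sum_i (y_i - \hat\beta_0 - \hat\beta_1 x_i)      = 0
%   (1/n) \sum_i x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i)  = 0
% Solving them:
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
                     {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\,\bar{x}
```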
Implications of the Simple Linear Regression (SLR) Model
Implication 1: OLS is unbiased (under SLR.1–SLR.4)
Implication 2: Sampling variance of the OLS estimators (the simple formulas below additionally require SLR.5)
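Under SLR.1–SLR.5 the conditional sampling variances take the standard textbook form (LaTeX sketch):

```latex
\operatorname{Var}(\hat{\beta}_1 \mid x) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\operatorname{Var}(\hat{\beta}_0 \mid x) = \frac{\sigma^2 \, n^{-1}\sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
```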