Multiple Linear Regression Flashcards
How to do OLS for a multiple regression model
Same as in the bivariate case - square the residuals and minimise their sum
We don’t have to remember the actual estimator formula; we just give Stata the regressors x₁, x₂, etc. (see the sketch below)
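A minimal sketch of what this looks like in practice (Python with numpy/statsmodels here rather than Stata; the data and coefficient values are made up for illustration). In Stata the equivalent is just `regress y x1 x2`.

```python
import numpy as np
import statsmodels.api as sm

# Made-up data: y depends on two regressors plus a random error
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

# OLS minimises the sum of squared residuals; the software applies
# the estimator formula for us, we only supply y and the regressors.
X = sm.add_constant(np.column_stack([x1, x2]))  # adds the intercept column
results = sm.OLS(y, X).fit()
print(results.params)  # estimates of beta0, beta1, beta2
```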
The CLRA (classical linear regression assumptions) need to be modified again for the multiple regression model…
CLRA 1 - Written as
𝑌ᵢ = 𝛽₀ + 𝛽₁𝑋₁ᵢ + 𝛽₂𝑋₂ᵢ + … + 𝛽ₖ𝑋ₖᵢ + 𝜀ᵢ, where β₀…βₖ are unknown parameters and εᵢ is a random error term
CLRA 2: The error has an expected value of 0, but now conditional on all X variables: E(εᵢ|X₁,…,Xₖ) = 0
CLRA 3: No regressor is constant, and no regressor is an exact linear function of the others (no PERFECT collinearity, i.e. correlation of exactly −1 or 1; regressors can be correlated, just not perfectly)
CLRA 4: Errors are uncorrelated across observations: cov(εᵢ, εⱼ) = 0 for i ≠ j
CLRA 5: Same finite variance for every observation (homoskedasticity): var(εᵢ|X₁,…,Xₖ) = σ²
CLRA 6: Errors are normally distributed:
εᵢ|X₁,…,Xₖ ~ N(0, σ²)
Why might the OLS estimator fail? (2)
If regressors are perfectly collinear (breaks CLRA 3)
If the sample size n is too small relative to the number of parameters being estimated: OLS fails if n < k+1 (see the sketch below)
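A quick numpy sketch (illustrative, not from the course) showing both failure modes: the regressor matrix loses rank when a variable is perfectly collinear, and the coefficients are not identified when n < k+1.

```python
import numpy as np

rng = np.random.default_rng(1)

# Failure 1: perfect collinearity (breaks CLRA 3).
# x2 is an exact multiple of x1, so the regressor matrix is rank
# deficient and X'X is singular (not invertible).
n = 50
x1 = rng.normal(size=n)
x2 = 3.0 * x1
X = np.column_stack([np.ones(n), x1, x2])
print(np.linalg.matrix_rank(X))          # 2, not 3
# np.linalg.inv(X.T @ X) would raise LinAlgError: Singular matrix

# Failure 2: n < k+1.
# With 3 observations and 5 parameters (intercept + 4 regressors),
# the rank of X can be at most n = 3, so OLS cannot pin down all 5.
X_small = np.column_stack([np.ones(3), rng.normal(size=(3, 4))])
print(np.linalg.matrix_rank(X_small))    # 3 < k+1 = 5
```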
How do we estimate the error variance σ² in a multiple regression model?
σ̂² = RSS / (n − (k+1))
(For a model with two regressors plus an intercept, k+1 = 3, so the denominator is n − 3: basically n minus the number of estimated parameters, i.e. the X variables plus the intercept!)
Same as the error variance in the bivariate case, except the denominator is n − (k+1) instead of n − 2.
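A small sketch of this estimate in code (the function name `error_variance` is my own, for illustration):

```python
import numpy as np

def error_variance(y, X):
    """sigma_hat^2 = RSS / (n - (k+1)).
    X is the n x (k+1) regressor matrix including the intercept column."""
    n, k_plus_1 = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS coefficients
    residuals = y - X @ beta_hat
    rss = residuals @ residuals                       # sum of squared residuals
    return rss / (n - k_plus_1)
```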
Coefficient of determination (goodness of fit) for multiple regression models
Problem: as we add more variables, R² never falls and typically rises (since more of the variation in y becomes attributable to the regression line)
So how do we correct for this?
Adjust R² to Rbar² and penalise the inclusion of more explanatory variables
How is the adjusted R² expressed
Rbar² = 1 − [(n−1)/(n−(k+1))] × (1−R²)
So this adjusts for the inclusion of more variables: Rbar² only increases when a new variable adds something important to the analysis!
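A one-function sketch of the formula (the R² numbers below are made up to illustrate the penalty):

```python
def adjusted_r2(r2, n, k):
    """Rbar^2 = 1 - [(n-1) / (n-(k+1))] * (1 - R^2)."""
    return 1 - ((n - 1) / (n - (k + 1))) * (1 - r2)

# Adding a near-useless regressor: R^2 creeps up mechanically,
# but the degrees-of-freedom penalty grows faster, so Rbar^2 falls.
print(adjusted_r2(0.50, n=30, k=2))  # ~0.463
print(adjusted_r2(0.51, n=30, k=3))  # ~0.453 -- lower despite higher R^2
```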
Recall the (lowest) variance of β̂₁… (the standard error of an OLS estimator; we are estimating β₁ in this instance!)
Hint: includes goodness of fit statistic for x and z…
Uses the FC6 / pg 3 formula for var(β̂₁)
var(β̂₁) = σ² / [(1 − R²zx) Σxᵢ²]
R²zx is the goodness-of-fit statistic from regressing x on z (with just these two regressors it equals the squared correlation between x and z)
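A sketch of this formula in code, assuming a two-regressor model with x and z (so R²zx reduces to the squared correlation, and Σxᵢ² is the sum of squared deviations of x from its mean; the function name is mine):

```python
import numpy as np

def var_beta1_hat(sigma2, x, z):
    """var(beta1_hat) = sigma^2 / [(1 - R2_zx) * sum of squared
    deviations of x], for a model with two regressors x and z."""
    r2_zx = np.corrcoef(x, z)[0, 1] ** 2        # squared correlation
    sum_x_dev_sq = np.sum((x - x.mean()) ** 2)  # variation in x
    return sigma2 / ((1 - r2_zx) * sum_x_dev_sq)
```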
What affects the standard error of OLS estimators? (5)
Use the var(β̂₁) formula above to test it out (and see the simulation sketch after this list)
Variance of the error (σ²) - higher σ² means a higher variance of the OLS estimator, so lower accuracy (estimates are more dispersed)
Variation in the X variable - if low, the standard error is larger. (It is hard to tell how a variable contributes to the regression if X doesn’t change much, e.g. if schooling barely varies it is hard to estimate its contribution.) So high variation in X is good
Correlation of the regressors (R²zx) - the more correlated they are, the higher the standard error, so the harder it is to work out how much each variable contributes independently
Sample size n - larger sample size, lower standard error
Number of regressors k - more regressors increase the standard error. (More regressors mean fewer degrees of freedom, since df = n − (k+1), and typically a higher degree of multicollinearity, so less accuracy.)
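A small simulation sketch (made-up data-generating process) showing two of these effects directly through the fitted standard errors:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

def se_of_beta1(n, rho):
    """Fit y = 1 + 2*x + z + error and return the standard error of
    beta1_hat, where corr(x, z) is approximately rho."""
    x = rng.normal(size=n)
    z = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = 1 + 2 * x + z + rng.normal(size=n)
    X = sm.add_constant(np.column_stack([x, z]))
    return sm.OLS(y, X).fit().bse[1]  # bse = standard errors of coefficients

print(se_of_beta1(n=50, rho=0.0))    # baseline
print(se_of_beta1(n=500, rho=0.0))   # larger n -> smaller standard error
print(se_of_beta1(n=50, rho=0.95))   # high collinearity -> larger standard error
```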
CLRA 3: When are variables perfectly collinear (so OLS fails)? (2)
When one variable is a constant multiple of another, e.g. x₁ = 3x₂
When one independent variable can be expressed as an exact linear function of one or more of the other independent variables, e.g. x₁ = x₂, or x₁ = x₂ + x₃ (see the sketch below)
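To complement the rank-deficiency sketch earlier, here is the exact-linear-function case x₁ = x₂ + x₃ (illustrative data):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = x2 + x3                       # exact linear function of x2 and x3

X = np.column_stack([np.ones(n), x1, x2, x3])
print(np.linalg.matrix_rank(X))    # 3 instead of 4: perfect collinearity,
                                   # so OLS cannot estimate all four betas
```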