L8 - The Multivariate Regression Model Flashcards
How do you work out the residual sum of squares?
- Same as for a bivariate model:
\min \text{RSS} = \sum_{i=1}^{N} \hat{u}_i^2
So, for example, take a three-variable model:
Y_i = \beta_1 + \beta_2 X_{i2} + \beta_3 X_{i3} + u_i
\min \text{RSS} = \sum_{i=1}^{N} (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{i2} - \hat{\beta}_3 X_{i3})^2
How do we minimize the RSS of a three-variable regression model?
- By differentiating
- The first-order conditions (FOCs) for this problem are:
\partial \text{RSS} / \partial \hat{\beta}_1 = -2 \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{i2} - \hat{\beta}_3 X_{i3}) = 0
\partial \text{RSS} / \partial \hat{\beta}_2 = -2 \sum X_{i2} (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{i2} - \hat{\beta}_3 X_{i3}) = 0
\partial \text{RSS} / \partial \hat{\beta}_3 = -2 \sum X_{i3} (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{i2} - \hat{\beta}_3 X_{i3}) = 0
This gives us a system of three equations in three unknown variables. More generally, if:
Y_i = \beta_1 + \sum_{j=2}^{k} \beta_j X_{ij} + u_i
then the first-order conditions will yield a system of k equations in k unknown variables.
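As an illustration (not part of the original notes; the data are made up), here is a minimal numpy sketch showing that at the OLS estimates the three sums in the first-order conditions are all approximately zero.

```python
# Minimal sketch with hypothetical data: at the OLS estimates the three
# first-order-condition sums are (numerically) zero.
import numpy as np

rng = np.random.default_rng(0)
N = 100
x2 = rng.normal(size=N)
x3 = rng.normal(size=N)
y = 1.0 + 2.0 * x2 - 0.5 * x3 + rng.normal(size=N)   # made-up data

X = np.column_stack([np.ones(N), x2, x3])             # intercept, X2, X3
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)      # OLS estimates

resid = y - X @ beta_hat                               # residuals u_i hat
# FOC sums: sum(resid), sum(X2*resid), sum(X3*resid) should all be ~0.
print(resid.sum(), (x2 * resid).sum(), (x3 * resid).sum())
```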
What are the least-squares normal equations of the three-variable regression model?
The model
Y_i = \beta_1 + \beta_2 X_{i2} + \beta_3 X_{i3} + u_i
has least-squares normal equations of the form:
\hat{\beta}_1 N + \hat{\beta}_2 \sum X_{i2} + \hat{\beta}_3 \sum X_{i3} = \sum Y_i
\hat{\beta}_1 \sum X_{i2} + \hat{\beta}_2 \sum X_{i2}^2 + \hat{\beta}_3 \sum X_{i2} X_{i3} = \sum X_{i2} Y_i
\hat{\beta}_1 \sum X_{i3} + \hat{\beta}_2 \sum X_{i2} X_{i3} + \hat{\beta}_3 \sum X_{i3}^2 = \sum X_{i3} Y_i
This is a system of three equations in three unknowns. Solving these yields the OLS estimates.
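As a sketch (hypothetical data, not from the notes), the three normal equations can be built directly from the sums and solved as a 3 x 3 linear system:

```python
# Minimal sketch: build the three normal equations from sums and solve them.
import numpy as np

rng = np.random.default_rng(1)
N = 50
x2 = rng.normal(size=N)
x3 = rng.normal(size=N)
y = 0.5 + 1.5 * x2 + 2.0 * x3 + rng.normal(size=N)    # made-up data

# Coefficient matrix and right-hand side of the normal equations.
A = np.array([
    [N,         x2.sum(),         x3.sum()],
    [x2.sum(),  (x2 ** 2).sum(),  (x2 * x3).sum()],
    [x3.sum(),  (x2 * x3).sum(),  (x3 ** 2).sum()],
])
b = np.array([y.sum(), (x2 * y).sum(), (x3 * y).sum()])

beta_hat = np.linalg.solve(A, b)   # [beta1_hat, beta2_hat, beta3_hat]
print(beta_hat)
```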
What happens when you divide the first normal equation through by N?
\hat{\beta}_1 N + \hat{\beta}_2 \sum X_{i2} + \hat{\beta}_3 \sum X_{i3} = \sum Y_i
\hat{\beta}_1 + \hat{\beta}_2 (\sum X_{i2} / N) + \hat{\beta}_3 (\sum X_{i3} / N) = \sum Y_i / N
\hat{\beta}_1 + \hat{\beta}_2 \bar{X}_2 + \hat{\beta}_3 \bar{X}_3 = \bar{Y}
- This shows that the regression line passes through the means of the data, just as in the bivariate regression model.
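A quick numerical check of this property, using made-up data (illustrative only, not from the notes):

```python
# Minimal sketch: the fitted regression passes through the sample means.
import numpy as np

rng = np.random.default_rng(2)
N = 50
x2, x3 = rng.normal(size=N), rng.normal(size=N)
y = 1.0 + 0.8 * x2 - 1.2 * x3 + rng.normal(size=N)    # made-up data

X = np.column_stack([np.ones(N), x2, x3])
b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

# beta1_hat + beta2_hat * mean(X2) + beta3_hat * mean(X3) equals mean(Y).
print(b1 + b2 * x2.mean() + b3 * x3.mean(), y.mean())
```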
What is the general formula for the multivariate OLS estimator?
In general, let X be a matrix whose columns contain the data for the explanatory variables and let y be a vector containing the data for the dependent variable. The multivariate OLS estimator can be calculated as:
\hat{\beta} = (X'X)^{-1} X'y
This is a k x 1 vector, where k is the number of explanatory variables (including the intercept).
In order to calculate the OLS estimator in this way we must assume:
- k < N, where N is the number of observations
- The X variables are linearly independent.
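A minimal numpy sketch of the formula, assuming made-up data with k = 3 (intercept included) and N = 100:

```python
# Minimal sketch: beta_hat = (X'X)^{-1} X'y computed directly with numpy.
import numpy as np

rng = np.random.default_rng(3)
N = 100
x2, x3 = rng.normal(size=N), rng.normal(size=N)
y = 2.0 - 1.0 * x2 + 0.5 * x3 + rng.normal(size=N)    # made-up data

X = np.column_stack([np.ones(N), x2, x3])              # N x k design matrix
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y            # k x 1 OLS estimator
print(beta_hat)
```

In practice the system is usually solved with np.linalg.solve(X.T @ X, X.T @ y) or np.linalg.lstsq rather than by forming the inverse explicitly, but the explicit inverse mirrors the formula on this card.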
What are the steps in solving for the OLS estimator of a multivariate regression model?
- An underlined variable denotes a vector or matrix.
- Form the k x k matrix X'X from the data on the explanatory variables and invert it to obtain (X'X)^{-1}.
- Form the k x 1 vector X'y from the explanatory variables and the dependent variable.
- Multiply the two: \hat{\beta} = (X'X)^{-1} X'y.
When is it possible to solve multivariate regression models with matrices?
Only if X'X is invertible.
This occurs when:
- The number of columns is less than the number of rows (k < N), where N is the number of observations and k is the number of parameters.
- The columns of X are linearly independent (there can't be perfect correlation between any of the X variables).
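A minimal sketch of what goes wrong when the second condition fails: with made-up data in which one column of X is an exact multiple of another, X'X is rank-deficient and cannot be inverted.

```python
# Minimal sketch: perfect collinearity makes X'X rank-deficient (singular).
import numpy as np

rng = np.random.default_rng(4)
N = 30
x2 = rng.normal(size=N)
x3 = 2.0 * x2                                  # X3 is an exact multiple of X2
X = np.column_stack([np.ones(N), x2, x3])

print(np.linalg.matrix_rank(X))                # 2, not 3: columns linearly dependent
print(np.linalg.cond(X.T @ X))                 # enormous condition number
```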
What does R-squared measure?
The fraction of the variance of the endogenous (dependent) variable that is explained by the model.
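As a sketch (made-up data, not from the notes), R-squared can be computed as 1 - RSS/TSS:

```python
# Minimal sketch: R^2 = 1 - RSS/TSS for a three-variable regression.
import numpy as np

rng = np.random.default_rng(5)
N = 200
x2, x3 = rng.normal(size=N), rng.normal(size=N)
y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=N)    # made-up data

X = np.column_stack([np.ones(N), x2, x3])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

rss = ((y - X @ beta_hat) ** 2).sum()     # residual sum of squares
tss = ((y - y.mean()) ** 2).sum()         # total sum of squares
print(1 - rss / tss)                      # R-squared
```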
How can the regression model be written in matrix form?
Note that we can also write the regression model itself in matrix form.
y = X\beta + u
where \beta is a k x 1 vector of unknown population parameters and u is an N x 1 vector of unobservable random errors.
We can substitute for y in the expression for the estimator to obtain:
\hat{\beta} = (X'X)^{-1} X'(X\beta + u)
= \beta + (X'X)^{-1} X'u
The OLS estimator is therefore a linear combination of the random errors, and so is itself a random variable whose distribution will depend on the properties of the errors.
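A small numerical check of this decomposition (the data and "true" parameters below are made up for illustration):

```python
# Minimal sketch: beta_hat equals the true beta plus (X'X)^{-1} X'u.
import numpy as np

rng = np.random.default_rng(6)
N = 100
beta = np.array([1.0, 2.0, -0.5])                       # assumed true parameters
X = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
u = rng.normal(size=N)                                  # random errors
y = X @ beta + u

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta_hat)                                         # OLS estimate
print(beta + np.linalg.inv(X.T @ X) @ X.T @ u)          # the same vector
```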
What are the Gauss-Markov assumptions in matrix form?
- E(u_i) = 0; i = 1, ..., N -> the expected value of the error term is zero for every i
- E(u_i u_j) = 0; i ≠ j -> the errors are uncorrelated across observations
- E(u_i^2) = \sigma_u^2; i = 1, ..., N -> the variance of the error term is a constant \sigma_u^2
- E(X'u) = X'E(u) = 0 -> X is non-random (fixed), so it can be taken outside the expectation operator
and finally:
- the errors follow a normal distribution
These assumptions are essentially the same as for the bivariate regression model
What do assumptions 2 and 3 tell you about the variance-covariance matrix of the errors?
Together they imply E(uu') = \sigma_u^2 I_N, where I_N is the N x N identity matrix: the errors have constant variance \sigma_u^2 on the diagonal and zero covariances off the diagonal.
What happens to a multivariate regression model when the X variables are perfectly correlated?
Suppose that, in the model
Y_i = \beta_1 + \beta_2 X_{i2} + \beta_3 X_{i3} + u_i
the regressors are related by X_{i3} = \rho X_{i2}.
This means we could write:
Y_i = \beta_1 + \beta_2 X_{i2} + \beta_3 \rho X_{i2} + u_i
Y_i = \beta_1 + (\beta_2 + \rho \beta_3) X_{i2} + u_i
The equation now contains only one genuine independent variable, so we cannot estimate separate effects for the two variables.
What happens to a multivariate regression model when we have only partial collinearity?
The previous example was one in which there was perfect collinearity between two variables.
In cases like this the least-squares method will not allow us to estimate separate coefficients for the two right-hand side variables.
More generally we might have less than perfect collinearity:
X_{i3} = \rho X_{i2} + \varepsilon_i
where V(\varepsilon_i) ≠ 0 but is small relative to V(X_{i2})
- A software package will treat these as two completely different variables.
If this is the case then it is possible to estimate separate effects using least squares, but the estimates may be very inaccurate.
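A minimal simulation sketch (made-up data; the value of rho and the noise scale are arbitrary choices) showing how imprecise the separate estimates become under near-perfect collinearity:

```python
# Minimal sketch: near-perfect collinearity inflates the sampling variability
# of the separate coefficient estimates.
import numpy as np

rng = np.random.default_rng(7)
N, reps = 100, 500
rho = 1.0
estimates = []
for _ in range(reps):
    x2 = rng.normal(size=N)
    x3 = rho * x2 + 0.01 * rng.normal(size=N)      # V(eps) small vs V(X2)
    y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=N)
    X = np.column_stack([np.ones(N), x2, x3])
    estimates.append(np.linalg.lstsq(X, y, rcond=None)[0])

# Sampling standard deviations of beta1_hat, beta2_hat, beta3_hat:
# the two slope estimates are far more variable than the intercept.
print(np.std(estimates, axis=0))
```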