Classical linear regression model Flashcards
What are the 5 assumptions of the classical linear regression model (CLRM)?
Since yt depends on ut, we must be specific about how ut are generated. We make the following assumptions about the unobservable error terms, ut:
1) The errors have zero mean: E(ut) = 0.
2) The variance of the errors is constant and finite over all values of xt: var(ut) = σ² < ∞.
3) The errors are statistically independent of one another: cov(ui, uj) = 0 for i ≠ j.
4) There is no relationship between the error and the corresponding x variate: cov(ut, xt) = 0. This holds automatically if the xt are non-stochastic (not random / not influenced by any random factors).
5) The ut are normally distributed: ut ~ N(0, σ²).
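A minimal sketch, not taken from the flashcards (all variable names and parameter values are illustrative), of simulating data that satisfies these assumptions and estimating α and β by OLS:

```python
# Illustrative sketch: simulate data satisfying the CLRM assumptions and
# estimate alpha and beta by OLS. All parameter values are assumed/illustrative.
import numpy as np

rng = np.random.default_rng(0)
T = 200
alpha_true, beta_true, sigma = 1.0, 2.0, 0.5

x = np.linspace(0, 10, T)              # non-stochastic regressor (assumption 4)
u = rng.normal(0.0, sigma, size=T)     # zero-mean, constant-variance, independent,
                                       # normally distributed errors (assumptions 1-3, 5)
y = alpha_true + beta_true * x + u

# OLS estimates from the usual formulas: beta_hat = cov(x, y) / var(x)
beta_hat = np.cov(x, y, bias=True)[0, 1] / np.var(x)
alpha_hat = y.mean() - beta_hat * x.mean()
print(alpha_hat, beta_hat)             # close to the true values (1.0, 2.0)
```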
If the assumptions of the CLRM hold, then what are the estimators for β and α determined by OLS known as?
They are the Best Linear Unbiased Estimators (BLUE) of α and β.
Which theorem proves that the OLS estimators are the best linear unbiased estimators?
The Gauss-Markov theorem (provided that the assumptions are fulfilled).
Why do the first four assumptions need to be made about the error terms of the CLRM?
In order to prove that the OLS estimators are the best linear unbiased estimators (BLUE) of α and β.
What is meant by ‘best’ in ‘best linear unbiased estimator’?
What is meant by an ‘efficient’ estimator?
“Best” - the OLS estimator β̂ has the minimum variance among the class of linear unbiased estimators of β.
“Efficient” - an estimator β̂ of a parameter β is efficient if it is unbiased and no other unbiased estimator has a smaller variance; in this sense it is the “best”.
What is meant by ‘unbiased’ in ‘best linear unbiased estimator’?
E(α̂) = α and E(β̂) = β. That is, on average the values of the α̂ and β̂ estimators will equal the true values α and β. Proving this requires the assumptions that E(ut) = 0 and cov(ut, xt) = 0.
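A small Monte Carlo sketch (assumed setup, not from the flashcards) of what unbiasedness means: drawing many samples under the CLRM assumptions and averaging the resulting β̂ values gives something close to the true β:

```python
# Illustrative sketch of unbiasedness: the average of beta_hat over many
# repeated samples is approximately the true beta. Values are assumed.
import numpy as np

rng = np.random.default_rng(1)
T, reps = 50, 5000
alpha_true, beta_true, sigma = 1.0, 2.0, 1.0
x = np.linspace(0, 5, T)               # held fixed across samples (non-stochastic x)

beta_hats = np.empty(reps)
for r in range(reps):
    u = rng.normal(0.0, sigma, size=T)                      # fresh errors each sample
    y = alpha_true + beta_true * x + u
    beta_hats[r] = np.cov(x, y, bias=True)[0, 1] / np.var(x)

print(beta_hats.mean())                # approximately beta_true = 2.0
```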
What happens if the assumptions of the CLRM are violated?
The OLS estimators need no longer be unbiased or efficient. That is, they may be systematically wrong on average (biased), or may fluctuate more than necessary from one sample to another.
Why does the fifth assumption need to be made about the CLRM?
The assumption that the disturbance terms are normally distributed needs to be made in order to make statistical inferences about the population parameters from the sample data, i.e., to test hypotheses about the coefficients. Making this assumption implies that test statistics will follow a t-distribution (provided that the other assumptions also hold).
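A hedged sketch (assumed data, not from the flashcards) of the inference this enables: under normality of the errors, the ratio of β̂ to its standard error follows a t-distribution with T - 2 degrees of freedom, so it can be used to test H0: β = 0:

```python
# Illustrative sketch: t-test on the slope coefficient. With normal errors the
# statistic (beta_hat - 0) / SE(beta_hat) follows a t-distribution with T - 2
# degrees of freedom under H0: beta = 0. Data and values are assumed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
T = 60
x = np.linspace(0, 10, T)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=T)

beta_hat = np.cov(x, y, bias=True)[0, 1] / np.var(x)
alpha_hat = y.mean() - beta_hat * x.mean()
resid = y - alpha_hat - beta_hat * x

s2 = resid @ resid / (T - 2)                         # estimated error variance
se_beta = np.sqrt(s2 / ((x - x.mean()) ** 2).sum())  # standard error of beta_hat

t_stat = beta_hat / se_beta                          # test H0: beta = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=T - 2)
print(t_stat, p_value)
```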
What is near multicollinearity and what are its indicators?
This is where the individual regressors are very closely related, so that it becomes difficult to disentangle the effect of each individual variable upon the dependent variable.
Indicators: the regression parameters are all individually insignificant, i.e. not significantly different from zero, with low t-ratios caused by high standard errors, yet R² and adjusted R² are both very high, so that the regression taken as a whole seems to indicate a good fit.
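An illustrative sketch (simulated data, not from the flashcards) of these symptoms: two nearly identical regressors give a high R² for the regression as a whole, yet low t-ratios on each coefficient individually:

```python
# Illustrative sketch of near multicollinearity: x3 is almost a copy of x2, so
# their individual effects on y cannot be disentangled. All values are assumed.
import numpy as np

rng = np.random.default_rng(3)
T = 100
x2 = rng.normal(size=T)
x3 = x2 + rng.normal(scale=0.01, size=T)        # x3 is almost identical to x2
y = 1.0 + 2.0 * x2 + 2.0 * x3 + rng.normal(scale=0.5, size=T)

X = np.column_stack([np.ones(T), x2, x3])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
s2 = resid @ resid / (T - X.shape[1])
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

print(np.corrcoef(x2, x3)[0, 1])   # correlation very close to 1
print(r2)                          # overall fit is very high
print(beta_hat[1:] / se[1:])       # t-ratios on x2 and x3 are typically insignificant
```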
What are the methods for dealing with near multicollinearity?
Since the problem is really one of insufficient information in the sample to determine each of the coefficients separately, one response is to obtain more data. In other words, we could:
- switch to a higher frequency of data for the analysis (e.g., weekly instead of monthly, monthly instead of quarterly, etc.)
- use a longer sample period (i.e., one going further back in time).
Other methods:
- Ignore it, if the model is otherwise adequate.
- Drop one of the collinear variables, so that the problem disappears (see the sketch below).
- Transform the highly correlated variables into a ratio (e.g., x2t / x3t) and include only the ratio in the regression.
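A hedged sketch continuing the simulated example above: dropping one of the two nearly collinear regressors restores a precise, significant estimate on the variable kept (which now picks up the combined effect of both):

```python
# Illustrative sketch of the "drop one collinear variable" remedy, reusing the
# simulated data from the multicollinearity example above. Values are assumed.
import numpy as np

rng = np.random.default_rng(3)
T = 100
x2 = rng.normal(size=T)
x3 = x2 + rng.normal(scale=0.01, size=T)        # near-duplicate of x2
y = 1.0 + 2.0 * x2 + 2.0 * x3 + rng.normal(scale=0.5, size=T)

# Regress y on a constant and x2 only (x3 dropped)
X = np.column_stack([np.ones(T), x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
s2 = resid @ resid / (T - X.shape[1])
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

print(beta_hat[1] / se[1])   # large t-ratio: x2 now captures the combined effect
```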