Ordinary Least Squares Flashcards
OLS
Why use Ordinary Least Squares?
1) OLS is relatively easy to use
2) The goal of minimizing the sum of the squared errors is quite appropriate from a theoretical point of view
3) OLS estimates have a number of useful characteristics
OLS
is a regression estimation technique that calculates the estimated slope coefficients so as to minimize the sum of the squared residuals
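For one independent variable, the OLS estimates have a well-known closed form. A minimal sketch in Python (the data and function name here are illustrative):

```python
import numpy as np

def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) that minimize the sum of squared residuals."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    beta0 = y.mean() - beta1 * x.mean()
    return beta0, beta1

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]      # lies exactly on the line y = 2x
b0, b1 = ols_fit(x, y)    # so b0 is ~0 and b1 is ~2
```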
An estimator is…
a mathematical technique that is applied to a sample of data to produce real-world numerical estimates of the true population regression coefficients (or parameters). So OLS is an estimator, and an estimated slope coefficient produced by OLS is an estimate.
What are the three reasons for using OLS?
1) It is the simplest of all econometric estimation techniques
2) Minimizing the summed, squared residuals is a reasonable goal of an estimation technique.
3) Has the following two useful characteristics:
a) The sum of the residuals is exactly zero.
b) OLS can be shown to be the “best” estimator possible under a set of specific assumptions
K =
the number of independent variables
i =
goes from 1 to N and indicates the i-th observation of an independent variable
the biggest difference between a single-independent variable regression model and a multivariate regression model is…
the interpretation of the slope coefficients. A multivariate regression model’s slope coefficients, often called partial regression coefficients, are defined so that a researcher can distinguish the impact of one independent variable from that of the other independent variables.
You should always include beta naught in a regression equation,…
but you should not rely on estimates of beta naught for inference
Total Sum of Squares (TSS) …
is the sum of the squared variations of Y around its mean, a measure of the amount of variation to be explained by the regression.
For OLS, the total sum of squares has two components…
1) Variation that can be explained by the regression
2) Variation that cannot be explained by the regression
TSS = ESS + RSS
this is usually called the decomposition of variance
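The decomposition can be checked numerically. A sketch with made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

# Closed-form OLS fit for one regressor
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

TSS = np.sum((y - y.mean()) ** 2)      # total variation of Y around its mean
ESS = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the regression
RSS = np.sum((y - y_hat) ** 2)         # unexplained (residual) variation
# TSS equals ESS + RSS, up to floating-point rounding
```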
Decomposition of the variance in Y…
The variation of Y around its mean (Yi - Y-avg) can be decomposed into two parts:
1) (Y-hat - Y-avg) = the difference between the estimated value of Y (Y-hat) and the mean value of Y (Y-avg): the part explained by the regression
2) (Yi - Y-hat) = the difference between the actual value of Y and the estimated value of Y: the residual, the unexplained part
Explained Sum of Squares (ESS) = …
Measures the amount of the squared deviation of Yi from its mean that is explained by the regression line; this variation is attributable to the fitted regression line.
Residual Sum of Squares (RSS) = …
This is the unexplained portion of TSS (that is, unexplained in an empirical sense by the estimated regression line)
The smaller the RSS is relative to the TSS…
the better the estimated regression line fits the data.
OLS is the estimating technique that…
minimizes the RSS and therefore maximizes the ESS for a given TSS.
Once the estimates have been computed, what are some questions that an econometrician should ask?
1) Is the equation supported by sound theory?
2) How well does the estimated regression fit the data?
3) Is the data set reasonably large and accurate?
4) Is OLS the best estimator to be used for this equation?
5) How well do the estimated coefficients correspond to the expectations developed by the researcher before the data were collected?
6) Are all the obviously important variables included in the equation?
7) Has the most theoretically logical functional form been used?
8) Does the regression appear to be free of major econometric problems?
R^2…
is the ratio of the explained sum of squares over the total sum of squares
A major problem with R^2 is…
that adding another independent variable to a particular equation can never decrease R^2
Degrees of freedom can also be understood as…
The excess of the number of observations (N) over the number of coefficients estimated (K + 1, including the intercept)
R-bar-squared…
is R^2 adjusted for degrees of freedom
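A common formula for the adjustment, as a sketch (the function name is illustrative):

```python
def adjusted_r2(r2, n, k):
    """R-bar-squared from R^2, with n observations and k independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Example: R^2 = 0.5 with 10 observations and 1 regressor gives 0.4375
```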
When is R^2 of little help?
When we’re trying to assess whether adding a variable to an equation meaningfully improves our ability to explain the dependent variable.
R-bar-squared measures…
the percentage of the variation of Y around its mean that is explained by the regression equation, adjusted for degrees of freedom.
R-bar-squared can be used to…
compare the fits of equations with the same dependent variable and different numbers of independent variables. Because of this property, most researchers automatically use R-bar-squared instead of R-squared when evaluating the fit of their estimated regression equations.
Adding a variable can’t change TSS, but…
in most cases the added variable will reduce RSS, so R^2 will rise
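This can be illustrated with an irrelevant extra regressor. A sketch with random, purely illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=20)
x1 = rng.normal(size=20)
x2 = rng.normal(size=20)   # deliberately unrelated to y

def r_squared(regressors, y):
    """R^2 = 1 - RSS/TSS for an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(regressors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

r2_one = r_squared([x1], y)
r2_two = r_squared([x1, x2], y)   # never smaller than r2_one
```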
The variation of Y around its mean can be decomposed into two parts…
1) The difference between the estimated value of Y and the mean value of Y
2) The difference between the actual value of Y and the estimated value of Y
OLS selects those estimates of beta naught and beta one that…
minimize the squared residuals, summed over all the sample data points.
OLS is a regression estimation technique that calculates…
betas so as to minimize the sum of the squared residuals.