Session 2: Bivariate Regression: Review of Ordinary Least Squares, Multiple Regression Flashcards
OLS estimator
The OLS estimator minimizes the average squared difference between the actual values of Yi and the prediction (predicted value) based on the estimated line.
SE(B-hat1) measures the spread of the sampling distribution of B-hat1
The OLS estimator chooses the regression coefficients so that the estimated regression line is as close as possible to the observed data, where closeness is measured by the sum of the squared mistakes made in predicting Y given X.
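In symbols, this minimization problem and its closed-form solution can be written as follows (a standard restatement; the formulas themselves are not printed on the card):

```latex
% OLS chooses b_0, b_1 to minimize the sum of squared prediction mistakes
\bigl(\hat\beta_0, \hat\beta_1\bigr) = \arg\min_{b_0,\,b_1} \sum_{i=1}^{n} \bigl(Y_i - b_0 - b_1 X_i\bigr)^2
% Closed-form solutions
\hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2},
\qquad
\hat\beta_0 = \bar Y - \hat\beta_1 \bar X
```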
Ordinary Least Squares
the slope and intercept of the line relating X and Y can be estimated by a method called Ordinary Least Squares
Yi = B0 + B1*Xi + ui
Yi = (independent/dependent) variable
Xi = (independent/dependent) variable
B0 + B1*Xi =
B0 = (intercept/slope) of population regression line
B1 = (intercept/slope) of population regression line
ui =
Yi = dependent variable, regressand, left hand variable
Xi = independent variable, regressor, right hand variable
B0 + B1*Xi = population regression line, or population regression function (PRF); this is the relationship that holds between Y and X over the population
B0 = intercept of the population regression line
B1 = slope of the population regression line
ui = error term
error term
incorporates all of the factors responsible for the difference between the ith district's average test score and the value predicted by the population regression line. It contains all other factors besides X that determine the value of the dependent variable Y for a specific observation i.
OLS Regression Line
aka sample regression line, sample regression function
is the straight line constructed using the OLS estimates: Y-hat = B-hat0 + B-hat1*X.
The residual for the ith observation is the difference between Yi and its predicted value: Yi - Y-hati
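Restated in symbols (nothing beyond the two definitions above):

```latex
% OLS (sample) regression line: the predicted value of Y for observation i
\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i
% Residual: actual value minus predicted value
\hat u_i = Y_i - \hat Y_i
```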
Test score-hat = 698.9 - 2.28*STR (student-teacher ratio)
What does STR coefficient mean?
The slope of -2.28 means that an increase in the student-teacher ratio by one student per class is, on average, associated with a decline in districtwide test scores of 2.28 points on the test.
The negative slope indicates that more students per teacher (larger classes) are associated with poorer test performance.
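As an illustration only, here is a minimal Python sketch of how such a bivariate OLS fit could be computed by hand with NumPy; the `str_ratio` and `test_score` arrays are made-up placeholder values, not the district data behind the card:

```python
import numpy as np

# Hypothetical placeholder data (illustration only, not the actual district data)
str_ratio = np.array([15.0, 17.5, 19.0, 20.5, 22.0, 23.5, 25.0])
test_score = np.array([680.0, 670.0, 661.0, 655.0, 648.0, 644.0, 638.0])

# OLS slope and intercept from the closed-form formulas
x_bar, y_bar = str_ratio.mean(), test_score.mean()
beta1_hat = np.sum((str_ratio - x_bar) * (test_score - y_bar)) / np.sum((str_ratio - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

# Fitted values, residuals, R^2, and SER (with an n - 2 degrees-of-freedom divisor)
y_hat = beta0_hat + beta1_hat * str_ratio
residuals = test_score - y_hat
r_squared = 1 - np.sum(residuals**2) / np.sum((test_score - y_bar) ** 2)
ser = np.sqrt(np.sum(residuals**2) / (len(test_score) - 2))

print(f"TestScore-hat = {beta0_hat:.1f} {beta1_hat:+.2f}*STR")
print(f"R^2 = {r_squared:.3f}, SER = {ser:.1f}")
```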
R^2 and Standard Error measure:
R^2 and the Standard Error of the Regression measure how well the OLS regression line fits the data.
R^2 ranges between ___ and ___ and measures:
SE of the regression measures:
R^2 ranges between 0 and 1 and measures: the fraction of the variance of Yi that is explained by Xi.
SE of the regression measures: how far Yi typically is from its predicted value.
regression R^2
is the fraction of the sample variance of Yi explained (or predicted) by Xi.
R^2
= ESS / TSS
= Explained Sum of Squares / Total Sum of Squares
= sum of squared deviations of the predicted values Y-hat_i from their average / sum of squared deviations of Yi from its average
Alternatively, R^2 equals 1 minus the fraction of the variance of Yi NOT explained by Xi:
R^2 = 1 - (SSR/TSS)
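The two forms above are equivalent because the total sum of squares decomposes into explained and residual parts (a standard identity, restated here for reference):

```latex
% Decomposition of the total sum of squares and the two forms of R^2
\mathrm{TSS} = \mathrm{ESS} + \mathrm{SSR}
\qquad\Longrightarrow\qquad
R^2 = \frac{\mathrm{ESS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{SSR}}{\mathrm{TSS}}
```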
ESS
TSS
SSR
SER
ESS Explained Sum of Squares
TSS Total Sum of Squares
SSR Sum of Squared Residuals
SER Standard Error of the Regression
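In formulas (standard definitions consistent with the abbreviations above; the n - 2 divisor in the SER is the usual degrees-of-freedom correction for a bivariate regression):

```latex
% Explained, total, and residual sums of squares
\mathrm{ESS} = \sum_{i=1}^{n} \bigl(\hat Y_i - \bar Y\bigr)^2, \qquad
\mathrm{TSS} = \sum_{i=1}^{n} \bigl(Y_i - \bar Y\bigr)^2, \qquad
\mathrm{SSR} = \sum_{i=1}^{n} \hat u_i^{\,2}
% Standard error of the regression: typical size of a residual, in units of Y
\mathrm{SER} = \sqrt{\frac{\mathrm{SSR}}{n - 2}}
```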
Standard Error of the Regression
An estimator of the standard deviation of the regression error ui; it measures the typical size of an OLS residual, in the units of the dependent variable Y (SER = sqrt(SSR/(n-2)) in the bivariate case).
R^2 of 0.051 means that….
the regressor student-teacher ratio explains 5.1% of the variance of the dependent variable testscore
SER of 18.6 means that
Standard Error of the Regression of 18.6 means that there is a large spread of the scatterplot around the regression line, as measured in points on the test. Predictions of test scores using only the STR variable will therefore often be wrong by a large amount.
t =
t = (estimator - hypothesized value) / SE of the estimator
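Written out, with the regression slope as an example (the specialization to B-hat1 is illustrative; the card states only the general form):

```latex
% General form of the t-statistic
t = \frac{\text{estimator} - \text{hypothesized value}}{SE(\text{estimator})}
% Specialized to the OLS slope under H_0: \beta_1 = \beta_{1,0}
t = \frac{\hat\beta_1 - \beta_{1,0}}{SE(\hat\beta_1)}
```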
Steps for a test of H0 against the 2-sided alternative
- compute the SE of Y-bar
- compute the t-statistic: t = (Y-bar - mu_Y,0) / SE(Y-bar)
- compute the p-value, which is the smallest significance level at which H0 could be rejected, based on the t-statistic actually observed. It is ALSO the probability of obtaining a statistic, by random sampling variation, at least as different from the H0 value as the statistic actually observed, assuming H0 is correct.
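A minimal Python sketch of these three steps for a hypothetical sample (the `y` array and hypothesized mean `mu_0` are made up for illustration); it uses the large-sample normal approximation for the p-value:

```python
import numpy as np
from scipy import stats

# Hypothetical sample and null-hypothesis value (illustration only)
y = np.array([652.0, 660.5, 648.2, 671.3, 655.9, 643.8, 666.1, 659.4])
mu_0 = 650.0

# Step 1: standard error of the sample mean, SE(Y-bar) = s / sqrt(n)
n = len(y)
y_bar = y.mean()
se_y_bar = y.std(ddof=1) / np.sqrt(n)

# Step 2: t-statistic, t = (Y-bar - mu_0) / SE(Y-bar)
t_stat = (y_bar - mu_0) / se_y_bar

# Step 3: two-sided p-value from the standard normal approximation
p_value = 2 * stats.norm.sf(abs(t_stat))

print(f"Y-bar = {y_bar:.2f}, SE = {se_y_bar:.2f}, t = {t_stat:.2f}, p = {p_value:.3f}")
```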