Regression Flashcards
The least-squares regression line is
the unique line that minimizes the sum of the squared vertical distances between the data points and the line. (A consequence of this fit: the sum of the signed vertical distances, the residuals, is zero.)
Y hat is the
(in the least-squares regression line)
predicted y value on the regression line
The slope of the regression line describes
- how much we expect y to change, on average, for every unit change in x.
How to calculate the slope of the regression line
b = r · (s_y / s_x), where r is the correlation coefficient and s_x, s_y are the standard deviations of x and y.
How to calculate the intercept of the regression line
a = ȳ − b · x̄, where b is the slope and x̄, ȳ are the means of x and y.
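A quick numeric sketch of the slope and intercept formulas, using invented sample data, cross-checked against numpy's own least-squares fit:

```python
import numpy as np

# Hypothetical sample data, invented for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(x, y)[0, 1]             # correlation coefficient
b = r * y.std(ddof=1) / x.std(ddof=1)   # slope: b = r * (s_y / s_x)
a = y.mean() - b * x.mean()             # intercept: a = y-bar - b * x-bar

# Cross-check against numpy's least-squares fit (degree-1 polynomial)
b_np, a_np = np.polyfit(x, y, 1)
assert np.isclose(b, b_np) and np.isclose(a, a_np)
```

The two routes agree because the least-squares slope and intercept are exactly these functions of r, the standard deviations, and the means.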
intercept of the regression line is a
necessary mathematical descriptor of the regression line. It does not describe a specific property of the data.
The regression line always passes through
the point (x̄, ȳ): the mean of x and the mean of y.
Least-squares regression is only for
linear associations
Don’t compute the regression line until you have confirmed that there is a linear relationship between x and y. - ALWAYS PLOT THE RAW DATA
SSE vs SSR vs SST
SSE = Σ(y − ŷ)², the error (residual) sum of squares; SSR = Σ(ŷ − ȳ)², the regression sum of squares; SST = Σ(y − ȳ)², the total sum of squares. They satisfy SST = SSR + SSE, and R² = SSR/SST = 1 − (SSE/SST).
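A small sketch (with made-up data) verifying the sum-of-squares decomposition and that both routes to R² agree:

```python
import numpy as np

# Hypothetical sample data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

b, a = np.polyfit(x, y, 1)   # least-squares slope and intercept
y_hat = a + b * x            # predicted values on the line

sse = np.sum((y - y_hat) ** 2)         # error (residual) sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
sst = np.sum((y - y.mean()) ** 2)      # total sum of squares

assert np.isclose(sst, sse + ssr)      # SST = SSR + SSE
r2 = 1 - sse / sst
assert np.isclose(r2, np.corrcoef(x, y)[0, 1] ** 2)  # R² = r²
```

The decomposition SST = SSR + SSE holds for any least-squares line with an intercept, which is why 1 − SSE/SST and SSR/SST give the same R².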
r^2 is the ____
the coefficient of determination: the square of the correlation coefficient r. It represents the fraction of the variance in y that is explained by the regression model.
If all variability in y can be explained by the line, R² =
1
Outlier vs influential individual
Outlier: An observation that lies outside the overall pattern. (it is unusually far from the regression line, vertically). - large residual
Influential individual: an observation that markedly changes the regression if removed. This is often an isolated point, extreme in the x direction.
residuals
The vertical distances from each point to the least-squares regression line are called residuals.
The sum of all the residuals is by definition 0.
Outliers have unusually large residuals (in absolute value).
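Two of the facts above, that residuals sum to zero and that the line passes through (x̄, ȳ), can be checked directly on invented sample data:

```python
import numpy as np

# Hypothetical sample data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.8, 4.2, 5.7, 8.3, 9.6])

b, a = np.polyfit(x, y, 1)       # least-squares slope and intercept
residuals = y - (a + b * x)      # observed minus predicted

# Residuals sum to zero (up to floating-point rounding)
assert np.isclose(residuals.sum(), 0.0)

# The line passes through the point of means (x-bar, y-bar)
assert np.isclose(a + b * x.mean(), y.mean())
```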
pos vs neg residual
pos - underestimation (the observed y lies above the line; the model predicted too low)
neg - overestimation (the observed y lies below the line; the model predicted too high)
Use the equation of the least-squares regression to predict
y for any value of x within the range studied. (Avoid extrapolating beyond that range.)
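A minimal prediction sketch, with invented data and a hypothetical helper `predict`, illustrating a prediction at an x inside the observed range:

```python
import numpy as np

# Hypothetical sample data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 3.8, 6.1, 7.9, 10.1])

b, a = np.polyfit(x, y, 1)   # least-squares slope and intercept

def predict(x_new):
    """y-hat for a new x; trust it only within the observed x range."""
    return a + b * x_new

# 3.5 lies within [min(x), max(x)], so the prediction is an
# interpolation, not an extrapolation
assert x.min() <= 3.5 <= x.max()
y_hat = predict(3.5)
```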