17. Regression Flashcards
DIFFS BETWEEN correlation/regression
Regression is ASYMMETRIC - predicting one variable from another
Correlation is symmetric
Regression uses relationship to predict Y from X
X/Y in regression
X = explanatory
Y = response
alpha and beta
alpha = y intercept
beta = slope
estimates = a and b!
Y hat
value of Y predicted by the regression line for a given value of X
Residuals
Yi - Yi(hat) (observed Y minus predicted Y)
How do you decide which line is the BEST fit?
Least squares regression line!!!
Find a line that minimizes the sum of squares of residuals
SSresiduals
sum from i = 1 to n of [Yi - Yi(hat)]^2
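
A minimal sketch of the least squares idea in plain Python. The closed-form slope (sum of cross-products over sum of squares of X) is the standard textbook result rather than something stated on these cards, and the function names and data are made up for illustration.

```python
def least_squares(xs, ys):
    """Return (a, b), the estimated intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope estimate
    a = y_bar - b * x_bar    # intercept estimate
    return a, b


def ss_residuals(xs, ys, a, b):
    """Sum over i of [Yi - Yi(hat)]^2 for the line Y(hat) = a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up illustration data, not from the flashcards.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
best = ss_residuals(xs, ys, a, b)
# Shifting the line away from the fit should only increase SSresiduals.
worse = ss_residuals(xs, ys, a + 0.5, b)
print(a, b, best, worse)
```

The fitted line has the smallest SSresiduals; moving the intercept (or slope) away from the estimates makes the sum larger.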
What difference in variance is worrying?
more than a 10-fold difference in variance
Why shouldn’t you fit a polynomial w/ too many terms?
the fitted curve wouldn’t predict new data points well!!
rule of thumb: sample size should be at least 7 times the number of terms!
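
A rough sketch of why too many terms hurts, under made-up data. Six points that lie close to a straight line are fitted two ways: a 2-term line, and a 6-term polynomial forced through every point (evaluated here with Lagrange interpolation, which is just a convenient way to get that polynomial and is not from these cards). The "new" observation is hypothetical and sits just past the observed X range.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def lagrange_predict(xs, ys, x_new):
    """Evaluate the degree n-1 polynomial through all n points at x_new."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x_new - xj) / (xi - xj)
        total += term
    return total


# Made-up data lying roughly on Y = X.
xs = [0, 1, 2, 3, 4, 5]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
x_new, y_new = 6, 6.1  # hypothetical new observation

a, b = fit_line(xs, ys)
line_pred = a + b * x_new
poly_pred = lagrange_predict(xs, ys, x_new)

# The polynomial has zero residuals on the data it was fitted to...
train_errors = [abs(lagrange_predict(xs, ys, x) - y) for x, y in zip(xs, ys)]
# ...but misses the new point by far more than the simple line does.
print(line_pred, poly_pred, max(train_errors))
```

With 6 data points, the 7-times rule would not even allow a 2-term line comfortably, let alone 6 terms; the example is deliberately extreme.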
Confidence bands
show the uncertainty in predicting the mean value of Y for a given value of X
which is broader: prediction of where a line is or prediction of an individual?
individual - more uncertainty
Prediction interval
uncertainty in predicting an individual value of Y for a given value of X
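
A sketch comparing the two kinds of uncertainty, using the standard textbook standard-error formulas for simple linear regression (these formulas are not on the cards). The data are made up, and `t_crit` is the two-sided 95% t value for n - 2 = 3 degrees of freedom taken from a t table; a different sample size would need a different value.

```python
import math


def interval_standard_errors(xs, ys, x0):
    """Return (Y hat at x0, SE for the mean of Y, SE for an individual Y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(ss_res / (n - 2))           # residual standard deviation
    spread = 1 / n + (x0 - x_bar) ** 2 / sxx  # grows as x0 moves from the mean
    se_mean = s * math.sqrt(spread)           # used for the confidence band
    se_individual = s * math.sqrt(1 + spread) # used for the prediction interval
    return a + b * x0, se_mean, se_individual


# Made-up illustration data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
t_crit = 3.182  # assumed: 95% two-sided t value for 3 degrees of freedom

y_hat, se_mean, se_ind = interval_standard_errors(xs, ys, 3)
confidence_band = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
prediction_interval = (y_hat - t_crit * se_ind, y_hat + t_crit * se_ind)
print(confidence_band, prediction_interval)
```

The extra `1 +` under the square root is the scatter of individuals around the line, which is why the prediction interval is always the wider of the two.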
3 methods of fitting non linear relationships
- transformations on data (e.g. log of both sides)
- Quadratic regression
- Splines
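
A small sketch of the first method (log of both sides), assuming the relationship is a power curve Y = c * X^k. Taking logs gives ln Y = ln c + k ln X, which is a straight line in the transformed variables, so ordinary least squares applies. The data are made up and follow Y = 3 * X^2 exactly.

```python
import math


def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Made-up curved data: Y = 3 * X^2.
xs = [1, 2, 4, 8]
ys = [3, 12, 48, 192]

# Transform both sides, then fit a straight line to the logged values.
log_xs = [math.log(x) for x in xs]
log_ys = [math.log(y) for y in ys]
intercept, k = least_squares(log_xs, log_ys)
c = math.exp(intercept)  # back-transform the intercept to get c
print(c, k)
```

Real data would not sit exactly on the curve, so the recovered c and k would only be estimates.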
Assumptions of Linear Regression + how to see on residual plot
- random sample - can't see on a residual plot :(
- Y normally distributed for all values of X - residuals centered around the line, tapering off away from it
- Y has approx. equal variance for all values of X - dots extend to similar distances from the line at every X
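
One possible numeric companion to eyeballing the residual plot for equal variance: split the residuals into a low-X half and a high-X half and compare their variances against the 10-fold rule of thumb. Splitting into halves is an assumption made for this sketch, not a method from the cards, and the residuals below are made up.

```python
from statistics import variance


def variance_ratio(xs, residuals):
    """Ratio of the larger to the smaller residual variance across X halves."""
    pairs = sorted(zip(xs, residuals))       # order residuals by X
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]       # residuals at small X
    high = [r for _, r in pairs[half:]]      # residuals at large X
    v_low, v_high = variance(low), variance(high)
    return max(v_low, v_high) / min(v_low, v_high)


# Made-up residuals that fan out as X increases.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]

ratio = variance_ratio(xs, residuals)
worrying = ratio > 10  # the 10-fold rule of thumb
print(ratio, worrying)
```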
problem with overfitting data
very poor predictive power for any new data points!!!!!