Regression Flashcards
Variable types
Independent/explanatory- this is the one we control
Dependent/response- this is the one we measure
Residuals
The vertical distance from a data point to the line of best fit
Residual= observed- expected (positive if the point is above the line, negative if below)
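A minimal sketch of the residual formula in Python. The line y = 1 + 2x and the data point are made-up values for illustration only.

```python
# Hypothetical line and data point, chosen only for illustration.
a, b = 1.0, 2.0          # assumed intercept and gradient
x_obs, y_obs = 3.0, 7.5  # one observed data point

expected = a + b * x_obs     # value the line gives at x_obs
residual = y_obs - expected  # observed minus expected

print(residual)  # positive here, so the point lies above the line
```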
Least squares regression line
y=a+bx
A line of best fit such that-
Sum of residuals= 0
Sum of squares of residuals is as small as possible
Menu 6-2 OPTN 3
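The two properties above can be checked by hand with a short Python sketch. It uses the standard formulas b = Sxy/Sxx and a = y_bar - b*x_bar. The data and all variable names are made up for illustration and are not from these notes.

```python
# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Standard least squares formulas: b = Sxy / Sxx, a = y_bar - b * x_bar.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b = s_xy / s_xx
a = y_bar - b * x_bar

def sum_sq_residuals(a_try, b_try):
    """Sum of squared residuals for the line y = a_try + b_try * x."""
    return sum((y - (a_try + b_try * x)) ** 2 for x, y in zip(xs, ys))

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(sum(residuals))                # zero, up to floating-point rounding
print(sum_sq_residuals(a, b))        # sum of squares for the fitted line
print(sum_sq_residuals(a + 0.1, b))  # shifting the line makes it larger
```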
Interpretation of a and b
a- expected value of y when x is 0
b- expected change in y for every 1 unit increase in x
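A quick Python check of both interpretations, assuming a hypothetical line y = 5 + 3x. The function name predict is illustrative only.

```python
a, b = 5, 3  # hypothetical intercept and gradient

def predict(x):
    """Expected value of y on the line y = a + b * x."""
    return a + b * x

at_zero = predict(0)                # equals a
per_unit = predict(4) - predict(3)  # equals b, whichever x you start from

print(at_zero, per_unit)
```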
Outliers and Anomalies
An outlier is any data point with a residual of more than 3 s.d. of y.
Do not remove outliers unless they are anomalies which are proven to not belong
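A sketch of the 3 s.d. rule in Python. The residuals and the s.d. of y are made-up numbers, and flag_outliers is a hypothetical helper. It compares the size of each residual (ignoring sign) with the threshold, which is an assumption about how the rule is applied.

```python
def flag_outliers(residuals, sd_y, k=3):
    """Return the positions of residuals larger in size than k * sd_y."""
    return [i for i, r in enumerate(residuals) if abs(r) > k * sd_y]

# Made-up residuals and standard deviation of y.
residuals = [0.4, -0.7, 0.2, 6.5, -0.3]
sd_y = 2.0

flagged = flag_outliers(residuals, sd_y)
print(flagged)  # positions to investigate, not to delete automatically
```

Flagged points are only candidates: they stay in the data unless they are shown to be anomalies.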
How to predict y
Either substitute x into the equation
Or, use ŷ (OPTN down 4 5)
Reliability
Interpolation- predict within the data (reliable)
Extrapolation- predict outside the data (not reliable)
Small residuals- reliable
Large residuals- not reliable
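A sketch in Python of a prediction that also reports whether it is interpolation or extrapolation. The x values, the line y = 5 + 3x and the helper predict_with_check are hypothetical, for illustration only.

```python
def predict_with_check(x_new, xs, a, b):
    """Return (prediction, kind), where kind says if x_new is inside the data range."""
    y_hat = a + b * x_new
    kind = "interpolation" if min(xs) <= x_new <= max(xs) else "extrapolation"
    return y_hat, kind

xs = [1, 2, 3, 4, 5]  # hypothetical observed x values
a, b = 5, 3           # hypothetical fitted line y = 5 + 3x

inside = predict_with_check(4, xs, a, b)
outside = predict_with_check(10, xs, a, b)
print(inside, outside)
```

The extrapolated value is still computed, but it should be treated as not reliable.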