Regression Flashcards
Regression
Using the correlation (r) between two variables to predict the value of one variable from the value of the other
Regression uses
Prediction (college admissions, car insurance rates, dating sites, health insurance), estimation, hypothesis testing, and modeling causal relationships
Predictor Variable (X)
The known information used to make the prediction
Criterion Variable (Y)
The to-be-predicted variable; it can be estimated with a certain degree of accuracy given known values of X
X & Y Synonyms
X: Predictor Variable, Independent Variable, Explanatory Variable
Y: Criterion Variable, Dependent Variable, Response Variable, Outcome Variable
Best-fit line
The line that best fits the data on a scatterplot: it minimizes the sum of the squared vertical distances between the data points and the line (it is not an equal distance from every point)
Regression Questions:
- Is a pattern evident in a set of data points?
- Does the equation of a straight line describe this pattern?
- Are the predictions made from this equation significant?
Sum of Squares (SS) for regression
The sum of the squared distances of data points from a straight line. It gives more information than raw deviation scores, which cancel out when summed and so do not capture how much the data points vary around the line
Sum of deviations
The sum of the deviation scores from the best-fit line always equals zero, so it does not measure spread
Sum of Squares
Squaring each deviation before summing captures the spread of the data points around the regression line
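As a small illustration of the two cards above, the sketch below uses made-up data and assumes the line Yhat = 0.6X + 2.2 (the least-squares line for these five points). The raw deviations cancel out, while the squared deviations do not.

```python
# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Assumed best-fit line for this data: Yhat = 0.6X + 2.2
b, a = 0.6, 2.2

# Deviation of each observed Y from the value predicted by the line
deviations = [y - (b * x + a) for x, y in zip(xs, ys)]

sum_of_deviations = sum(deviations)               # positives and negatives cancel
sum_of_squares = sum(d ** 2 for d in deviations)  # squaring removes the signs

print(abs(sum_of_deviations) < 1e-9)  # True: the sum is zero up to rounding error
print(round(sum_of_squares, 2))       # a positive number that reflects spread
```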
Equation of a line (slope-intercept form)
Y=bX+a (the same form as Y=mx+b from algebra, where m is the slope and b is the intercept)
Slope
b (measures the change in Y relative to the change in X)
Intercept
a (the value of Y when X equals 0)
Method of least squares
Method for computing the slope and y-intercept of the best-fitting straight line for a set of data
Formula for Least Squares
- Calculate SSxy, SSx, SSy
- b=SSxy/SSx
- a=My-(b)(Mx)
- Yhat=bX+a
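The steps above can be sketched in a few lines of Python. This is a minimal illustration on made-up data; the function name `least_squares` is hypothetical, not taken from any library.

```python
def least_squares(xs, ys):
    """Return (b, a) for the line Yhat = bX + a.

    Hypothetical helper for illustration. SSy is not needed for b and a;
    it comes into play later, for r and the F test.
    """
    n = len(xs)
    mx = sum(xs) / n  # Mx, the mean of X
    my = sum(ys) / n  # My, the mean of Y
    ss_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    b = ss_xy / ss_x   # slope: b = SSxy / SSx
    a = my - b * mx    # intercept: a = My - (b)(Mx)
    return b, a


# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b, a = least_squares(xs, ys)
y_hat = b * 6 + a  # predicted Y for a new value X = 6
print(b, a, y_hat)
```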
Regression Line
The straight line that minimizes the sum of the squared vertical distances between the data points and the line
Regression analysis
A methodology for testing hypotheses about whether predictor variables (X) can predict outcome variables (Y)
Regression variation
A value representing the variation in Y associated with changes in X; the closer the data points fall to the regression line, the larger the regression variation
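Residual variation
The variation in Y unrelated to changes in X; the remaining variation. The farther the data points fall from the regression line, the larger the residual variation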
F obtained for regression analysis
The ratio of the variance in Y related to changes in X to the variance in Y not related to changes in X
Fobt Formula for Regression analysis
MSregression/MSresidual
Sum of Squares regression
The coefficient of determination (r^2) times the sum of squares for Y: SSregression=(r^2)SSy
r formula (regression)
r = SPxy / sqrt(SSx * SSy), where SP is the sum of products, the sum of (X-Mx)(Y-My) over all pairs (the same quantity as SSxy)
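A minimal sketch of the r formula, using made-up data. The function name `pearson_r` is hypothetical and chosen only for this example.

```python
import math


def pearson_r(xs, ys):
    """Return r = SPxy / sqrt(SSx * SSy). Hypothetical helper for illustration."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of products
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)


# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_squared = r ** 2  # coefficient of determination
print(round(r, 4), round(r_squared, 4))
```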
SS residual formula
SSresid=(1-r^2)SSy
SS residual
The sum of squares for Y times the proportion of variance in Y not predicted by X, that is, 1-r^2
Alternate formula for Sum of Squares Y
SSy=SSregression + SSresidual
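The partition of SSy can be checked numerically. The sketch below, on made-up data, computes SSregression and SSresidual from r^2 and then confirms that the residual term matches the squared distances of the data points from the fitted line. All variable names are chosen for this example.

```python
# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
ss_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
ss_x = sum((x - mx) ** 2 for x in xs)
ss_y = sum((y - my) ** 2 for y in ys)

r_squared = ss_xy ** 2 / (ss_x * ss_y)  # r^2, since r = SSxy / sqrt(SSx * SSy)
ss_regression = r_squared * ss_y        # SSregression = (r^2)SSy
ss_residual = (1 - r_squared) * ss_y    # SSresidual = (1 - r^2)SSy

# Cross-check: squared distances of the data points from the fitted line
b = ss_xy / ss_x
a = my - b * mx
ss_residual_direct = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

print(round(ss_regression, 4), round(ss_residual, 4), round(ss_y, 4))
```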
Mean Square Regression (MSregression)
SSregression/dfRegression
Mean Square Residual (MSresid)
SSresidual/dfResidual
DF numerator (predictor) for regression
Equal to the number of predictor variables
DF denominator (residual) for regression
Sample size minus 2 (n-2) for regression with one predictor
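Putting the cards on Fobt, mean squares, and degrees of freedom together, here is a minimal sketch for the one-predictor case on made-up data. The function name `regression_f` is hypothetical, and the residual degrees of freedom are assumed to be n-2.

```python
def regression_f(xs, ys):
    """Return (Fobt, df_regression, df_residual) for one predictor.

    Hypothetical helper for illustration; assumes simple linear regression.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    ss_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)

    r_squared = ss_xy ** 2 / (ss_x * ss_y)
    ss_regression = r_squared * ss_y
    ss_residual = (1 - r_squared) * ss_y

    df_regression = 1      # number of predictor variables
    df_residual = n - 2    # sample size minus 2

    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression / ms_residual, df_regression, df_residual


# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

f_obt, df_reg, df_res = regression_f(xs, ys)
print(round(f_obt, 4), df_reg, df_res)
```

The resulting Fobt would then be compared against the critical value from an F table for the same degrees of freedom.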
Hypotheses structure for regressions
H0: the variance in Y is not related to changes in X
H1: the variance in Y is related to changes in X
Multiple Regression
A method for predicting Y when two or more predictors (Xs) are present
Multiple Regression formula
Two predictors: Yhat=b1X1+b2X2+a
Three predictors: Yhat=b1X1+b2X2+b3X3+a
And so on, adding one bX term for each additional predictor
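For the two-predictor case, the least-squares coefficients can be written with the same sums of squares and sums of products used for simple regression. The sketch below is one way to do that, not the only one. The function name `fit_two_predictors` is hypothetical, and the data are made up so that Y = 2(X1) + 3(X2) + 1 exactly.

```python
def fit_two_predictors(x1s, x2s, ys):
    """Return (b1, b2, a) for Yhat = b1X1 + b2X2 + a.

    Hypothetical helper for illustration. Assumes X1 and X2 are not
    perfectly correlated (otherwise the denominator below is zero).
    """
    n = len(ys)
    m1 = sum(x1s) / n
    m2 = sum(x2s) / n
    my = sum(ys) / n

    ss_1 = sum((x - m1) ** 2 for x in x1s)
    ss_2 = sum((x - m2) ** 2 for x in x2s)
    sp_12 = sum((u - m1) * (v - m2) for u, v in zip(x1s, x2s))
    sp_1y = sum((u - m1) * (y - my) for u, y in zip(x1s, ys))
    sp_2y = sum((v - m2) * (y - my) for v, y in zip(x2s, ys))

    denom = ss_1 * ss_2 - sp_12 ** 2
    b1 = (ss_2 * sp_1y - sp_12 * sp_2y) / denom
    b2 = (ss_1 * sp_2y - sp_12 * sp_1y) / denom
    a = my - b1 * m1 - b2 * m2
    return b1, b2, a


# Made-up data built from Y = 2(X1) + 3(X2) + 1, for illustration only
x1s = [1, 2, 3, 4]
x2s = [1, 0, 2, 1]
ys = [6, 5, 13, 12]

b1, b2, a = fit_two_predictors(x1s, x2s, ys)
y_hat = b1 * 5 + b2 * 2 + a  # predicted Y for X1 = 5, X2 = 2
print(b1, b2, a, y_hat)
```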