w2 Flashcards
Simple Linear Regression
A single independent variable predicting a single dependent variable
linear equation
y = B0 + B1X + e
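A minimal sketch of fitting the equation above by least squares, in pure Python; the data points are made up for illustration:

```python
# Fit y = B0 + B1*x by ordinary least squares (made-up data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.3, 6.2, 7.9, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Slope B1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
     / sum((x - x_bar) ** 2 for x in xs)
# Intercept B0 = y_bar - B1 * x_bar
b0 = y_bar - b1 * x_bar

print(b0, b1)  # intercept near 0, slope near 2 for this data
```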
Error Term
the variability in y that can’t be explained by the relationship between x and y; part of the linear equation (e)
Y hat
predicted outcome using the linear regression equation
Extrapolation
using the model to predict ŷ from an independent variable value outside the range of the data used to create the model; unreliable
SSE
Sum of Squared Errors; the squared differences between actual values and predicted values; unexplained variation; bad.
SSR
Sum of Squares due to Regression; the squared differences between predicted values and the mean; explained variation; good.
SST
Total Sum of Squares; the squared differences between actual values and the mean; SST = SSR + SSE.
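A sketch that computes SSE, SSR, and SST on made-up data, verifies the identity SST = SSR + SSE, and reports R squared as SSR/SST:

```python
# Verify SST = SSR + SSE and compute R^2 = SSR/SST (made-up data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.3, 6.2, 7.9, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
     / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]  # predicted values

sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained
sst = sum((y - y_bar) ** 2 for y in ys)               # total

print(abs(sst - (ssr + sse)) < 1e-9)  # the identity holds
print(ssr / sst)                      # R squared, close to 1 here
```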
Coefficient of Determination
R squared; SSR/SST; the percentage of variability in the dependent variable explained by the independent variable(s).
Multiple Regression Equation
y = B0 + B1X1 + B2X2 + … + BkXk + e
Multiple Regression
Multiple independent variables and one dependent variable.
Interpret coefficient of 0.31 for mileage onto car price
For every additional mile of mileage, car price increases by $0.31, ceteris paribus (holding all other variables constant).
Adjusted R squared
Penalizes adding independent variables; R squared never decreases as variables are added, so it is naturally inflated, and adjusted R squared counteracts that effect.
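A sketch of the adjusted R squared formula, 1 − (1 − R²)(n − 1)/(n − k − 1); the R², n, and k values below are made up:

```python
# Adjusted R squared penalizes each added independent variable.
def adjusted_r_squared(r2, n, k):
    """n = number of observations, k = number of independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding variables can only raise R^2, but adjusted R^2 falls if the
# new variables do not pull their weight (made-up values).
print(adjusted_r_squared(0.80, 30, 2))
print(adjusted_r_squared(0.81, 30, 5))  # higher R^2, lower adjusted R^2
```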
3 checks to validate a regression model
- Check residual plots; residuals (y-axis) against each independent variable and against the predicted values (x-axis); the pattern should look random.
- Check that the F-test is significant; rejects the null hypothesis for the model as a whole.
- Check that each t-test is significant; rejects the null hypothesis for each independent variable.
Multicollinearity
the problem of strong correlation between two independent variables; correlation should not exceed 0.70 in absolute value; the fix is to remove one of the correlated variables.
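A sketch of the 0.70 rule of thumb: compute the Pearson correlation between two independent variables and flag a value near 1; the data is made up, with x2 nearly a multiple of x1:

```python
# Pearson correlation between two independent variables (made-up data).
def pearson_r(a, b):
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.1, 5.9, 8.2, 9.8]  # nearly a multiple of x1

r = pearson_r(x1, x2)
print(r)  # well above 0.70: drop one of the two variables
```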