Linear Regression Flashcards
What is correlation used for?
measuring the strength and direction of a linear relationship
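As a rough illustration of the card above, the Pearson correlation coefficient r can be computed by hand. This is a minimal sketch; the function name pearson_r and the five data points are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sign gives direction, magnitude gives strength."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical example data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))  # 0.7746 (positive, fairly strong)
```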
What is regression used for?
describing the linear relationship with an equation that can be used to make predictions for other groups or situations
When do you use regression?
when you know only the value of the independent variable and want to predict the dependent variable
What variable is being predicted in simple linear regression?
the dependent variable, y
What variable is used to make the predictions in simple linear regression?
the independent variable (the predictor), x
How do you determine the equation of the best fitting line?
a technique called least squares regression
What are residuals?
the differences between the observed values of y and the predicted values of y (the points on the line)
What do the residuals look like when the line fits the data well?
they are small
Which regression equation is the best?
the one with the smallest sum of squared residuals
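The least squares idea from the last few cards can be sketched as follows. The helper names fit_line and sse and the data are assumptions made for illustration. The slope and intercept use the usual closed-form least squares formulas, and any other line gives a larger sum of squared residuals.

```python
def fit_line(xs, ys):
    """Return (b0, b1) of the least squares line y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def sse(xs, ys, b0, b1):
    """Sum of squared residuals (observed y minus predicted y)."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical example data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)
best = sse(xs, ys, b0, b1)
shifted = sse(xs, ys, b0 + 0.5, b1)  # same slope, different intercept
tilted = sse(xs, ys, b0, b1 + 0.1)   # same intercept, different slope

print(round(b0, 2), round(b1, 2))  # 2.2 0.6
print(best < shifted and best < tilted)  # True
```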
How do you measure the accuracy of the predictions?
standard error of the estimate AKA root mean squared error (RMSE)
What is the RMSE?
root mean squared error: the typical size of the error we make when using the regression equation to make predictions (the square root of the mean squared residual)
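A minimal sketch of the calculation, assuming a line that has already been fitted (b0 = 2.2 and b1 = 0.6 are the least squares values for this made-up data). Conventions differ on the divisor: plain RMSE divides by n, while the standard error of the estimate in simple regression divides by n - 2. The n_params argument is a hypothetical way to show both.

```python
import math

def rmse(ys, predicted, n_params=0):
    """Square root of the mean squared residual.

    n_params=0 divides by n (plain RMSE); n_params=2 divides by n - 2,
    the usual standard error of the estimate in simple regression.
    """
    sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    return math.sqrt(sse / (len(ys) - n_params))

# Hypothetical example data and its least squares line
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
predicted = [2.2 + 0.6 * x for x in xs]

plain = rmse(ys, predicted)
std_error = rmse(ys, predicted, n_params=2)
print(round(plain, 3), round(std_error, 3))  # 0.693 0.894
```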
What is the regression equation?
y = b0 + b1x where b0 is the y-intercept and b1 is the slope
What is the coefficient of determination?
R^2 = the proportion of the variance in y that is explained by the regression; a measure of how well the line represents the data
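One common way to compute R^2 is 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares around the mean of y. A small sketch, with a hypothetical function name and made-up data:

```python
def r_squared(ys, predicted):
    """Proportion of the variance in y explained by the predictions."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical example data and its least squares line (b0 = 2.2, b1 = 0.6)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
predicted = [2.2 + 0.6 * x for x in xs]

r2 = r_squared(ys, predicted)
print(round(r2, 2))  # 0.6, i.e. 60% of the variance is explained
```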
What can you do to test whether the linear relationship is significant?
test for the slope (t-test) and test for explained variance (F-test)
What does the t-test test?
whether the population slope differs from 0
what are the hypotheses for the t-test?
H0: β1 = 0
HA: β1 ≠ 0
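A sketch of the test statistic for these hypotheses in simple linear regression: t = b1 / SE(b1), with n - 2 degrees of freedom. The function name and data are illustrative. Turning t into a p-value requires a t distribution, which is left out here.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (t, df) for testing H0: slope = 0 in simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # standard error of the estimate
    se_b1 = s / math.sqrt(sxx)    # standard error of the slope
    return b1 / se_b1, n - 2

# Hypothetical example data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
t, df = slope_t_statistic(xs, ys)
print(round(t, 3), df)  # 2.121 3
```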
What does the F-test (test for explained variance) test for?
whether the proportion of the variation that is explained by the model is significantly greater than 0
What are the hypotheses for the F-test?
H0: ρ^2 = 0
HA: ρ^2 > 0
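The F statistic compares explained to unexplained variance: F = (SSR / k) / (SSE / (n - k - 1)), where k is the number of predictors. A minimal sketch with made-up data and a hypothetical function name. With a single predictor, F equals the square of the slope's t statistic.

```python
def f_statistic(ys, predicted, k):
    """F = (explained variance per predictor) / (residual variance)."""
    n = len(ys)
    mean_y = sum(ys) / n
    ssr = sum((p - mean_y) ** 2 for p in predicted)         # explained
    sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # unexplained
    return (ssr / k) / (sse / (n - k - 1))

# Hypothetical example data and its least squares line (b0 = 2.2, b1 = 0.6)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
predicted = [2.2 + 0.6 * x for x in xs]

f = f_statistic(ys, predicted, k=1)
print(round(f, 2))  # 4.5, to be compared against an F distribution with (1, 3) df
```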
What are the unstandardized regression coefficients?
b0 and b1; b0 is expressed in the units of the dependent variable, and b1 in units of the dependent variable per unit of the independent variable
What is the standardized regression coefficient?
coefficient beta; in simple linear regression it is equal to the correlation coefficient r
What does the coefficient beta measure?
the change in Y, in SDs, when X increases by 1 SD
When is the coefficient beta useful?
- when the units of measurement of X and Y differ dramatically
- when we use more than 1 independent variable to predict Y and we wish to compare the importance of them
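To illustrate why beta helps when units differ, the sketch below computes beta = b1 * SD(x) / SD(y) for made-up data and then rescales x (for example, metres to centimetres). The unstandardized slope changes with the units, but beta does not. The function name slopes is hypothetical.

```python
import statistics

def slopes(xs, ys):
    """Return (b1, beta): unstandardized and standardized slope."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    beta = b1 * statistics.stdev(xs) / statistics.stdev(ys)
    return b1, beta

# Hypothetical example data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
xs_rescaled = [x * 100 for x in xs]  # same quantity in smaller units

b1, beta = slopes(xs, ys)
b1_rescaled, beta_rescaled = slopes(xs_rescaled, ys)

print(round(b1, 4), round(b1_rescaled, 4))      # 0.6 0.006
print(round(beta, 4), round(beta_rescaled, 4))  # 0.7746 0.7746
```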
What does multiple linear regression allow for?
adding multiple independent variables to predict the dependent variable
What is the equation for MLR?
y = b0 + b1x1 + b2x2 + … + bkxk
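One way to estimate the coefficients is to solve the normal equations (X'X)b = X'y, where X has a leading column of ones for the intercept. The sketch below does this in plain Python. The helper names solve and fit_mlr are made up, and the data were constructed so that y = 1 + 2*x1 + 3*x2 exactly. In practice a statistics library would normally be used instead.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_mlr(rows, ys):
    """Return [b0, b1, ..., bk] from the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no noise)
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [9, 8, 19, 18, 26]

coefficients = fit_mlr(rows, ys)
print([round(b, 6) for b in coefficients])  # [1.0, 2.0, 3.0]
```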
What does adding more independent variables to a model (MLR) do?
- explains more of the variation in the dependent variable (R^2 never decreases and usually increases)
- usually reduces the average prediction error (the SE decreases, so the accuracy of prediction increases)
How can you find the significance of individual predictors?
using a t-test on each predictor's coefficient
Which predictor in MLR has the most impact on changes in the Y-variable?
the one with the largest standardized coefficient in absolute value
What are the assumptions for linear regression?
- normality of the residuals (check with a histogram)
- homoscedasticity (constant variance of the residuals)
- linearity
- no outliers (standardized residuals between -3 and 3)
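The outlier rule of thumb can be sketched as follows. This is a simplified version that divides each residual by the standard error of the estimate and ignores leverage, which statistical packages usually account for. The function names and both data sets are made up for illustration.

```python
import math

def standardized_residuals(xs, ys):
    """Residuals of the least squares line divided by the standard error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    if s == 0:  # perfect fit: every residual is zero
        return [0.0] * n
    return [e / s for e in residuals]

def flag_outliers(xs, ys, limit=3.0):
    """Indices of points whose standardized residual falls outside +/- limit."""
    z_scores = standardized_residuals(xs, ys)
    return [i for i, z in enumerate(z_scores) if abs(z) > limit]

# Hypothetical data without outliers
clean = flag_outliers([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

# Hypothetical data: y = x for x = 0..19, with one point pushed far off the line
xs = list(range(20))
ys = [float(x) for x in xs]
ys[10] += 50
flagged = flag_outliers(xs, ys)

print(clean)    # []
print(flagged)  # [10]
```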