3.4: Introduction to Linear Regression Flashcards
Correlations
Measures the strength and direction of a linear relationship between two variables.
What variable is used to represent correlations?
R
What is the unit of R?
There are no units
What does the value of R help show?
The effect size
What does R being unitless allow us to do?
Compare correlations across measures of different scales
What are correlations dependent on?
2 variables
What are 2 flaws of correlations?
- Does not account for third (confounding) variables
- Do not help predict the actual values of the variables in a new situation (only given situation).
Regression
Explores how one variable (dependent) changes in response to another (independent).
Least Squares Regression Equation
Yi = bXi + a (slope formula)
Yi = predicted value
b = estimated slope
a = estimated intercept
How are errors in prediction based on the regression line minimized?
Based on the least squares method
How does the least squares method minimize error?
Minimizes the squared deviations from the regression line
Residuals
Deviations around the line of best fit
What do you assume in residuals?
Homoscedasticity
Homoscedasticity
The error in predictions (scatter around line) is evenly distributed across the range of x and y
Homogeneity
The idea that the spread of residuals is constant across all levels of the independent variable.
Is regression or correlation model more reliable?
regression
Heterogeneity
The variance of the residuals changes across levels of the independent variable.
What can heterogeneity lead to?
Biased estimates
What does the graph of heterogeneity tend to look like?
It funnels (wider in some spots then narrows out)
What does the regression equation describe?
The numeric relationship between the variables in the graph.
What does the line of best fit help predict?
Scores of one variables given the scores of another variable.
What can regression equations be expanded to?
Multiple variables
Q: In the linear regression equation, Y =bX + a, what is the value of b called?
Best fit line
Beta (slope)
X intercept
Correlation between X and Y
Beta (Slope)
Q: If there is a negative correlation between X and Y then the linear regression equation Y = bX + a, would necessarily have…?
A<0
A>0
B<0
B>0
B<0
Multivariable Linear Regression
Models the relationship between one dependent variable and two or more independent variables.
What does the equation for multivariable linear regression look like?
Y = b1X1 + b2X2 + b3X3… + a
Coefficient
The beta (slope) value in units of the variables.
Standard Coefficient
The beta (slope) value in z-scores units.
What does using the standard coefficient do?
Makes the slope values comparable across all the predictor variables (no units).
Continuous Variables
Betas represent slopes.
Group Variables
Betas represent difference scores to a reference level.
What do regression lines help predict?
Y scores from X scores
Predictions are rarely___
perfect
How is residual deviation from the predicted regression line summarized?
By the squared deviations from the predicted values on the line.
Root Mean Squared Error
Provides a measure of the deviation of a point away from the regression line.
Root Mean Squared Error Formula
Square root of: Sum of deviations squared ÷ total samples
Without X, what is the best prediction method for y?
Finding the distance from the mean
What question for Root Mean Squared Error help answer?
After we use X to predict Y, how much variability is left in Y?
What 2 questions does deviation squared answer?
- How much variability in Y did the regression explain with X?
- How much better did we do by using X instead of simply the mean of Y?
What is the formula for variability explained?
SSy (total variability in Y) - SSy/x (unexplained variability)
How do you find the unexplained variability?
Sum of squared deviations
What is R^2?
Coefficient of Determination
What is the formula for4 R^2?
( SSy - SSy/x ) ÷ SSy
Coefficient of Determination
The proportion of explained variability to total variability
What does R^2 indicate?
The proportional gain in variability accounted for predicting y from x, rather than from the mean.
What does R^2 = 0 mean?
The independent variables in the model explain none of the variation in the dependent variable.
= Model is useless for prediction
What does a graph of R^2 = 0 look like?
No correlation
- scatter around the center with no pattern
What does R^2 = 1 mean?
The model perfectly explains the variation in the dependent variable.
= Model is perfect; every point falls on the regression line.
What does a graph of R^2 = 1 look like?
All points fall on the regression line, whether it is a positive line or a negative.