5 - Introduction to Regression Flashcards
What is regression?
A way of predicting the value of one variable from another
What is a regression model?
A hypothetical model of the relationship between 2 variables (usually, both are interval scale variables)
What is linear regression?
A linear approach to modeling the relationship between 2 variables using the equation of a straight line
What is the Ordinary Least Squares approach?
An approach that fits the line by minimizing the sum of squared differences (residuals) between the predictions made by the model and the observed data.
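As a minimal sketch of the least-squares idea for one predictor, the snippet below computes the slope and intercept from the standard closed-form formulas and checks that nudging the slope away from the fitted value increases the sum of squared residuals. The function names and the five data points are made up for illustration and are not part of the flashcards.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared differences between observed y and the line's predictions."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical example data (an assumption, not taken from the source).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = ols_fit(x, y)
best = sum_squared_residuals(x, y, b0, b1)
# A line with a slightly different slope fits worse (larger sum of squares).
worse = sum_squared_residuals(x, y, b0, b1 + 0.1)
print(b0, b1, best, worse)
```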
Why is it important to test the model?
The regression model is estimated only from the sample data, so it may not reflect reality. We need to test how well the model fits the observed data and how well it predicts new data.
What is the sum of squares used for?
To tell us how well the model fits the observed data.
What are the 3 types of sum of squares?
Total Sum of Squares, Residual Sum of Squares, Model Sum of Squares
What does the Total Sum of Squares (SST) tell us?
The variability between the observed values of the dependent variable and their mean
(a.k.a. the total variability in the data).
What does the Residual Sum of Squares (SSR) tell us?
The variability between the values predicted by the regression model and the actual data (the error the model leaves unexplained).
What does the Model Sum of Squares (SSM) tell us?
The variability explained by the model: the difference between the model's predictions and the mean of the dependent variable
(a.k.a. how much better our line fits the data than the mean does).
How are SST, SSR, and SSM related?
SSM = SST - SSR
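The relationship can be checked numerically. The sketch below fits a least-squares line to a small made-up data set, computes the three sums of squares from their definitions, and confirms that SSM equals SST minus SSR. The data and variable names are assumptions for illustration only.

```python
# Hypothetical example data (an assumption, not taken from the source).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept for a single predictor.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * xi for xi in x]

# SST: observed values versus their mean.
sst = sum((yi - mean_y) ** 2 for yi in y)
# SSR: observed values versus the model's predictions.
ssr = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
# SSM: the model's predictions versus the mean.
ssm = sum((pi - mean_y) ** 2 for pi in predicted)

print(sst, ssr, ssm)
```

For a least-squares line with an intercept, the identity holds exactly (up to floating-point rounding); it is not guaranteed for a line fitted some other way.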
How is ANOVA used in Linear Regression?
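An ANOVA (F-test) can be used to evaluate a regression model: it tests whether the model predicts the dependent variable significantly better than the mean does.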
What should we expect if a model is a better tool for predicting than using the mean?
SSM should be much greater than SSR
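How is the F-ratio calculated from the mean squares?
F = MSM / MSR, where MSM = SSM / (degrees of freedom of the model) and MSR = SSR / (degrees of freedom of the residuals). MSR is also called the mean squared error.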
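A minimal sketch of the F-ratio for a regression with one predictor, assuming the usual definition F = (SSM / df_model) / (SSR / df_residual), with df_model = 1 and df_residual = n - 2. The function name, the degrees of freedom, and the data are assumptions for illustration; the flashcards do not spell them out.

```python
def f_ratio(x, y):
    """F-ratio for a simple (one-predictor) least-squares regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * xi for xi in x]

    ssm = sum((pi - mean_y) ** 2 for pi in predicted)          # model sum of squares
    ssr = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # residual sum of squares

    df_model = 1         # assumption: one predictor
    df_residual = n - 2  # assumption: n observations minus two estimated parameters
    ms_model = ssm / df_model
    ms_residual = ssr / df_residual  # the mean squared error
    return ms_model / ms_residual


# Hypothetical example data (an assumption, not taken from the source).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
f_value = f_ratio(x, y)
print(f_value)
```

A large F means the model explains much more variability per degree of freedom than it leaves unexplained; judging significance still requires comparing F against the F-distribution with the same degrees of freedom.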
How is R^2 used in Linear Regression?
R^2 = SSM / SST tells us the proportion of variance accounted for by the regression model. With a single predictor it equals the Pearson Correlation Coefficient squared. In other words, it tells us how much of the variability in the dependent variable is explained by the independent variable.
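As a rough check of both readings of R^2, the sketch below computes it once as SSM / SST and once as the square of the Pearson correlation between the two variables. With one predictor the two values should agree. The data and names are assumptions for illustration.

```python
import math

# Hypothetical example data (an assumption, not taken from the source).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)  # this is also SST

# Route 1: proportion of total variability explained by the fitted line.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ssm = sum((intercept + slope * xi - mean_y) ** 2 for xi in x)
r_squared_from_ss = ssm / syy

# Route 2: square of the Pearson correlation coefficient.
pearson_r = sxy / math.sqrt(sxx * syy)
r_squared_from_r = pearson_r ** 2

print(r_squared_from_ss, r_squared_from_r)
```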