Linear regression Flashcards
SSR, MSE, R squared
How do we quantify the quality of a model and its predictions?
By calculating the sum of squared residuals (SSR).
How do you calculate the sum of squared residuals?
- First calculate the residuals by finding the differences between observed and predicted values.
- Then square the residuals and sum up the squared residuals.
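What is the formula for the sum of squared residuals?
SSR = Σ (observed_i - predicted_i)^2, summed over all observations i = 1 to n. Each residual is squared first, then the squares are added.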
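The two steps above (compute the residuals, then square and sum them) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name and the data values are made up for the example.

```python
def sum_of_squared_residuals(observed, predicted):
    """Return the sum of squared (vertical) residuals."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    # Residual = observed - predicted; square each one, then add them up.
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Hypothetical data: three observations and a model's predictions for them.
observed = [1.0, 3.0, 2.0]
predicted = [2.0, 1.0, 5.0]

ssr = sum_of_squared_residuals(observed, predicted)
print(ssr)  # residuals are -1, 2, -3, so SSR = 1 + 4 + 9 = 14.0
```

Note that the square is applied to each residual before summing; squaring the sum of the residuals would give a different (and wrong) number.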
What kinds of models can we apply SSR to?
All kinds of models, whether the fit is a straight line or a curve.
Do we calculate the residuals using the vertical or the perpendicular distance to the model?
By calculating the vertical distance
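What is the perpendicular distance to the model also called?
The shortest distance.
Why do we use the vertical distance instead of the shortest distance?
Because the perpendicular (shortest) distance compares an observed value with a point on the model at a different x value. The vertical distance compares the observed and predicted values at the same x, which is the error we care about when predicting.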
What is the problem of SSR?
SSR is not easy to interpret because it depends on how much data we have. For example, a model fit to 3 data points might have an SSR of 14, while a model fit to 5 data points has an SSR of 22. That does not imply that the second model is worse than the first. Every additional data point adds a non-negative squared residual, so a larger SSR may only tell us that the model was evaluated on more data.
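A small sketch of why SSR grows with the amount of data: because each squared residual is zero or positive, the running total can never go down as points are added. The residual values below are hypothetical, chosen so that the totals pass through 14 after three points and 22 after five.

```python
from itertools import accumulate

# Hypothetical residuals (observed - predicted) for five data points.
residuals = [-1.0, 2.0, -3.0, 2.0, -2.0]

# Running SSR after 1, 2, 3, 4 and 5 points.
running_ssr = list(accumulate(r ** 2 for r in residuals))
print(running_ssr)  # [1.0, 5.0, 14.0, 18.0, 22.0]
```

The jump from 14 to 22 comes only from including two more points, not from the fit getting worse.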
Should SSR be low or high?
The smaller the SSR, the better the model fits the data. If SSR is zero, the model fits the data perfectly.
To compare two models that may be fit to datasets of different sizes, what do we calculate?
Mean Squared Error
What is the formula for mean squared error (MSE)?
MSE = SSR / n, where n is the number of observations
MSE = [Σ (observed_i - predicted_i)^2] / n
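As a rough sketch of the formula, the function below divides the SSR by the number of observations. The function name and both datasets are hypothetical; the values are picked so that the SSRs are 14 (three points) and 22 (five points).

```python
def mean_squared_error(observed, predicted):
    """Return SSR divided by the number of observations."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    ssr = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return ssr / len(observed)


# Hypothetical model evaluated on 3 points (SSR = 14).
small_obs = [1.0, 3.0, 2.0]
small_pred = [2.0, 1.0, 5.0]

# Hypothetical model evaluated on 5 points (SSR = 22).
large_obs = [1.0, 3.0, 2.0, 6.0, 4.0]
large_pred = [2.0, 1.0, 5.0, 4.0, 6.0]

mse_small = mean_squared_error(small_obs, small_pred)  # 14 / 3, about 4.67
mse_large = mean_squared_error(large_obs, large_pred)  # 22 / 5 = 4.4
print(mse_small, mse_large)
```

Although the five-point dataset has the larger SSR, its MSE is the smaller of the two, so on a per-observation basis it is the better fit in this made-up example.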
What does MSE calculate intuitively?
The average squared residual per observation. This is why MSE is preferred over SSR, which increases as we add more data.
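Why are MSEs difficult to interpret?
When comparing two models, the MSE values depend on the units used for the data. For example, a model using millimeters has an MSE of 4.7, while the same fit expressed in meters has an MSE of 0.0000047, because squaring the residuals also squares the unit conversion factor (1000^2 = 1,000,000).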
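The effect of units can be checked with a short sketch. The measurements below are hypothetical; the same observations and predictions are expressed first in millimeters and then in meters, and only the units change.

```python
def mean_squared_error(observed, predicted):
    """Return the average of the squared residuals."""
    ssr = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return ssr / len(observed)


# Hypothetical lengths in millimeters.
observed_mm = [1000.0, 2000.0, 3000.0]
predicted_mm = [1002.0, 1997.0, 3001.0]

# The same values converted to meters.
observed_m = [v / 1000.0 for v in observed_mm]
predicted_m = [v / 1000.0 for v in predicted_mm]

mse_mm = mean_squared_error(observed_mm, predicted_mm)  # in mm^2
mse_m = mean_squared_error(observed_m, predicted_m)     # in m^2

# The ratio is about 1,000,000 (1000 squared), up to floating-point rounding.
print(mse_mm, mse_m, mse_mm / mse_m)
```

The fit is identical in both cases, yet the two MSE values differ by a factor of roughly one million, which is why raw MSEs cannot be compared across different scales.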
How do we overcome this disadvantage of MSE?
Using R squared
How does R squared overcome the issue with MSE?
R squared is independent of both the size of the dataset and the scale of the data.
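The flashcards do not give a formula for R squared, so the sketch below assumes the standard definition: R^2 = 1 - SSR(model) / SSR(mean), where SSR(mean) is the sum of squared residuals around the mean of the observed values. The function name and data are hypothetical, and the example only checks the scale part of the claim.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SSR(model) / SSR(mean), assuming the standard definition."""
    mean_obs = sum(observed) / len(observed)
    ssr_model = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ssr_mean = sum((o - mean_obs) ** 2 for o in observed)
    if ssr_mean == 0:
        raise ValueError("R squared is undefined when all observed values are equal")
    return 1.0 - ssr_model / ssr_mean


# Hypothetical lengths in millimeters, and the same values in meters.
observed_mm = [1000.0, 2000.0, 3000.0]
predicted_mm = [1002.0, 1997.0, 3001.0]
observed_m = [v / 1000.0 for v in observed_mm]
predicted_m = [v / 1000.0 for v in predicted_mm]

r2_mm = r_squared(observed_mm, predicted_mm)
r2_m = r_squared(observed_m, predicted_m)

# Both values are about 0.999993; the unit conversion cancels in the ratio.
print(r2_mm, r2_m)
```

Because R squared is a ratio of two sums of squares in the same units, the conversion factor appears in both the numerator and the denominator and cancels out. Dividing both sums by n would likewise leave the ratio unchanged, which is one way to see why the dataset size does not inflate the value the way it does for SSR.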