Linear regression Flashcards
SSR, MSR, R SQ
How to quantify the quality of a model and its predictions?
By calculating the Sum of Squared Residuals (SSR).
How do you calculate the sum of squared residuals?
- First calculate the residuals by finding the differences between observed and predicted values.
- Then square the residuals and sum up the squared residuals.
Sum of squared residuals formula
SSR = Σ(observed − predicted)²
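The two steps above (subtract, then square and sum) can be sketched in a few lines of Python, using made-up observed and predicted values:

```python
# Sketch of the SSR calculation with made-up example data.
observed = [1.0, 3.0, 2.0, 5.0]
predicted = [1.5, 2.5, 2.0, 4.0]  # hypothetical model predictions

# residual = observed - predicted; square each residual and sum them up
ssr = sum((o - p) ** 2 for o, p in zip(observed, predicted))
print(ssr)  # 1.5
```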
To what kinds of models can we apply SSR?
All kinds of models - straight lines or curves.
How do we measure the residuals - by the vertical or the perpendicular distance to the model?
By the vertical distance.
The perpendicular distance to the model is also called the
Shortest distance.
Why do we use vertical distance instead of the shortest distance
Because the perpendicular (shortest) distance does not preserve the x-values: the vertical distance compares the observed and predicted y-values at the same x-value, which is what we want.
What is the problem of SSR?
SSR is not easy to interpret because it depends on how much data we have. For example, a model fit to three data points might have SSR = 14 while a model fit to five data points has SSR = 22; that does not mean the second model is worse than the first. More data means more residuals to sum, so SSR tends to grow with dataset size even when the fit is just as good.
Should SSR be low or high
The smaller the value of SSR, the better the model fits the data. If SSR is zero, the model fits perfectly to the data.
How can we compare two models that may be fit to different-sized datasets?
Mean Squared Error
Formula for Mean squared error
MSE = SSR / n = Σ(observed − predicted)² / n, where n is the number of observations.
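The MSE formula is the SSR from before divided by the number of observations; a minimal sketch with made-up data:

```python
# Sketch: MSE is the SSR averaged over the number of observations.
observed = [1.0, 3.0, 2.0, 5.0]
predicted = [1.5, 2.5, 2.0, 4.0]  # hypothetical model predictions

ssr = sum((o - p) ** 2 for o, p in zip(observed, predicted))
mse = ssr / len(observed)
print(mse)  # 0.375
```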
What does MSE compute, intuitively?
The average squared residual. So, unlike SSR, MSE does not grow just because we add more data.
Why are MSEs still difficult to interpret?
When comparing two models, the MSE values depend on the scale of the measurements. For example, the same predictions measured in mm might give MSE = 4.7, while measured in meters they give MSE = 0.0000047.
How to overcome the disadvantage of MSE
Using R squared
How R squared overcomes the issue with MSE
R squared is independent of both size of the dataset and scale
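This scale independence can be demonstrated with a small sketch (made-up measurements): rescaling from millimetres to metres changes the MSE by a factor of a million, but leaves R squared untouched.

```python
# Demonstration with made-up data: changing units changes MSE but not R squared.
observed_mm = [120.0, 150.0, 170.0, 200.0]
predicted_mm = [125.0, 145.0, 175.0, 195.0]  # hypothetical predictions

def mse(obs, pred):
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)

def r_squared(obs, pred):
    mean = sum(obs) / len(obs)
    ssr_mean = sum((o - mean) ** 2 for o in obs)          # SSR around the mean
    ssr_fit = sum((o - p) ** 2 for o, p in zip(obs, pred))  # SSR around the model
    return (ssr_mean - ssr_fit) / ssr_mean

# The same measurements expressed in metres instead of millimetres
observed_m = [v / 1000 for v in observed_mm]
predicted_m = [v / 1000 for v in predicted_mm]

print(mse(observed_mm, predicted_mm))  # 25.0
print(mse(observed_m, predicted_m))    # 2.5e-05
print(r_squared(observed_mm, predicted_mm))  # identical in both units
print(r_squared(observed_m, predicted_m))
```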
How is R squared calculated?
- R squared compares the SSR (or MSE) around the mean y-axis value to the SSR (or MSE) around the model we are interested in. R squared therefore gives the percentage by which the predictions improved by using the model instead of just the mean.
What is the range of R squared values
0 to 1
When R squared is closer to one it means
The model fits the data better than using the mean y-axis value.
R squared formula
(SSR(mean) − SSR(fitted_line)) / SSR(mean)
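The formula can be sketched directly with made-up data: compute SSR around the mean y-value, SSR around a hypothetical fitted line, and take the ratio.

```python
# Sketch: R squared from SSR around the mean vs SSR around the fitted line.
observed = [2.0, 4.0, 3.0, 5.0, 6.0]
predicted = [2.5, 3.5, 3.5, 5.0, 5.5]  # hypothetical fitted-line predictions

mean_y = sum(observed) / len(observed)
ssr_mean = sum((o - mean_y) ** 2 for o in observed)
ssr_fit = sum((o - p) ** 2 for o, p in zip(observed, predicted))

r_squared = (ssr_mean - ssr_fit) / ssr_mean
print(r_squared)  # 0.9 -> residuals shrank 90% using the line instead of the mean
```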
SSR(mean) − SSR(fitted_line) - what does it mean?
The amount by which the residuals around the mean shrank when we used the fitted line; dividing by SSR(mean) turns this shrinkage into a percentage.
Rsquare = 1 means
Fitted line fits data perfectly
Rsquare = 0 means
SSR(mean) = SSR(fitted_line) - they are both equally good or bad
SSR(fitted_line) = 0 means
Fitted line fits data perfectly
In what scenarios do R squared results have low confidence?
When the dataset is small. A small dataset can produce a high R squared (close to 1) by chance, so any time we see a trend in a small dataset, it is difficult to be confident that the high R squared is not due to random chance.
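An extreme illustration of the small-data problem: with only two points, a straight line always passes through both of them exactly, so R squared is always 1 whether or not the trend is real. A minimal sketch:

```python
# Sketch: with only two data points, the fitted line passes through both,
# so R squared is always 1 regardless of whether the trend is meaningful.
x = [1.0, 2.0]
y = [3.0, 7.0]  # made-up values; any two points give the same result

# Fit the straight line through the two points
slope = (y[1] - y[0]) / (x[1] - x[0])
intercept = y[0] - slope * x[0]
predicted = [slope * xi + intercept for xi in x]

mean_y = sum(y) / len(y)
ssr_mean = sum((yi - mean_y) ** 2 for yi in y)
ssr_fit = sum((yi - p) ** 2 for yi, p in zip(y, predicted))
r_squared = (ssr_mean - ssr_fit) / ssr_mean
print(r_squared)  # 1.0
```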
When does R squared result have high confidence
When there is large amount of data.
Is intuition the only way to have confidence in R squared results?
No. Even with a large dataset, intuition alone is not enough, so statisticians developed p-values.
R squared formula using MSE
(MSE(mean) − MSE(fitted_line)) / MSE(mean)
Does R squared always compare the mean to a straight fitted line?
No. Comparing the mean to a fitted line is the most common way to calculate R squared, but R squared can compare any two models - for example, a square wave to a sine wave.