Lecture 6: Sum of Squares Flashcards
Sum of Squared Errors (SSE)
We square the prediction errors so that negative and positive errors do not cancel each other out and can be added up to a single positive number.
We then look for the regression line that gives the smallest sum of squared errors, which gives us the best predictions across all participants.
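A minimal sketch in Python (hypothetical data and a hand-picked candidate line, not taken from the lecture) of how the SSE is computed:

```python
import numpy as np

# Hypothetical data: hours studied (X) and exam score (Y)
x = np.array([1, 2, 3, 4, 5])
y = np.array([52, 55, 61, 64, 70])

# Predictions from a candidate regression line, e.g. Y-hat = 47 + 4.4 * X
y_hat = 47 + 4.4 * x

# Residuals (prediction errors) and their squared sum
residuals = y - y_hat
sse = np.sum(residuals ** 2)
print(sse)  # the line with the smallest SSE is the least-squares regression line
```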
Residuals
Prediction errors; the difference between the actual values (the ones you observe) and the ones predicted by the model
Goodness of fit
The Sum of Squared Errors describes the goodness of fit: how well the regression line describes the data. Small prediction errors imply good fit.
Null model
SSE is not on a meaningful scale, making it difficult to interpret on its own. Therefore, we compare it to the SSE of the null model (our baseline).
The null model describes the mean and the standard deviation of the outcome variable. It is a regression model without predictors: Yi = a + ei (the intercept plus the prediction error), where ei ~ N(0, SDY). Its only coefficient a (the intercept) is simply the mean of Y, and SDY is simply the standard deviation of the outcome variable Y.
If you do not include predictors, the mean will be the prediction for every individual.
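A minimal sketch (same hypothetical outcome variable as above) showing that the null model predicts the mean for everyone, and that its SSE equals the Total Sum of Squares introduced in the next card:

```python
import numpy as np

# Hypothetical outcome variable
y = np.array([52, 55, 61, 64, 70])

# Null model: no predictors, so the mean is the prediction for every individual
y_hat_null = np.full(len(y), y.mean())

# The null model's sum of squared errors equals the Total Sum of Squares (TSS)
sse_null = np.sum((y - y_hat_null) ** 2)
print(sse_null)
```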
Total Sum of Squares (TSS)
Tells us the total squared distance of the individual observations from the mean. To calculate TSS, you take the difference between each data point and the mean of all data points, square that difference (to remove negative signs and emphasize large deviations), and then add up all these squared differences.
Formula: TSS = Σ (individual observation − mean)²
TSS is used along with the sum of squared errors (SSE) to calculate the R-squared, a measure of how well your regression model explains the variability in the data.
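For example (hypothetical numbers): with observations 2, 4 and 6 the mean is 4, so TSS = (2 − 4)² + (4 − 4)² + (6 − 4)² = 4 + 0 + 4 = 8.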
Regression Sum of Squares (RSS)
Tells us how large the difference between the total sum of squares and the sum of squared errors is. In other words, how much of the total sum of squares is explained by the regression line.
It is the reduction in prediction error that we achieve by using the regression line, rather than just the mean, to predict the observations.
Sum of Squares formulas
SSE (Sum of Squared Errors): TSS − RSS
TSS (Total Sum of Squares): RSS + SSE
RSS (Regression Sum of Squares): TSS − SSE
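A minimal sketch (hypothetical data; the least-squares line obtained with numpy's polyfit) verifying that the three sums of squares add up as stated above:

```python
import numpy as np

# Hypothetical data
x = np.array([1, 2, 3, 4, 5])
y = np.array([52, 55, 61, 64, 70])

# Least-squares line (numpy returns the slope first, then the intercept)
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

sse = np.sum((y - y_hat) ** 2)         # error left over after using the line
tss = np.sum((y - y.mean()) ** 2)      # error of the null model (mean only)
rss = np.sum((y_hat - y.mean()) ** 2)  # reduction achieved by the line

print(np.isclose(tss, rss + sse))  # True: TSS = RSS + SSE
```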
Correlation
A standardised measure of the strength of linear association between 2 continuous variables
R = -1: perfect negative association
R = 0: no association
R = 1: perfect positive association
Correlation and regression are very closely related: when the variables are standardised, the regression slope equals the correlation coefficient, so a steeper slope goes together with a stronger correlation.
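A minimal sketch (hypothetical data) of the link between the correlation coefficient and the regression slope, using the relation r = slope × SD(X) / SD(Y):

```python
import numpy as np

# Hypothetical data
x = np.array([1, 2, 3, 4, 5])
y = np.array([52, 55, 61, 64, 70])

# Pearson correlation between X and Y
r = np.corrcoef(x, y)[0, 1]

# Link with the regression slope: r = slope * SD(X) / SD(Y)
slope, _ = np.polyfit(x, y, 1)
print(np.isclose(r, slope * x.std() / y.std()))  # True
```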
Standardised regression coefficient (SRC)
With a single predictor, the SRC equals the correlation coefficient; this equality no longer holds when we have more than one predictor.
In standardized regression, the scores of your variables are transformed to have a mean of 0 and a standard deviation of 1. This process is called standardization.
The coefficients in standardized regression (often called beta coefficients) represent the number of standard deviations that the dependent variable (Y) is expected to change for a one standard deviation change in the independent variable (X).
This allows for a direct comparison of the strength of the relationships between different pairs of variables, regardless of their original units of measurement. It’s particularly useful when variables are measured on different scales.
The standardised regression coefficient is interpreted in terms of standard deviations: if the independent variable increases by one SD, the dependent variable is expected to change by … SDs (the value of the SRC).
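A minimal sketch (hypothetical data) showing that regressing standardised scores on each other yields the beta coefficient, which with a single predictor equals the correlation:

```python
import numpy as np

# Hypothetical data
x = np.array([1, 2, 3, 4, 5])
y = np.array([52, 55, 61, 64, 70])

# Standardise: subtract the mean and divide by the standard deviation (z-scores)
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()

# The slope of the regression on standardised scores is the beta coefficient
beta, _ = np.polyfit(zx, zy, 1)

# With a single predictor, beta equals the Pearson correlation
print(np.isclose(beta, np.corrcoef(x, y)[0, 1]))  # True
```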
F-test/F-distribution
Ranges from zero to +infinity, never taking on negative values.
In the context of linear regression, an F-test is used to assess the overall significance of the regression model. It helps determine whether the model, with all its predictors, is statistically better than a model with no predictors.
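A minimal sketch (hypothetical data, one predictor; uses scipy for the F-distribution) of how the overall F-statistic can be computed from RSS and SSE:

```python
import numpy as np
from scipy import stats

# Hypothetical data with a single predictor
x = np.array([1, 2, 3, 4, 5])
y = np.array([52, 55, 61, 64, 70])

slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

sse = np.sum((y - y_hat) ** 2)
rss = np.sum((y_hat - y.mean()) ** 2)

n, p = len(y), 1                            # n observations, p predictors
f_stat = (rss / p) / (sse / (n - p - 1))    # explained vs. unexplained variance per df
p_value = stats.f.sf(f_stat, p, n - p - 1)  # right-tail probability under the null
print(f_stat, p_value)
```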
Explained variance
The proportion of the total variance explained by the regression line
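In formula form, consistent with the sums of squares above: explained variance (R²) = RSS / TSS = 1 − SSE / TSS.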