Lecture 6: Sum of Squares Flashcards

1
Q

Sum of Squared Errors (SSE)

A

We square the prediction errors so that negative and positive errors don't cancel each other out and can all be added up to a single positive number.

We can then find the regression line that gives the smallest sum of squared errors, which yields the best predictions across all participants.
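As a sketch, SSE can be computed by hand with NumPy (the data and the least-squares fit via `np.polyfit` are illustrative, not from the lecture):

```python
import numpy as np

# Hypothetical data: predictor x and outcome y (illustrative values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

# Least-squares regression line: y_hat = intercept + slope * x
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

errors = y - y_hat           # prediction errors (residuals)
sse = np.sum(errors ** 2)    # squaring stops +/- errors cancelling out
```

By construction, no other straight line through this data gives a smaller `sse` than the least-squares fit.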

2
Q

Residuals

A

Prediction errors: the differences between the actual values (the ones you observe) and the values predicted by the model.

3
Q

Goodness of fit

A

The Sum of Squared Errors describes the goodness of fit: how well the regression line describes the data. Small prediction errors imply good fit.

4
Q

Null model

A

SSE is not on a meaningful scale, making it difficult to interpret. Therefore, we compare it to the null model (our baseline).

The null model describes the mean and the standard deviation of the outcome variable. It is a regression model without predictors: Yi = a + ei (the intercept plus the prediction error), where ei ~ N(0, SDY). Its only coefficient, the intercept a, is simply the mean of Y, and SDY is simply the standard deviation of the outcome variable Y.

If you do not include predictors, the mean will be the prediction for every individual.
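A minimal sketch of the null model, assuming NumPy and made-up outcome values: with no predictors, the fitted model predicts the mean of Y for everyone.

```python
import numpy as np

# Hypothetical outcome variable Y (illustrative values)
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

a = y.mean()                          # the only coefficient: intercept = mean of Y
null_pred = np.full_like(y, a)        # same prediction for every individual
sse_null = np.sum((y - null_pred) ** 2)
# The null model's SSE is the baseline we compare richer models against
```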

5
Q

Total Sum of Squares (TSS)

A

Tells us the total squared distance of the individual observations from the mean. To calculate TSS, you take the difference between each data point and the mean of all data points, square that difference (to remove negative signs and emphasise large deviations), and then add up all these squared differences.

Formula: TSS = Σ(individual observation - mean)²

TSS is used along with the sum of squared errors (SSE) to calculate R-squared, a measure of how well your regression model explains the variability in the data.
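The calculation can be sketched in NumPy (data made up for illustration); R-squared then follows as 1 - SSE/TSS:

```python
import numpy as np

# Hypothetical data (illustrative values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

tss = np.sum((y - y.mean()) ** 2)    # squared distances to the mean, summed

slope, intercept = np.polyfit(x, y, 1)
sse = np.sum((y - (intercept + slope * x)) ** 2)

r_squared = 1 - sse / tss            # share of the variability the model explains
```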

6
Q

Regression Sum of Squares (RSS)

A

Tells us the difference between the total sum of squares and the sum of squared errors: in other words, how much of the total sum of squares is explained by the regression line.

It is the reduction in prediction error that we gain by using the regression line to predict observations instead of just the mean.

7
Q

Sum of Squares formulas

A

SSE (Sum of Squared Errors): SSE = TSS - RSS

TSS (Total Sum of Squares): TSS = RSS + SSE

RSS (Regression Sum of Squares): RSS = TSS - SSE
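A quick numerical check of these identities, assuming NumPy and made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative values
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

tss = np.sum((y - y.mean()) ** 2)         # data vs. the mean
sse = np.sum((y - y_hat) ** 2)            # data vs. the regression line
rss = np.sum((y_hat - y.mean()) ** 2)     # regression line vs. the mean

# The decomposition TSS = RSS + SSE holds (up to floating-point error)
```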

8
Q

Correlation

A

A standardised measure of the strength of the linear association between two continuous variables.

r = -1: perfect negative association
r = 0: no association
r = 1: perfect positive association

Correlation and regression are very closely related: for variables with a given spread, the steeper the regression slope, the stronger the correlation; for standardised variables, the slope equals the correlation coefficient.
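The link between slope and correlation can be illustrated in NumPy (made-up data): the slope equals r times the ratio of the standard deviations.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative values
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

r = np.corrcoef(x, y)[0, 1]               # correlation coefficient
slope, _ = np.polyfit(x, y, 1)

# slope = r * (SD_y / SD_x)
sd_ratio = y.std(ddof=1) / x.std(ddof=1)
```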

9
Q

Standardised regression coefficient (SRC)

A

With a single predictor, the SRC equals the correlation coefficient; this no longer holds once we have more than one predictor.

In standardised regression, the scores of your variables are transformed to have a mean of 0 and a standard deviation of 1. This process is called standardisation.
The coefficients in standardised regression (often called beta coefficients) represent the number of standard deviations that the dependent variable (Y) is expected to change for a one-standard-deviation change in the independent variable (X).
This allows for a direct comparison of the strength of the relationships between different pairs of variables, regardless of their original units of measurement. It is particularly useful when variables are measured on different scales.

The standardised regression coefficient is interpreted in terms of standard deviations: if the independent variable increases by one SD, the dependent variable is expected to change by … SDs (the value of the SRC).
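A sketch of standardisation in NumPy (made-up data): after converting both variables to z-scores, the fitted slope is the standardised coefficient, and with one predictor it equals r.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative values
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

# Standardise: mean 0, standard deviation 1
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

beta, intercept = np.polyfit(zx, zy, 1)   # beta = standardised coefficient
r = np.corrcoef(x, y)[0, 1]
# With a single predictor, beta equals r, and the intercept is (numerically) zero
```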

10
Q

F-test/F-distribution

A

Ranges from zero to +infinity, never taking on negative values.

In the context of linear regression, an F-test is used to assess the overall significance of the regression model. It helps determine whether the model, with all its predictors, is statistically better than a model with no predictors.
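For a simple regression, the F statistic can be sketched from the sums of squares (NumPy, made-up data; k is the number of predictors):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative values
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])
n, k = len(y), 1                          # n observations, k predictors

slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x
rss = np.sum((y_hat - y.mean()) ** 2)     # explained
sse = np.sum((y - y_hat) ** 2)            # unexplained

# F compares explained variance per predictor with leftover variance per df
F = (rss / k) / (sse / (n - k - 1))
```

Since both numerator and denominator are sums of squares divided by degrees of freedom, F can never be negative.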

11
Q

Explained variance

A

The portion of the total variance explained by the regression line
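In NumPy, with made-up data, this portion (R-squared) is RSS divided by TSS:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative values
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

tss = np.sum((y - y.mean()) ** 2)
rss = np.sum((y_hat - y.mean()) ** 2)

r_squared = rss / tss   # proportion of total variance explained by the line
```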
