The Linear Model Flashcards
What is the residual (ei) of a model?
The (vertical) difference between the observed value and the model's estimate: eᵢ = yᵢ − ŷᵢ
What does it mean when the residual (ei) is lower than 0?
The model overestimates the outcome for observation i.
What does it mean when the residual (ei) is larger than 0?
The model underestimates the outcome for observation i.
What is RSS?
The RSS is the Residual Sum of Squares: RSS = Σᵢ eᵢ² = Σᵢ (yᵢ − ŷᵢ)²
What is the regression called when we estimate the parameters by minimizing the RSS?
Ordinary Least Squares regression (OLS)
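A minimal Python/numpy sketch of these first cards (the toy data and variable names are made up for illustration): it fits an OLS line, then computes the residuals eᵢ = yᵢ − ŷᵢ and the RSS.

```python
import numpy as np

# Toy data, purely illustrative
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=50)

# OLS fit of y = b0 + b1*x: least squares minimizes the RSS
X = np.column_stack([np.ones_like(x), x])         # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat        # fitted values
e = y - y_hat               # residuals e_i = y_i - y_hat_i (negative -> overestimate)
rss = np.sum(e ** 2)        # Residual Sum of Squares
```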
What is the expected relationship between the residuals and the fitted values of a linear model?
There should be no relationship; if there were, it would indicate that the true model is not a linear model at all.
What behaviour do we expect in a histogram of the residuals for a linear model?
The residuals should be roughly symmetric around 0, with large (absolute) residuals occurring less often than small ones.
What is RMSE, and what does its formula look like?
Root mean squared error: RMSE = √(RSS / n) = √((1/n) Σᵢ eᵢ²)
What is RSE?
Residual standard error: RSE = √(RSS / (n − p − 1)), where p is the number of predictors.
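A small helper sketch for the two error measures; the function name and the degrees-of-freedom convention n − p − 1 for the RSE are assumptions following the usual textbook definition.

```python
import numpy as np

def rmse_and_rse(y, y_hat, p):
    """Return (RMSE, RSE) given observed y, fitted y_hat and p predictors."""
    rss = np.sum((y - y_hat) ** 2)
    n = len(y)
    rmse = np.sqrt(rss / n)           # RMSE divides by n
    rse = np.sqrt(rss / (n - p - 1))  # RSE divides by the residual degrees of freedom
    return rmse, rse
```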
What is the formula for the Coefficient of Determination or R²?
R² = 1 − RSS/TSS, where TSS = Σᵢ (yᵢ − ȳ)² is the total sum of squares.
What does the Coefficient of Determination or R² tell us?
It tells us how much of the total observed variability in the outcome is accounted for (explained) by the model.
Is the coefficient of determination (R²) the square of Pearson's correlation coefficient r?
Only if we have one independent variable (IV) in our model.
Does R² = 1 imply we have the true model, i.e. that all estimated parameters of the model (βⱼ) match the truth?
No. When R² = 1 the model accounts for all variability within the data set, but this does not mean we found the model that generated the data. For instance, R² = 1 can always be reached by overfitting the model.
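A short numpy sketch (toy data, illustrative names) computing R² from RSS and TSS and checking that, with a single predictor, it equals the square of Pearson's r.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 0.8 * x + rng.normal(scale=2.0, size=100)

X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

rss = np.sum((y - y_hat) ** 2)        # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
r_squared = 1 - rss / tss             # coefficient of determination

r = np.corrcoef(x, y)[0, 1]           # Pearson's correlation coefficient
print(np.isclose(r_squared, r ** 2))  # True: with one IV, R^2 = r^2
```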
What is meant by the two caveats?
In linear modelling, the residuals, and hence the RSS and RMSE, are calculated vertically (in the direction of Y). If they were calculated horizontally (in the direction of X), the estimated model would differ from the vertical one.
What are the 3 main reasons to add more predictors to a model?
- It reduces the RSS, and hence makes the model more accurate.
- It accounts for factors other than the one of interest; adding these other factors controls for their effect on the outcome Y.
- If the effect of X on Y depends on a third variable, we need to model that interaction explicitly (see the sketch after this list).
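A hedged sketch of the last point: when the effect of x on y depends on a third variable z, the interaction is modelled explicitly by adding an x·z column to the design matrix. All values and names here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(0, 10, size=n)
z = rng.integers(0, 2, size=n)  # third variable, e.g. a group indicator
y = 1.0 + 0.5 * x + 1.0 * z + 0.7 * x * z + rng.normal(size=n)

# Explicit interaction column x*z lets the slope of x depend on z
X = np.column_stack([np.ones(n), x, z, x * z])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # estimated intercept, x, z and x:z effects
```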