Linear Regression Flashcards
Interpret the constant & coefficient
constant = when education equals 0, the predicted mean income is 457
coefficient = with every additional year of education, the mean income increases by 104
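A minimal sketch on synthetic data (the card's 457 and 104 are used as the assumed true intercept and slope) showing how an OLS fit recovers the constant and coefficient; np.polyfit is used here for brevity:

```python
# Fit income ~ education with OLS and read off the constant (intercept)
# and coefficient (slope). Data are made up; 457/104 come from the card.
import numpy as np

rng = np.random.default_rng(0)
education = rng.uniform(8, 18, size=200)                        # years of education
income = 457 + 104 * education + rng.normal(0, 300, size=200)   # assumed true model

slope, intercept = np.polyfit(education, income, deg=1)
print(f"constant    = {intercept:.0f}  (mean income at 0 years of education)")
print(f"coefficient = {slope:.0f}  (change in mean income per extra year)")
```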
What is a prediction error?
The difference between the observed and the predicted value (the residual): y − ŷ
Linear regression assumptions:
- Linear relationship
- Multivariate normality (the residuals should be normally distributed -> when they are not, a non-linear transformation, e.g. a log-transformation, might fix the issue)
- No or little multicollinearity (if multicollinearity is found in the data, centering the variables, that is, subtracting the mean of the variable from each score, might help. The simplest fix, however, is to remove independent variables with high VIF values)
- No autocorrelation (autocorrelation occurs when the residuals are not independent of each other. This typically occurs in stock prices, where each price is not independent of the previous price)
- Homoscedasticity (a scatter plot is a good way to check whether the data are homoscedastic, meaning the residual variance is equal across the regression line)
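A minimal sketch, assuming synthetic data and placeholder names x1/x2, of two quick checks from the list above: a hand-rolled VIF for multicollinearity and a crude residual check for homoscedasticity:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)   # deliberately correlated with x1
y = 2 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# VIF for x1: regress x1 on the other predictor(s); VIF = 1 / (1 - R^2)
X_other = np.column_stack([np.ones(n), x2])
b, *_ = np.linalg.lstsq(X_other, x1, rcond=None)
r2 = 1 - np.sum((x1 - X_other @ b) ** 2) / np.sum((x1 - x1.mean()) ** 2)
print(f"VIF(x1) = {1 / (1 - r2):.2f}   (rule of thumb: > 5-10 is worrying)")

# Homoscedasticity: residual spread should look constant across fitted values
fitted = X @ beta
print("corr(|residuals|, fitted) =",
      round(float(np.corrcoef(np.abs(residuals), fitted)[0, 1]), 3))
```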
How does OLS work?
It picks the intercept and slope(s) that minimize the sum of squared residuals.
What is R²?
The share of the variance in y that the model explains.
How is R² calculated?
R² = 1 − SS_res / SS_tot (residual sum of squares over total sum of squares); see the sketch below.
What is the loss function for linear regression?
The sum of squared errors (SSE), i.e. the sum of squared residuals.
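A minimal sketch on synthetic data computing the SSE loss and the R² formula above by hand:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=100)
y = 3 + 2 * x + rng.normal(size=100)

slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

sse = np.sum((y - y_hat) ** 2)          # the loss OLS minimizes
ss_tot = np.sum((y - y.mean()) ** 2)    # total variation around the mean of y
r_squared = 1 - sse / ss_tot
print(f"SSE = {sse:.1f}, R² = {r_squared:.3f}")
```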
What does a lin regression model predict?
The mean value of y for a given value of x
(No probabilities, it’s a model of the mean)
How can we make a constant more meaningful? (1)
centering: usually mean-centered (subtract the mean, here 12.5 years, from years of education)
How can we make a constant more meaningful? (2)
standardizing: subtract the mean, then divide by the SD
1) For every 1-SD increase in education, the mean income rises by 402
2) For every 1-SD increase in education, the mean income rises by 0.3 SD of income
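A minimal sketch on synthetic data (the 12.5-year mean and the card's 457/104 numbers are assumptions) showing how centering and standardizing change what the constant and coefficient mean:

```python
import numpy as np

rng = np.random.default_rng(3)
education = rng.normal(12.5, 2.5, size=200)                     # assumed mean ~12.5 years
income = 457 + 104 * education + rng.normal(0, 300, size=200)

# Raw fit: constant = mean income at 0 years of education (often meaningless)
print(np.polyfit(education, income, 1))

# Centered fit: constant = mean income at *average* education
edu_centered = education - education.mean()
print(np.polyfit(edu_centered, income, 1))

# Fully standardized fit: coefficient = SDs of income per SD of education (a "beta")
edu_z = (education - education.mean()) / education.std()
inc_z = (income - income.mean()) / income.std()
print(np.polyfit(edu_z, inc_z, 1))
```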
Why would you want to standardize?
It allows comparison of coefficients across variables with different scales.
What are the standardized coefficients also called?
Beta coefficients (betas).
What is true about correlations?
1) Standardizing gets rid of the scale -> that is the whole point
2) Perfect correlation = 0 error
3) Correlation is just a measure of linear relationships! A perfect but non-linear relationship can still yield r ≈ 0 (see the sketch below).
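A minimal sketch illustrating point 3: a relationship that is perfectly deterministic but non-linear can still produce a Pearson r near 0:

```python
import numpy as np

x = np.linspace(-3, 3, 200)
y = x ** 2                      # perfectly determined by x, but not linearly

r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r for y = x²: {r:.3f}")   # ~0: correlation misses the relationship
```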
Why would we even need regression? Why not just calculate the conditional means?
1) reduces noise -> the virtue of abstraction
2) prediction even for x-values that are not in the data (demonstrated in the sketch below)
3) allows for more control, e.g. mediation, moderation, control variables, etc.
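A minimal sketch on synthetic education/income data contrasting the noisy per-value conditional means with the single smooth regression rule, which also predicts at unobserved x:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.integers(8, 18, size=120).astype(float)   # years of education
y = 457 + 104 * x + rng.normal(0, 400, size=120)

# Conditional means: one (noisy) mean per observed x-value
for val in sorted(set(x)):
    print(f"x = {val:.0f}: conditional mean = {y[x == val].mean():.0f}")

# Regression: one smooth rule, works for any x (e.g. x = 13.5 was never observed)
slope, intercept = np.polyfit(x, y, 1)
print(f"regression prediction at x = 13.5: {intercept + slope * 13.5:.0f}")
```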
Why do we square residuals (in the loss and in R²)?
1) prevents positive and negative residuals from cancelling out (demonstrated in the sketch below)
2) gives a bigger penalty to large residuals
- no universal R² threshold will do
- similar rationale as with the alpha value
- with highly noisy data, as in the social sciences, how could we possibly achieve a high R²? And should we even want to?
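A minimal sketch on synthetic data showing point 1: with an intercept in the model, the raw OLS residuals sum to essentially zero, so only the squared sum measures total error:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=100)
y = 1 + 2 * x + rng.normal(size=100)

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

print(f"sum of residuals         = {residuals.sum():.6f}   (cancels out)")
print(f"sum of squared residuals = {(residuals ** 2).sum():.2f}  (penalizes large errors more)")
```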
Is my R² too low?
A low R² is often good, BUT it is also a limitation.
Is my R² too high?
A high R² is often not good, BUT it can be.
Why a low R² is often good