Multiple Linear Regression Flashcards
Write the regression equation.
y = b(0) + b(1)x + e
Why will the estimated value of y not be perfect?
Because there are residuals or errors, which is what ‘e’ stands for.
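A minimal Python sketch of where the residuals come from. It assumes an already fitted line with hypothetical values b0 = 2.0 and b1 = 0.5 and four made-up observations. Each residual is the observed y minus the value the line predicts.

```python
# Hypothetical fitted line (illustrative values, not estimated from real data)
b0, b1 = 2.0, 0.5

x = [1.0, 2.0, 3.0, 4.0]
y = [2.4, 3.1, 3.4, 4.1]   # hypothetical observed values

# Predicted (estimated) values from the line: y_hat = b0 + b1 * x
y_hat = [b0 + b1 * xi for xi in x]

# Residuals e: the part of each observation the line does not capture
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

print(y_hat)                                # [2.5, 3.0, 3.5, 4.0]
print([round(e, 2) for e in residuals])     # [-0.1, 0.1, -0.1, 0.1]
```

The same line also makes predictions for new x values, which is what a regression line is used for.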
What is a regression line used for?
To make predictions about the value of the dependent variable based on a value of the predictor.
What happens to R^2 when more variables are added?
R^2 never decreases when variables are added (in practice it almost always increases), even when the new variables have no predictive power.
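A small simulation can show this, sketched here in Python with NumPy under assumed data: y depends only on x1, and a second predictor of pure random noise is then added. The helper `r_squared` is a hypothetical name written for this example.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)   # least-squares coefficients
    resid = y - X1 @ coef
    return 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)   # y truly depends only on x1
noise = rng.normal(size=n)                # predictor unrelated to y

r2_small = r_squared(x1.reshape(-1, 1), y)
r2_big = r_squared(np.column_stack([x1, noise]), y)
print(r2_small, r2_big)   # r2_big is never below r2_small
```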
How do we know if two models are nested?
If one model contains all the terms of the other, and at least one additional term.
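The nesting rule is a proper-subset test on the models' terms. A minimal Python sketch, with hypothetical variable names:

```python
def is_nested(smaller, larger):
    """True if every term of `smaller` is in `larger` and `larger` has at least one extra term."""
    return set(smaller) < set(larger)   # proper-subset test

model_a = ["age", "income"]
model_b = ["age", "income", "education"]
model_c = ["age", "education"]

print(is_nested(model_a, model_b))  # True: model_b adds "education"
print(is_nested(model_c, model_a))  # False: model_a lacks "education"
```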
What is b(0)?
The intercept: the predicted value of y when x = 0.
What is b(1)?
The slope: the expected change in y for a one-unit increase in x.
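The least-squares estimates can be computed directly as b(1) = Sxy / Sxx and b(0) = mean(y) - b(1) * mean(x). A minimal Python sketch, using made-up data that lie exactly on y = 1 + 2x so the two interpretations are easy to check. The function name `fit_simple` is hypothetical.

```python
def fit_simple(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope: change in y per one-unit change in x
    b0 = y_bar - b1 * x_bar   # intercept: predicted y when x = 0
    return b0, b1

x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]           # hypothetical data, exactly y = 1 + 2x
b0, b1 = fit_simple(x, y)
print(b0, b1)                 # 1.0 2.0
print(b0 + b1 * 10)           # prediction at x = 10: 21.0
```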
Write the multiple regression equation.
y = b(0) + b(1)x(1) + b(2)x(2) + … + b(n)x(n) + e
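With several predictors the coefficients are estimated jointly by least squares on a design matrix whose first column is all ones (for b(0)). A minimal NumPy sketch, assuming hypothetical data generated exactly from y = 1 + 2x(1) - 3x(2) with no error term, so the fit should recover those coefficients.

```python
import numpy as np

# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2 (no error term)
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
y = 1 + 2 * x1 - 3 * x2

X = np.column_stack([np.ones_like(x1), x1, x2])   # columns: intercept, x1, x2
b, *_ = np.linalg.lstsq(X, y, rcond=None)         # least-squares b(0), b(1), b(2)
print(b)   # approximately [1, 2, -3]
```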
What is a fully specified model?
A model in which we have accounted for all factors that determine variation in the dependent variable (y).
Why can’t we usually have a fully specified model?
We cannot measure all the factors that affect y.
What is the relationship between the t and p values?
As the absolute t value increases, the p value decreases (a larger |t| is stronger evidence against the null hypothesis).
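A short Python sketch of the relationship, assuming a large sample so the standard normal distribution approximates the t distribution (for small samples the exact t distribution gives somewhat larger p values). The two-tailed p value is erfc(|t| / sqrt(2)); `two_tailed_p` is a hypothetical helper name.

```python
import math

def two_tailed_p(t):
    """Two-tailed p value for statistic t under a standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))

for t in [0.5, 1.0, 1.96, 3.0]:
    print(t, round(two_tailed_p(t), 4))   # p shrinks as |t| grows
```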
What is the critical t value for significance at the 0.05 level?
+/- 1.96 (two-tailed test, large samples; for small samples the critical t value is somewhat larger)
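The 1.96 cut-off is the point that leaves alpha/2 = 0.025 in each tail of the standard normal distribution, which the t distribution approaches as the sample grows. A minimal Python sketch using the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

alpha = 0.05
# Two-tailed test: put alpha/2 in each tail, so take the 97.5th percentile
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z_crit, 2))   # 1.96
```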
What does rejecting the null hypothesis mean?
- Our relationship is not likely to have occurred by chance
- Our relationship is likely to be reflected in the population
What do we do when we want to add a categorical variable such as sex to the model?
We create a dummy variable that takes the value “0” for men and “1” for women; men are then the reference category.
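A minimal Python sketch of the 0/1 coding, on hypothetical raw data. In a regression, the coefficient on this dummy is the expected difference in y between women and men, holding the other variables constant.

```python
# Hypothetical raw data
sex = ["man", "woman", "woman", "man"]

# Dummy variable: 0 = man (reference category), 1 = woman
female = [1 if s == "woman" else 0 for s in sex]
print(female)   # [0, 1, 1, 0]
```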
What do we do if there are multiple categories?
We create one dummy variable for every category except one (k - 1 dummies for k categories); the omitted category is the reference and is left out of the model.
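A minimal Python sketch of reference coding for several categories. The helper `dummy_code` and the `region` data are hypothetical; each resulting coefficient would compare its category with the reference.

```python
def dummy_code(values, reference):
    """One 0/1 column per category except `reference`, which is left out."""
    levels = sorted(set(values) - {reference})
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}

region = ["north", "south", "east", "north", "east"]
dummies = dummy_code(region, reference="north")
print(dummies)
# {'east': [0, 0, 1, 0, 1], 'south': [0, 1, 0, 0, 0]}
```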
What does R^2 mean?
The proportion of the variability in the dependent variable that is explained by the model; e.g., an R-squared value of 0.66 means that 66% of the variance in the y variable is explained by the x variables.
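R^2 = 1 - SS(residual) / SS(total): the share of the total variation in y that the fitted values account for. A minimal Python sketch with hypothetical numbers; the fitted values are the least-squares predictions for a predictor x = [1, 1, 2, 2], which are simply the two group means.

```python
def r_squared(y, y_hat):
    """Proportion of the variance in y explained by the fitted values y_hat."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    return 1 - ss_res / ss_tot

y = [3.0, 5.0, 7.0, 9.0]
y_hat = [4.0, 4.0, 8.0, 8.0]     # OLS fitted values for x = [1, 1, 2, 2]
print(r_squared(y, y_hat))       # 0.8, i.e. 80% of the variance explained
```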
How do we interpret the p value?
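If the p value of the overall F-test is less than the alpha value of 0.05, we reject the null hypothesis that all slope coefficients are zero, so the model has at least one significant independent variable. The p value attached to each individual coefficient tests whether that particular variable is significant.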
How does the forward stepwise selection method work?
- Begins with no variables and introduces variables one by one
- At each step, add the variable that increases R^2 the most
- Continue this procedure until none of the remaining variables explain a significant amount of the additional variability in y
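The forward procedure above can be sketched in Python with NumPy. One assumption to flag: the source stops when no variable explains a significant amount of additional variability; this sketch replaces that significance test with a hypothetical fixed threshold `min_gain` on the R^2 increase. The data are simulated so that only x1 and x2 matter.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def forward_selection(X, y, names, min_gain=0.01):
    """Greedy forward selection; `min_gain` is a hypothetical stand-in for a significance test."""
    selected, remaining = [], list(range(X.shape[1]))
    current_r2 = 0.0                       # intercept-only model explains nothing
    while remaining:
        # R^2 of the current model plus each remaining candidate
        r2_if_added = {j: r_squared(X[:, selected + [j]], y) for j in remaining}
        best = max(r2_if_added, key=r2_if_added.get)
        if r2_if_added[best] - current_r2 < min_gain:
            break                          # no candidate adds enough explained variability
        selected.append(best)
        remaining.remove(best)
        current_r2 = r2_if_added[best]
    return [names[j] for j in selected], current_r2

# Hypothetical data: y depends on x1 and x2 only; x3 is pure noise
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] + 1 * X[:, 1] + rng.normal(size=200)

chosen, r2 = forward_selection(X, y, ["x1", "x2", "x3"])
print(chosen, round(r2, 3))   # x1 enters first, then x2; x3 is not added
```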
How does the backwards stepwise selection method work?
- Starts with all variables in the model
- At each step, drops the variable that contributes least to R^2
- Process continues until every remaining variable explains a significant proportion of the variability in y
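The backward procedure above can be sketched the same way. As before, the significance test is replaced by a hypothetical fixed threshold, here `min_loss` on the R^2 lost when a variable is dropped, and the data are simulated so that only x1 and x2 matter.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def backward_elimination(X, y, names, min_loss=0.01):
    """Backward elimination; `min_loss` is a hypothetical stand-in for a significance test."""
    selected = list(range(X.shape[1]))     # start with every variable
    current_r2 = r_squared(X[:, selected], y)
    while len(selected) > 1:
        # R^2 of the current model with each variable dropped in turn
        r2_if_dropped = {j: r_squared(X[:, [k for k in selected if k != j]], y)
                         for j in selected}
        weakest = max(r2_if_dropped, key=r2_if_dropped.get)   # smallest R^2 loss
        if current_r2 - r2_if_dropped[weakest] >= min_loss:
            break                          # every remaining variable contributes meaningfully
        selected.remove(weakest)
        current_r2 = r2_if_dropped[weakest]
    return [names[j] for j in selected], current_r2

# Hypothetical data: y depends on x1 and x2 only; x3 is pure noise
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] + 1 * X[:, 1] + rng.normal(size=200)

kept, r2 = backward_elimination(X, y, ["x1", "x2", "x3"])
print(kept, round(r2, 3))   # x3 is dropped; x1 and x2 remain
```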
How do we compare coefficients that are measured in different units?
We standardise the variables (convert them to z-scores) so that the coefficients become standardised (beta) coefficients.
How do we interpret beta coefficients?
As “a one standard deviation unit increase in x is associated with a ___ standard deviation unit increase/decrease in y.”
This way, we can compare continuous IVs to see which has the largest association.
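A minimal NumPy sketch of beta coefficients, assuming two hypothetical predictors on very different scales (income in dollars, age in years). It fits OLS on z-scored variables, then checks the equivalent shortcut beta(j) = b(j) * sd(x(j)) / sd(y) computed from the raw slopes.

```python
import numpy as np

def beta_coefficients(X, y):
    """Slopes from an OLS fit on z-scored predictors and outcome."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    X1 = np.column_stack([np.ones(len(yz)), Xz])
    coef, *_ = np.linalg.lstsq(X1, yz, rcond=None)
    return coef[1:]   # drop the intercept, which is ~0 after standardising

rng = np.random.default_rng(2)
n = 100
income = rng.normal(50_000, 10_000, size=n)   # hypothetical predictor in dollars
age = rng.normal(40, 10, size=n)              # hypothetical predictor in years
y = 0.001 * income + 0.5 * age + rng.normal(size=n)
X = np.column_stack([income, age])

betas = beta_coefficients(X, y)

# Same result from the raw slopes: beta_j = b_j * sd(x_j) / sd(y)
raw, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)
check = raw[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
print(betas, check)   # the two rows agree; now directly comparable across predictors
```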
Why should we not standardise categorical variables?
Because a one standard deviation change in a 0/1 dummy variable is not meaningful; its unstandardised coefficient already gives the difference in y between the two groups.
What does the relaimpo package do?
It is an R package that provides measures of relative importance for each of the predictors in the model; its LMG metric enters the regression variables in all possible orders and then averages each variable's increase in R^2 across those orders.
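relaimpo itself is an R package (its main function is `calc.relimp()`); the Python sketch below is an independent re-implementation of the averaging-over-orders idea for illustration only, not relaimpo's code or API. It assumes simulated data with two correlated predictors and is practical only for a few predictors, since there are p! orders. A useful property: the shares add up to the full model's R^2.

```python
import itertools
import numpy as np

def r_squared(X, y, cols):
    """R^2 of an OLS fit of y on the columns `cols` of X plus an intercept."""
    if not cols:
        return 0.0   # intercept-only model explains nothing
    X1 = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def lmg_importance(X, y):
    """Average R^2 increase of each predictor over all orders of entry."""
    p = X.shape[1]
    totals = np.zeros(p)
    orders = list(itertools.permutations(range(p)))
    for order in orders:
        cols = []
        for j in order:
            before = r_squared(X, y, cols)
            cols = cols + [j]
            totals[j] += r_squared(X, y, cols) - before   # R^2 gained when j enters
    return totals / len(orders)

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
X[:, 1] += 0.5 * X[:, 0]          # make the first two predictors correlated (hypothetical)
y = 2 * X[:, 0] + X[:, 1] + rng.normal(size=100)

shares = lmg_importance(X, y)
print(shares.round(3), shares.sum().round(3))   # shares sum to the full-model R^2
```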
What does adjusted R^2 do?
Adjusted R^2 penalises R^2 for the number of predictors in the model, so it can decrease when variables that add little explanatory power are included; this avoids the problem of R^2 always increasing when more variables are added.
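The usual formula is adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1), for n observations and k predictors (excluding the intercept). A minimal Python sketch with hypothetical numbers: adding eight predictors that lift R^2 by only 0.01 lowers the adjusted value.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r_squared(0.66, n=50, k=2))    # about 0.646
print(adjusted_r_squared(0.67, n=50, k=10))   # about 0.585: higher R^2, lower adjusted R^2
```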
How do we check for multicollinearity using plots?
- Observe the correlation between the predictors in the model by plotting them against each other.
- If the correlation between any two predictors is strong, then one of them needs to be removed from the model.
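The pairwise plots have a numeric counterpart: the predictors' correlation matrix (pandas offers `pandas.plotting.scatter_matrix` for the plots themselves). A minimal NumPy sketch on simulated data, where x2 is built as a near copy of x1; the 0.8 cut-off for a "strong" correlation is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly a copy of x1 (hypothetical)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
names = ["x1", "x2", "x3"]

corr = np.corrcoef(X, rowvar=False)  # predictor-by-predictor correlation matrix
threshold = 0.8                      # hypothetical cut-off for "strong"
flagged = [(names[i], names[j], round(corr[i, j], 2))
           for i in range(len(names)) for j in range(i + 1, len(names))
           if abs(corr[i, j]) > threshold]
print(flagged)   # only the (x1, x2) pair is flagged
```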
What is Variance Inflation Factor (VIF)?
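A measure of how much the variance of a coefficient estimate is inflated by that predictor's correlation with the other predictors: VIF(j) = 1 / (1 - R(j)^2), where R(j)^2 comes from regressing x(j) on all the other predictors. A VIF larger than 5 or 10 indicates serious problems with collinearity.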
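A minimal NumPy sketch of the VIF definition, regressing each predictor on the others, using simulated data in which x2 is a near copy of x1. In practice, statsmodels provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence`; the hand-written `vif` helper here is only for illustration.

```python
import numpy as np

def vif(X):
    """VIF for each column of X: 1 / (1 - R^2) from regressing it on the other columns."""
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        X1 = np.column_stack([np.ones(n), others])     # intercept + other predictors
        coef, *_ = np.linalg.lstsq(X1, target, rcond=None)
        resid = target - X1 @ coef
        r2 = 1 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        out.append(1 / (1 - r2))
    return np.array(out)

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly a copy of x1 (hypothetical)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
print(vif(X).round(1))   # x1 and x2 far above 10; x3 close to 1
```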
Why is multicollinearity a problem?
- If a predictor is strongly related to some other input, then we are simply adding redundant information to the model
- It can be difficult to separate the effects of the multicollinear predictors