Week 9: Assumptions of Multivariable Linear Regression Flashcards
What is the outcome of linear regression?
Outcome is always continuous
What types of variables can the explanatory variables be in linear regression?
Continuous or categorical
How is a continuous explanatory variable interpreted?
As the explanatory variable increases by one unit, the outcome changes by the value of the coefficient
How is a categorical explanatory variable interpreted?
The outcome differs by the coefficient's value in the category of interest compared with the reference category
What are the assumptions of linear regression?
- Normality of residuals
- Linear relationship between outcome and explanatory variables
- Constant variance (homoscedasticity) of the outcome over x
- Data independence
What is a residual in regression?
The difference between the observed value of the outcome and the value predicted by the model (observed minus fitted)
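As a hedged illustration (toy numbers invented for this card; the fitted line 2 + 3x is hypothetical), residuals are just observed minus predicted values:

```python
# Hypothetical toy data and a made-up fitted line y_hat = 2 + 3x,
# purely to illustrate what a residual is.
x = [1, 2, 3, 4]
observed = [5.2, 7.9, 11.1, 13.8]

predicted = [2 + 3 * xi for xi in x]

# Residual = observed minus fitted, rounded for display.
residuals = [round(obs - pred, 2) for obs, pred in zip(observed, predicted)]

print(residuals)  # [0.2, -0.1, 0.1, -0.2]
```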
How can residuals be checked for normality?
By plotting a kernel density of the residuals against a normal curve (in Stata: kdensity resid, normal) or using pnorm and qnorm plots
What indicates heteroskedasticity in residual plots?
A fan or funnel shape in residuals vs. fitted values indicates unequal variance
How do you check the linearity and equal variance assumptions in regression?
Plot residuals against fitted values; there should be no clear pattern or funnel shape
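The course does this check graphically in Stata; as a hedged numeric sketch (made-up data constructed to show a funnel shape), one can compare the residual spread in the lower and upper halves of the fitted values:

```python
# Illustrative sketch, not a formal test: if residual spread grows with
# the fitted values (a funnel shape), the constant-variance assumption
# looks violated. Data below are invented for demonstration.
from statistics import stdev

fitted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
residuals = [0.1, -0.1, 0.2, -0.2, 0.8, -0.9, 1.1, -1.2]

pairs = sorted(zip(fitted, residuals))
half = len(pairs) // 2
low_spread = stdev(r for _, r in pairs[:half])    # spread at low fitted values
high_spread = stdev(r for _, r in pairs[half:])   # spread at high fitted values

print(low_spread < high_spread)  # True here: spread widens, a funnel shape
```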
What should you do if the linearity assumption is violated?
Consider adding a quadratic term or categorising the explanatory variable
How can you verify the independence assumption?
Ensure outcome data come from different individuals at one time point
What can you do if multiple assumptions are violated (e.g., non-normality, non-linearity, heteroskedasticity)?
Transform the outcome variable; one transformation can address several issues at once, though interpretation becomes more complex
What transformations are commonly used for improving normality?
Logarithmic, square root, inverse, or power transformations
What is a limitation of logarithmic transformation?
It cannot be used with variables that contain zero, unless a small constant (e.g., 0.1) is added
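A minimal Python sketch of the workaround on the card (the constant 0.1 is the example from the card; the choice of constant affects the results):

```python
import math

# Outcome containing a zero: math.log(0) raises ValueError,
# so add a small constant before transforming.
outcome = [0, 2, 5, 10]

shift = 0.1  # small constant, as in the card; results depend on this choice
log_outcome = [math.log(y + shift) for y in outcome]

print(log_outcome[0])  # log(0.1), finite, instead of an error
```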
How do transformations affect regression analysis?
They change the scale of the coefficients and standard errors, so results must be interpreted on the transformed scale rather than the original one
What is the interpretation of coefficients after a log transformation of the outcome variable?
The coefficient (multiplied by 100) approximates the percentage change in the outcome for a one-unit increase in the explanatory variable; exactly, the outcome is multiplied by exp(coefficient)
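A worked example with a hypothetical coefficient (b = 0.05 is invented for illustration) showing why the percentage reading works for small coefficients:

```python
import math

b = 0.05  # hypothetical coefficient from a model with a log-transformed outcome

# Exact multiplicative effect: the outcome is multiplied by exp(b)
exact_pct = (math.exp(b) - 1) * 100   # about a 5.13% increase per unit of x

# Shortcut: for small b, 100 * b is a close approximation
approx_pct = b * 100                  # about 5%
```

The shortcut drifts as coefficients grow: at b = 0.5 the exact change is exp(0.5) − 1 ≈ 65%, not 50%.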
What are outliers in regression analysis?
Extreme values of the outcome variable with large residuals (positive or negative)
What is leverage in regression analysis?
Observations with extreme values of the explanatory variable that can influence regression coefficients
How can leverage points affect regression results?
They can pull the regression line toward them, distorting the fit
What is collinearity in regression?
When explanatory variables are highly correlated, making it difficult to include both in the model
What should you do if collinearity is present?
Choose only one of the correlated variables to include in the model
Why is it better to keep a continuous outcome variable as it is?
It allows more explanatory variables to be included in the model and improves statistical power
What is a rule of thumb for including explanatory variables?
One explanatory variable for every 10 observations; categorical variables count as the number of categories minus one (e.g., a variable with 4 categories counts as 3 variables)
What are common methods for model building?
Including all variables; manual backwards selection; or automated forward, backward, or stepwise selection