Week 9: Assumptions of Multivariable Linear Regression Flashcards
What type of outcome variable does linear regression require?
The outcome is always continuous
What types of variables can the explanatory variables be in linear regression?
Continuous or categorical
How is a continuous explanatory variable interpreted?
For each one-unit increase in the explanatory variable, the outcome changes on average by the value of the coefficient, holding the other explanatory variables constant
How is a categorical explanatory variable interpreted?
The coefficient is the average difference in the outcome between the category of interest and the reference category, holding the other explanatory variables constant
What are the assumptions of linear regression?
- Normality of residuals
- Linear relationship between outcome and explanatory variables
- Constant variance (SD) of the outcome across all values of x (homoskedasticity)
- Data independence
What is a residual in regression?
The difference between the observed value of the outcome and the value predicted by the model (observed minus predicted)
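As a minimal sketch of this definition, the Python below fits a least-squares line with a single explanatory variable by hand and computes each residual as observed minus predicted. The data values and the helper name `fit_line` are hypothetical, chosen only for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares with one explanatory variable: returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]        # predicted outcome values
residuals = [yi - fi for yi, fi in zip(y, fitted)]   # observed minus predicted
```

When the model includes an intercept, least-squares residuals always sum to zero, which is a quick sanity check on the calculation.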
How can residuals be checked for normality?
By plotting a kernel density of the residuals against a normal curve (Stata: kdensity resid, normal) or by using pnorm and qnorm plots
What indicates heteroskedasticity in residual plots?
A fan or funnel shape in residuals vs. fitted values indicates unequal variance
How do you check the linearity and equal variance assumptions in regression?
Plot residuals against fitted values; there should be no clear pattern or funnel shape
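A plot is the usual tool for this check. As a rough numeric companion, the sketch below compares the spread of residuals in the upper half of the fitted values with the spread in the lower half. This is a crude heuristic and not a formal test; the function name `spread_ratio` and the data are hypothetical, and it assumes at least four observations.

```python
from statistics import pstdev

def spread_ratio(fitted, residuals):
    """SD of residuals in the upper half of fitted values divided by SD in the lower half."""
    pairs = sorted(zip(fitted, residuals))   # order observations by fitted value
    half = len(pairs) // 2
    lower = [r for _, r in pairs[:half]]
    upper = [r for _, r in pairs[-half:]]
    return pstdev(upper) / pstdev(lower)

# Hypothetical residuals whose spread grows with the fitted value (a funnel shape)
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]

ratio = spread_ratio(fitted, residuals)
```

A ratio far from 1 suggests unequal variance and is a prompt to look at the plot itself; a ratio near 1 does not rule out non-linearity, which this check does not assess.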
What should you do if the linearity assumption is violated?
Consider adding a quadratic term or categorising the explanatory variable
How can you verify the independence assumption?
Ensure each outcome measurement comes from a different individual at a single time point (no repeated measurements on the same person)
What can you do if multiple assumptions are violated (e.g., non-normality, non-linearity, heteroskedasticity)?
Transform the outcome variable, which can address several of these issues at once, although interpretation becomes more complex
What transformations are commonly used for improving normality?
Logarithmic, square root, inverse, or power transformations
What is a limitation of logarithmic transformation?
It cannot be used with variables that contain zero or negative values; zeros can be handled by first adding a small constant (e.g., 0.1)
How do transformations affect regression analysis?
They change the scale of coefficients and standard errors, so results are expressed on the transformed scale and differ from those of the untransformed model
What is the interpretation of coefficients after a log transformation of the outcome variable?
Multiplied by 100, coefficients approximate the percentage change in the outcome for a one-unit increase in the explanatory variable (the approximation holds only for small coefficients)
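To make the size of that approximation concrete, here is a small sketch assuming the outcome was transformed with the natural logarithm. A coefficient b then corresponds to multiplying the outcome by exp(b), so the exact percentage change is 100 × (exp(b) − 1). The function name `percent_change` and the example coefficients are hypothetical.

```python
import math

def percent_change(coef):
    """Exact percentage change in the outcome per one-unit increase, for a natural-log outcome."""
    return 100 * (math.exp(coef) - 1)

small = percent_change(0.05)   # close to the shortcut 100 * 0.05 = 5
large = percent_change(0.50)   # well above the shortcut 100 * 0.50 = 50
```

The shortcut of reading the coefficient directly as a percentage works for small values but drifts further from the exact figure as the coefficient grows.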
What are outliers in regression analysis?
Extreme values of the outcome variable with large residuals (positive or negative)
What is leverage in regression analysis?
A measure of how extreme an observation's values of the explanatory variables are; high-leverage observations can strongly influence the regression coefficients
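For a model with one explanatory variable and an intercept, leverage has a simple closed form: h_i = 1/n + (x_i − mean of x)² / Σ(x_j − mean of x)². The sketch below computes it for hypothetical data in which one observation sits far from the rest; the function name `leverage` is an illustrative choice.

```python
from statistics import mean

def leverage(x):
    """Leverage h_i = 1/n + (x_i - mean)^2 / Sxx for a model with one explanatory variable."""
    n = len(x)
    x_bar = mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Hypothetical data: the last observation is far from the others
x = [1, 2, 3, 4, 20]
h = leverage(x)   # the last value is by far the largest
```

Leverage depends only on the explanatory variable, not on the outcome, and the values sum to the number of fitted parameters (here 2: intercept and slope).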