Week 9: Assumptions of Multivariable Linear Regression Flashcards
How do you check the distribution of continuous variables?
<hist varname, freq normal> to create a histogram and overlay a normal distribution
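A minimal sketch in Stata (the variable name `bmi` is a placeholder):

```stata
* Histogram of a continuous variable on the frequency scale,
* with a normal density curve overlaid for comparison
hist bmi, freq normal
```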
What do we call a coefficient when the dependent variable decreases as the independent variable increases?
Negative coefficient
What does the coefficient represent?
The change in the dependent variable for a one-unit change in the predictor, holding other variables constant
What is the significance of residuals in regression analysis?
Residuals measure the difference between observed and predicted values, indicating model fit
How can you test if residuals are normally distributed using a kernel density plot?
<kdensity resid_varname, normal> and overlay a normal curve to check for alignment
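As a sketch in Stata (model and variable names are placeholders): fit the model, save the residuals with `predict`, then plot their kernel density against a normal curve.

```stata
* Fit the regression model (sbp, age, bmi are placeholder variables)
regress sbp age bmi

* Save the residuals, then compare their kernel density to a normal curve
predict resid, residuals
kdensity resid, normal
```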
How does a pnorm plot help in assessing normality of residuals?
It compares the empirical cumulative distribution of the residuals to a normal distribution; closer alignment to the diagonal suggests normality. Pnorm plots are sensitive to deviations from normality in the middle range of the data
What does deviation from the line in a qnorm plot represent?
Deviation from the line, especially at the tails, indicates non-normality, suggesting potential outliers or skewness. Qnorm plots are sensitive to deviations from normality at the extremities of the data
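Both plots take the saved residuals directly. A minimal sketch in Stata (assuming residuals were saved as `resid` with `predict resid, residuals` after `regress`):

```stata
pnorm resid   // sensitive to non-normality in the middle range of the data
qnorm resid   // sensitive to non-normality in the tails
```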
Why is normality of residuals important in linear regression?
Normal residuals ensure valid hypothesis testing and confidence intervals
What is the purpose of a residual vs fitted plot?
It checks for patterns that indicate violations of linearity, equal variance, or non-normality
How can you assess if the linearity and equal variance assumptions are met?
Look for random scatter in a residual vs fitted plot. Non-linearity appears as a systematic pattern (e.g., a curve), whereas unequal variance (heteroscedasticity) appears as a funnel or fanning shape
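Stata provides a post-estimation command for this plot. A minimal sketch (model variables are placeholders):

```stata
* Fit the model, then plot residuals against fitted values;
* yline(0) draws a reference line at zero for judging scatter
regress sbp age bmi
rvfplot, yline(0)
```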
What does the presence of leverage points indicate?
Leverage points are observations with extreme predictor values that can disproportionately affect model fit
How can you identify leverage points in regression analysis?
Plot residuals or fitted values against predictors and look for isolated points far from the rest of the data
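Stata also offers a dedicated post-estimation plot that puts leverage on one axis and squared residuals on the other. A minimal sketch (model variables are placeholders):

```stata
* Leverage vs. normalized squared residuals; points far to the right
* have high leverage, points high up fit the model poorly
regress sbp age bmi
lvr2plot
```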
What does multicollinearity indicate in a regression model?
High correlation between predictors can distort coefficients, making them unreliable
How can multicollinearity be detected?
Calculate correlation coefficients between predictors; values near +/- 1 indicate multicollinearity. Use the command <cor varlist>
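A minimal sketch in Stata (variable names are placeholders); the `estat vif` step is an additional standard post-estimation check, shown here as a supplement rather than part of the card above:

```stata
* Pairwise correlations between candidate predictors
cor age bmi weight

* Variance inflation factors after fitting the model
* (a common rule of thumb flags VIF values above 10)
regress sbp age bmi weight
estat vif
```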
Why might adding two highly correlated predictors distort regression results?
The shared variance between predictors reduces the model’s ability to isolate individual effects