Week 9: Assumptions of Multivariable Linear Regression Flashcards

Question 1

Q

How do you check the distribution of continuous variables?

Answer

A

<hist varname, freq normal> to create a histogram and overlay a normal distribution

Question 2

Q

How do we call a coefficient when the dependent variable decreases as the independent variable increases?

Answer

A

Negative coefficient

Question 3

Q

What does the coefficient represent?

Answer

A

The change in the dependent variable for a one-unit change in the predictor, holding other variables constant

Question 4

Q

What is the significance of residuals in regression analysis?

Answer

A

Residuals measure the difference between observed and predicted values, indicating model fit

Question 5

Q

How can you test if residuals are normally distributed using a kernel density plot?

Answer

A

<kdensity resid_varname, normal> and overlay a normal curve to check for alignment

Question 6

Q

How does a pnorm plot help in assessing normality of residuals?

Answer

A

It compares the cumulative distribution of residuals to a normal distribution; closer alignment suggests normality. Qnorm plots show deviations from normality in the middle range of data

Question 7

Q

What does deviation from the line in a qnorm plot represent?

Answer

A

Deviation at the tails indicates non-normality, suggesting potential outliers or skewness. Shows deviations from normality at the extremities.

Question 8

Q

Why is normality of residuals important in linear regression?

Answer

A

Normal residuals ensure valid hypothesis testing and confidence intervals

Question 9

Q

What is the purpose of a residual vs fitted plot?

Answer

A

It checks for patterns that indicate violations of linearity, equal variance, or non-normality

Question 10

Q

How can you assess if the linearity and equal variance assumptions are met?

Answer

A

Look for random scatter in a residual vs fitted plot; fanning or patterns suggest heteroscedasticity
In a residual vs fitted plot, non-linearity is shown by a pattern, whereas unequal variance is shown by a funnel shape

Question 11

Q

What does the presence of leverage points indicate?

Answer

A

Leverage points are influential observations that can disproportionately affect model fit

Question 12

Q

How can you identify leverage points in regression analysis?

Answer

A

Plot residuals or fitted values against predictors and look for isolated points

Question 13

Q

What does multicollinearity indicate in a regression model?

Answer

A

High correlation between predictors can distort coefficients, making them unreliable

Question 14

Q

How can multicollinearity be detected?

Answer

A

Calculate correlation coefficients between predictors; values near +/- 1 indicate multicollinearity. Use command <cor></cor>

Question 15

Q

Why might adding two highly correlated predictors distort regression results?

Answer

A

The shared variance between predictors reduces the model’s ability to isolate individual effects

Question 16

Q

What does recoding or transforming a variable achieve in regression analysis?

Answer

A

It can improve the fit by correcting for skewness, non-linearity, or address data inaccuracies (e.g., age grouping)

Question 17

Q

Why is it important to check for missing data after transformations?

Answer

A

Transformations can exclude cases, reducing sample size and potentially biasing results

Question 18

Q

How can you improve the normality of residuals?

Answer

A

Apply transformations (e.g., log, square root) to the dependent variable to reduce skewness

Question 19

Q

What does a funnel shape in a residual plot suggest?

Answer

A

It indicates heteroscedasticity - variance of residuals increases with fitted values

Question 20

Q

What is the implication of non-linearity in residual plots?

Answer

A

Non-linearity suggests that the relationship between predictors and outcome may not be adequately captured by the model

Question 21

Q

Why might adding interaction terms improve model fit?

Answer

A

Interactions account for cases where the effect of one predictor depends on the level of another predictor

Question 22

Q

What does a constant (_cons) represent?

Answer

A

It is the predicted value of the dependent variable when all predictors are zero

Question 23

Q

How can transformations improve model assumptions?

Answer

A

They can stabilise variance, reduce skewness, and make relationships more linear

Question 24

Q

What does a high peak in a histogram indicate about the distribution?

Answer

A

It may suggest a large concentration of data around a specific value, potentially indicating skewness or rounding

Question 25

Q

Why is assessing the distribution of predictors and outcomes crucial in regression?

Answer

A

Non-normality or outliers can lead to biased estimates and affect the validity of the model

Question 26

Q

How does excluding extreme observations affect model fit?

Answer

A

It can reduce leverage effects and improve stability, but may also remove meaningful data

Question 27

Q

What is the purpose of fitting multiple models with different predictors?

Answer

A

It helps to identify the best combination of variables and assess the robustness of results

Question 28

Q

Why is it useful to visualise residuals after each regression model?

Answer

A

It allows for continuous assessment of model assumptions and fit

Question 29

Q

How do you generate residuals for a regression model?

Answer

A

<predict new_varname, resid>
This will generate a new variable. This cannot be overwritten by any other residuals variable created - subsequent variables must have a new name

Question 30

Q

How do you plot a pnorm and qnorm plot to examine the normality of residuals?

Answer

A

Question 31

Q

What command do you use to check normality of residuals using a residual vs fitted value plot?

Answer

A

<rvfplot, yline(0) msymbol (Oh) msize(tiny)>
The <yline> option draws a line at 0 where the residuals should be densest
The <msymbol> option is used to change the default symbol which is now set at a hollow circle</msymbol></yline>

<msize> changes the size of the symbol
</msize>

Question 32

Q

How can we make a variable containing the fitted values and why would we want to do this?

Answer

A

<predict fit_varname if e(sample)>
The fitted values can be plotted on a scatterplot to see where there may be issues with fitted values (may be far away from the others) -
<scatter fit_varname varname condition, msymbol (Oh) msize(tiny)> Or
<twoway (scatter fit_varname varname condition, sort) (scatter fit_varname varname condition, sort)>

Question 33

Q

How can you check the options for transforming a variable?

Answer

A

Question 34

Q

How do you perform a log transformation on a variable?

Answer

A