Week 9: Assumptions of Multivariable Linear Regression Flashcards

1
Q

How do you check the distribution of continuous variables?

A

<hist varname, freq normal> to create a histogram and overlay a normal distribution

2
Q

What do we call a coefficient when the dependent variable decreases as the independent variable increases?

A

Negative coefficient

3
Q

What does the coefficient represent?

A

The change in the dependent variable for a one-unit change in the predictor, holding other variables constant

4
Q

What is the significance of residuals in regression analysis?

A

Residuals measure the difference between observed and predicted values, indicating model fit
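
To make the definition concrete, here is a minimal sketch in plain Python (not Stata, which the rest of this deck uses) that fits a one-predictor least-squares line and computes residual = observed - predicted. The data and the helper name fit_simple_ols are made up for illustration.

```python
# Hypothetical toy data: x is a predictor, y is the outcome.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-products / sum of squares of x (both mean-centred).
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_simple_ols(x, y)
fitted = [intercept + slope * xi for xi in x]        # predicted values
residuals = [yi - fi for yi, fi in zip(y, fitted)]   # observed - predicted
```

With an intercept in the model, least-squares residuals sum to zero (up to rounding), so it is their spread and pattern, not their average, that says something about fit.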

5
Q

How can you test if residuals are normally distributed using a kernel density plot?

A

Use <kdensity resid_varname, normal>, which overlays a normal curve on the kernel density estimate, and check how closely the two curves align

6
Q

How does a pnorm plot help in assessing normality of residuals?

A

It compares the cumulative distribution of the residuals to that of a normal distribution; closer alignment suggests normality. Pnorm plots are most sensitive to deviations from normality in the middle range of the data

7
Q

What does deviation from the line in a qnorm plot represent?

A

Qnorm plots are most sensitive to non-normality at the extremes of the data. Deviation from the line at the tails indicates non-normality, suggesting potential outliers or skewness

8
Q

Why is normality of residuals important in linear regression?

A

Normal residuals ensure valid hypothesis testing and confidence intervals

9
Q

What is the purpose of a residual vs fitted plot?

A

It checks for patterns that indicate violations of linearity, equal variance, or normality

10
Q

How can you assess if the linearity and equal variance assumptions are met?

A

Look for random scatter around zero in a residual vs fitted plot
A systematic pattern (e.g., a curve) indicates non-linearity, whereas a funnel (fanning) shape indicates unequal variance (heteroscedasticity)
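
As a rough numerical companion to reading the plot, the sketch below (plain Python, not Stata) compares the spread of the residuals in the lower and upper halves of the fitted values. The function name spread_ratio and the data are hypothetical, and this is a crude heuristic under those assumptions, not a formal test of heteroscedasticity.

```python
import statistics


def spread_ratio(fitted, residuals):
    """Ratio of residual SD in the upper half of fitted values to the lower half."""
    pairs = sorted(zip(fitted, residuals))  # order observations by fitted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]      # residuals at small fitted values
    high = [r for _, r in pairs[-half:]]    # residuals at large fitted values
    return statistics.stdev(high) / statistics.stdev(low)


# Hypothetical residuals that fan out as the fitted values grow.
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.2, -0.2, 1.0, -1.2, 1.5, -1.4]
ratio = spread_ratio(fitted, residuals)
```

A ratio far from 1 corresponds to the funnel shape described above; a ratio close to 1 is what roughly equal variance would look like.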

11
Q

What does the presence of leverage points indicate?

A

Leverage points are observations with extreme predictor values; they can be influential and disproportionately affect model fit

12
Q

How can you identify leverage points in regression analysis?

A

Plot residuals or fitted values against predictors and look for isolated points

13
Q

What does multicollinearity indicate in a regression model?

A

High correlation between predictors can distort coefficients, making them unreliable

14
Q

How can multicollinearity be detected?

A

Calculate correlation coefficients between predictors; values near +/- 1 indicate multicollinearity. Use the command <cor>
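
The correlation coefficient itself is simple to compute by hand. A minimal sketch in plain Python (not Stata), using made-up predictors: height recorded in centimetres and again in inches is redundant, so the correlation is essentially 1, while the hypothetical age variable is only weakly related to height.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical predictors.
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
height_in = [h / 2.54 for h in height_cm]   # same information, different units
age = [30.0, 20.0, 50.0, 25.0, 40.0]

r_redundant = pearson_r(height_cm, height_in)  # essentially 1: collinear pair
r_age = pearson_r(height_cm, age)              # much weaker correlation
```

Entering both height variables in one model would leave the regression unable to separate their effects, which is the situation the card describes.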

15
Q

Why might adding two highly correlated predictors distort regression results?

A

The shared variance between predictors reduces the model’s ability to isolate individual effects

16
Q

What does recoding or transforming a variable achieve in regression analysis?

A

It can improve the fit by correcting for skewness or non-linearity, or by addressing data inaccuracies (e.g., age grouping)

17
Q

Why is it important to check for missing data after transformations?

A

Transformations can exclude cases (e.g., the log of a zero or negative value is missing), reducing sample size and potentially biasing results
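
A small sketch of why the check matters, written in plain Python (not Stata). The variable income and the helper safe_log are hypothetical; None stands in for a missing value. The log is undefined for zero and negative values, so those cases become missing after the transformation.

```python
import math


def safe_log(value):
    """Return the natural log, or None (missing) when the log is undefined."""
    if value is None or value <= 0:
        return None
    return math.log(value)


# Hypothetical variable with one missing value, one zero and one negative value.
income = [1200.0, 0.0, 850.0, None, 4300.0, -50.0]
log_income = [safe_log(v) for v in income]

missing_before = sum(v is None for v in income)      # 1
missing_after = sum(v is None for v in log_income)   # 3
```

Comparing the two counts shows how many observations would silently drop out of a regression that uses the transformed variable.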

18
Q

How can you improve the normality of residuals?

A

Apply transformations (e.g., log, square root) to the dependent variable to reduce skewness

19
Q

What does a funnel shape in a residual plot suggest?

A

It indicates heteroscedasticity: the variance of the residuals increases with the fitted values

20
Q

What is the implication of non-linearity in residual plots?

A

Non-linearity suggests that the relationship between predictors and outcome may not be adequately captured by the model

21
Q

Why might adding interaction terms improve model fit?

A

Interactions account for cases where the effect of one predictor depends on the level of another predictor
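
A minimal numerical sketch of what "depends on the level of another predictor" means, in plain Python (not Stata). The coefficients b0 to b3 are invented for illustration; b3 multiplies the product x1*x2 and is the interaction term.

```python
def predicted(x1, x2, b0=1.0, b1=2.0, b2=0.5, b3=1.5):
    """Prediction from a model with an interaction term (hypothetical coefficients)."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


# Effect of a one-unit increase in x1, evaluated at two levels of x2.
effect_when_x2_is_0 = predicted(1, 0) - predicted(0, 0)   # b1 = 2.0
effect_when_x2_is_1 = predicted(1, 1) - predicted(0, 1)   # b1 + b3 = 3.5
```

Without the interaction term (b3 = 0) the two effects would be identical; with it, the effect of x1 equals b1 + b3*x2.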

22
Q

What does a constant (_cons) represent?

A

It is the predicted value of the dependent variable when all predictors are zero

23
Q

How can transformations improve model assumptions?

A

They can stabilise variance, reduce skewness, and make relationships more linear

24
Q

What does a high peak in a histogram indicate about the distribution?

A

It may suggest a large concentration of data around a specific value, potentially indicating skewness or rounding

25
Q

Why is assessing the distribution of predictors and outcomes crucial in regression?

A

Non-normality or outliers can lead to biased estimates and affect the validity of the model

26
Q

How does excluding extreme observations affect model fit?

A

It can reduce leverage effects and improve stability, but may also remove meaningful data

27
Q

What is the purpose of fitting multiple models with different predictors?

A

It helps to identify the best combination of variables and assess the robustness of results

28
Q

Why is it useful to visualise residuals after each regression model?

A

It allows for continuous assessment of model assumptions and fit

29
Q

How do you generate residuals for a regression model?

A

<predict new_varname, resid>
This generates a new variable containing the residuals. <predict> will not overwrite an existing variable, so residuals from subsequent models must be stored under a new name

30
Q

How do you plot a pnorm and qnorm plot to examine the normality of residuals?

A

<pnorm resid_varname>
<qnorm resid_varname>

31
Q

What command do you use to create a residual vs fitted value plot to check the linearity and equal variance assumptions?

A

<rvfplot, yline(0) msymbol(Oh) msize(tiny)>
The <yline> option draws a horizontal reference line at 0, around which the residuals should be centred
The <msymbol> option changes the default marker symbol, here to a hollow circle
The <msize> option changes the size of the symbol

32
Q

How can we make a variable containing the fitted values and why would we want to do this?

A

<predict fit_varname if e(sample)>
The fitted values can be plotted on a scatterplot to see where there may be issues (e.g., fitted values that lie far away from the others):
<scatter fit_varname varname condition, msymbol(Oh) msize(tiny)>
Or
<twoway (scatter fit_varname varname condition, sort) (scatter fit_varname varname condition, sort)>

33
Q

How can you check the options for transforming a variable?

A

<gladder varname>

34
Q

How do you perform a log transformation on a variable?

A

<gen log_varname=log(varname)>