Week 9: Assumptions of Multivariable Linear Regression Flashcards

1
Q

What is the outcome of linear regression?

A

Outcome is always continuous

2
Q

What types of variables can the explanatory variables be in linear regression?

A

Continuous or categorical

3
Q

How is a continuous explanatory variable interpreted?

A

As the explanatory variable increases by one unit, the outcome changes by the value of the coefficient, holding the other explanatory variables constant
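A tiny numeric illustration of this interpretation, with made-up coefficients:

```python
# Hypothetical fitted model: outcome = 2.0 + 0.5 * x
def predict(x):
    return 2.0 + 0.5 * x

# A one-unit increase in x changes the outcome by the coefficient, 0.5
print(predict(4) - predict(3))  # 0.5
```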

4
Q

How is a categorical explanatory variable interpreted?

A

The outcome changes by the coefficient’s value for the category of interest, compared with the reference (baseline) category

5
Q

What are the assumptions of linear regression?

A
  1. Normality of residuals
  2. Linear relationship between outcome and explanatory variables
  3. Constant variance (SD) of the outcome over x
  4. Data independence
6
Q

What is a residual in regression?

A

The difference between the observed value and the predicted (fitted) value of the outcome
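A minimal sketch with made-up numbers: the residual for each observation is simply observed minus fitted.

```python
# Hypothetical observed outcomes and model-predicted (fitted) values
observed = [4.0, 5.5, 7.1]
predicted = [4.2, 5.0, 7.5]

# Residual = observed - predicted, one per observation
residuals = [round(o - p, 2) for o, p in zip(observed, predicted)]
print(residuals)  # [-0.2, 0.5, -0.4]
```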

7
Q

How can residuals be checked for normality?

A

By plotting a kernel density of the residuals (kdensity resid, normal in Stata) or using pnorm and qnorm plots

8
Q

What indicates heteroskedasticity in residual plots?

A

A fan or funnel shape in residuals vs. fitted values indicates unequal variance

9
Q

How do you check the linearity and equal variance assumptions in regression?

A

Plot residuals against fitted values; there should be no clear pattern or funnel shape

10
Q

What should you do if the linearity assumption is violated?

A

Consider adding a quadratic term or categorising the explanatory variable

11
Q

How can you verify the independence assumption?

A

Ensure outcome data come from different individuals at one time point

12
Q

What can you do if multiple assumptions are violated (e.g., non-normality, non-linearity, heteroskedasticity)?

A

Transform the outcome variable to address all issues, but interpretation becomes more complex

13
Q

What transformations are commonly used for improving normality?

A

Logarithmic, square root, inverse, or power transformations

14
Q

What is a limitation of logarithmic transformation?

A

It cannot be used with variables that contain zero, unless a small constant (e.g., 0.1) is added
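A small sketch of this limitation; the 0.1 constant is just an illustrative choice.

```python
import math

values = [0, 1, 10]  # contains a zero

# math.log(0) raises a ValueError, so shift by a small constant first
shifted = [math.log(v + 0.1) for v in values]
print([round(s, 3) for s in shifted])  # [-2.303, 0.095, 2.313]
```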

15
Q

How do transformations affect regression analysis?

A

They change the scale of the coefficients and standard errors, so results must be interpreted on the transformed scale

16
Q

What is the interpretation of coefficients after a log transformation of the outcome variable?

A

Coefficients approximate percentage changes for a one-unit increase in the explanatory variable
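A worked example with a hypothetical coefficient: the approximation b × 100 is close to the exact percentage change, (exp(b) − 1) × 100, when b is small.

```python
import math

b = 0.05  # hypothetical coefficient from a log-outcome model

approx_pct = b * 100                    # approximate % change per unit: 5.0
exact_pct = (math.exp(b) - 1) * 100     # exact % change
print(approx_pct, round(exact_pct, 2))  # 5.0 5.13
```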

17
Q

What are outliers in regression analysis?

A

Extreme values of the outcome variable with large residuals (positive or negative)

18
Q

What is leverage in regression analysis?

A

Observations with extreme values of the explanatory variable that can influence regression coefficients

19
Q

How can leverage points affect regression results?

A

They can pull the regression line toward them, distorting the fit

20
Q

What is collinearity in regression?

A

When explanatory variables are highly correlated, making it difficult to include both in the model

21
Q

What should you do if collinearity is present?

A

Choose only one correlated variable to include in the model

22
Q

Why is it better to keep a continuous outcome variable as it is?

A

It allows more explanatory variables to be included in the model and improves statistical power

23
Q

What is a rule of thumb for including explanatory variables?

A

One explanatory variable for every 10 observations; categorical variables count as the number of categories minus one (e.g., a variable with 4 categories counts as 3 variables)
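The arithmetic of this rule, using hypothetical numbers:

```python
# Rule of thumb: one model term per 10 observations
n_obs = 150
max_terms = n_obs // 10  # 15

# Hypothetical model: 2 continuous variables + one 4-category variable
# (a k-category variable counts as k - 1 terms)
terms = 2 + (4 - 1)
print(max_terms, terms, terms <= max_terms)  # 15 5 True
```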

24
Q

What are common methods for model building?

A

Include all variables, manual backwards selection, automated forward selection, backward selection, or stepwise selection

25
Q

What is manual backwards selection?

A

Start with all variables, remove the least significant, and refit the model until all remaining variables meet the significance threshold
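A sketch of this loop with hypothetical variable names and p-values; in a real analysis the p-values change each time the model is refitted, so each step would re-run the regression.

```python
# Hypothetical p-values; in practice these come from refitting the model
p_values = {"age": 0.02, "bmi": 0.45, "smoking": 0.01, "exercise": 0.12}
threshold = 0.05

model = dict(p_values)
while model:
    worst = max(model, key=model.get)  # least significant variable
    if model[worst] <= threshold:
        break                          # all remaining variables significant
    model.pop(worst)                   # remove it and refit

print(sorted(model))  # ['age', 'smoking']
```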

26
Q

What are automated methods of model selection?

A

Backward selection, forward selection, and stepwise selection using predefined criteria for variable inclusion

27
Q

What are potential issues with automated model selection?

A

Different methods may lead to different final models, and some variable categories may be excluded

28
Q

What pattern should residuals follow in a residual vs fitted plot?

A

Residuals should sit evenly around 0 with no clear pattern

29
Q

What does a residual vs fitted plot reveal about non-linearity?

A

Residuals being positive for moderate fitted values and negative for small/large values indicates a curve in the relationship

30
Q

What happens to the assumptions as the dataset gets larger?

A

They become less critical; in particular, normality of residuals matters less in large samples

31
Q

What are limitations of dichotomising a continuous variable?

A

It reduces statistical power and limits the ability to include explanatory variables

32
Q

What is heteroskedasticity?

A

Unequal variance of residuals across values of the explanatory variable

33
Q

How do you identify outliers and leverage points?

A

Use scatter plots, stem and leaf plots, or examine residual and qnorm plots

34
Q

How can we get the residuals for each person in Stata?

A

predict chosen_varname, resid

35
Q

What’s the difference between pnorm and qnorm plots?

A
  • pnorm is sensitive to non-normality in the middle range of the data (where it is likely there will be a lot of data)
  • qnorm is sensitive to non-normality at the extremities of the data (where it is likely there will be less data). Non-normal residuals will show on this plot
36
Q

How do we check for normality on a residual vs fitted plot?

A

Residuals should sit evenly about 0
If they are densest elsewhere, this indicates that the residuals are not normal

37
Q

What should you always do when checking the linearity assumption?

A

Plot a scatter plot between outcome and explanatory variable(s) if continuous

38
Q

If we need to transform variables, what should be transformed first?

A

The outcome first, then, if necessary, the explanatory variable(s)
If you are including a baseline value of the outcome as an explanatory variable, it makes sense for them to be on the same scale, so transform both

39
Q

On what kind of data do logarithmic transformations work best?

A

Right (positive) skewed data

40
Q

How can we address outliers and leverage points?

A
  1. First check the data source and whether data have been entered correctly
  2. If data are entered correctly, do not exclude data without good reason. You may want to analyse the data with and without the outlier(s) and/or leverage points to see how the models compare
41
Q

How do you compile an analysis plan?

A
  1. Define research question(s) - check data you have will be able to answer your questions. Define H0 and H1
  2. Check data - see how variables are distributed (histograms, check for outliers, tabulations)
  3. Do analyses by an important categorical variable (if appropriate)
  4. Consider whether you need to do regression analysis - decide which variables to include and check for multicollinearity
  5. Model building
  6. Make sure assumptions for regression analysis have been met
42
Q

What analytical methods can be used for two categorical variables?

A

Chi-squared test if assumptions met; Fisher’s exact test if assumptions not met

43
Q

What analytical methods can be used for a categorical and a continuous variable?

A
  • Two samples t-test if two categories and assumptions met
  • One-way ANOVA if > 2 categories and assumptions met
  • Non-parametric equivalents if assumptions not met
44
Q

What is forwards selection?

A

Starting with no variables in the model and adding variables one by one using pre-specified criteria

45
Q

What is stepwise selection?

A

Using a combination of forwards and backwards selection, with pre-specified criteria for including and excluding variables