Week 9: Assumptions of Multivariable Linear Regression Flashcards

1
Q

What is the outcome of linear regression?

A

Outcome is always continuous

2
Q

What types of variables can the explanatory variables be in linear regression?

A

Continuous or categorical

3
Q

How is a continuous explanatory variable interpreted?

A

As the explanatory variable increases by one unit, the outcome changes by the value of the coefficient, holding the other explanatory variables constant
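A tiny numeric illustration of this interpretation, with made-up coefficients:

```python
# Hypothetical fitted model: outcome = 2.0 + 0.5 * x
def predict(x):
    return 2.0 + 0.5 * x

# A one-unit increase in x changes the outcome by the coefficient, 0.5
print(predict(4) - predict(3))  # 0.5
```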

4
Q

How is a categorical explanatory variable interpreted?

A

The outcome changes by the coefficient’s value for the category of interest, compared with the reference (baseline) category

5
Q

What are the assumptions of linear regression?

A
  1. Normality of residuals
  2. Linear relationship between outcome and explanatory variables
  3. Constant variance (SD) of the outcome over x
  4. Data independence
6
Q

What is a residual in regression?

A

The difference between the observed value and the predicted (fitted) value of the outcome
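A minimal sketch with made-up numbers: the residual for each observation is simply observed minus fitted.

```python
# Hypothetical observed outcomes and model-predicted (fitted) values
observed = [4.0, 5.5, 7.1]
predicted = [4.2, 5.0, 7.5]

# Residual = observed - predicted, one per observation
residuals = [round(o - p, 2) for o, p in zip(observed, predicted)]
print(residuals)  # [-0.2, 0.5, -0.4]
```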

7
Q

How can residuals be checked for normality?

A

By plotting a kernel density of the residuals (kdensity resid, normal in Stata) or using pnorm and qnorm plots

8
Q

What indicates heteroskedasticity in residual plots?

A

A fan or funnel shape in residuals vs. fitted values indicates unequal variance

9
Q

How do you check the linearity and equal variance assumptions in regression?

A

Plot residuals against fitted values; there should be no clear pattern or funnel shape

10
Q

What should you do if the linearity assumption is violated?

A

Consider adding a quadratic term or categorising the explanatory variable

11
Q

How can you verify the independence assumption?

A

Ensure outcome data come from different individuals at one time point

12
Q

What can you do if multiple assumptions are violated (e.g., non-normality, non-linearity, heteroskedasticity)?

A

Transform the outcome variable to address all issues, but interpretation becomes more complex

13
Q

What transformations are commonly used for improving normality?

A

Logarithmic, square root, inverse, or power transformations

14
Q

What is a limitation of logarithmic transformation?

A

It cannot be used with variables that contain zero, unless a small constant (e.g., 0.1) is added
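A small sketch of this limitation; the 0.1 constant is just an illustrative choice.

```python
import math

values = [0, 1, 10]  # contains a zero

# math.log(0) raises a ValueError, so shift by a small constant first
shifted = [math.log(v + 0.1) for v in values]
print([round(s, 3) for s in shifted])  # [-2.303, 0.095, 2.313]
```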

15
Q

How do transformations affect regression analysis?

A

They change the scale of the coefficients and standard errors, so results must be interpreted on the transformed scale

16
Q

What is the interpretation of coefficients after a log transformation of the outcome variable?

A

Coefficients approximate percentage changes for a one-unit increase in the explanatory variable
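A worked example with a hypothetical coefficient: the approximation b × 100 is close to the exact percentage change, (exp(b) − 1) × 100, when b is small.

```python
import math

b = 0.05  # hypothetical coefficient from a log-outcome model

approx_pct = b * 100                    # approximate % change per unit: 5.0
exact_pct = (math.exp(b) - 1) * 100     # exact % change
print(approx_pct, round(exact_pct, 2))  # 5.0 5.13
```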

17
Q

What are outliers in regression analysis?

A

Extreme values of the outcome variable with large residuals (positive or negative)

18
Q

What is leverage in regression analysis?

A

Observations with extreme values of the explanatory variable that can influence regression coefficients

19
Q

How can leverage points affect regression results?

A

They can pull the regression line toward them, distorting the fit

20
Q

What is collinearity in regression?

A

When explanatory variables are highly correlated, making it difficult to include both in the model

21
Q

What should you do if collinearity is present?

A

Choose only one correlated variable to include in the model

22
Q

Why is it better to keep a continuous outcome variable as it is?

A

It allows more explanatory variables to be included in the model and improves statistical power

23
Q

What is a rule of thumb for including explanatory variables?

A

One explanatory variable for every 10 observations; categorical variables count as the number of categories minus one (e.g., a variable with 4 categories counts as 3 variables)
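The arithmetic of this rule, using hypothetical numbers:

```python
# Rule of thumb: one model term per 10 observations
n_obs = 150
max_terms = n_obs // 10  # 15

# Hypothetical model: 2 continuous variables + one 4-category variable
# (a k-category variable counts as k - 1 terms)
terms = 2 + (4 - 1)
print(max_terms, terms, terms <= max_terms)  # 15 5 True
```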

24
Q

What are common methods for model building?

A

Include all variables, manual backwards selection, automated forward selection, backward selection, or stepwise selection

25
Q

What is manual backwards selection?

A

Start with all variables, remove the least significant, and refit the model until all remaining variables meet the significance threshold
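A sketch of this loop with hypothetical variable names and p-values; in a real analysis the p-values change each time the model is refitted, so each step would re-run the regression.

```python
# Hypothetical p-values; in practice these come from refitting the model
p_values = {"age": 0.02, "bmi": 0.45, "smoking": 0.01, "exercise": 0.12}
threshold = 0.05

model = dict(p_values)
while model:
    worst = max(model, key=model.get)  # least significant variable
    if model[worst] <= threshold:
        break                          # all remaining variables significant
    model.pop(worst)                   # remove it and refit

print(sorted(model))  # ['age', 'smoking']
```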

26
Q

What are automated methods of model selection?

A

Backward selection, forward selection, and stepwise selection using predefined criteria for variable inclusion

27
Q

What are potential issues with automated model selection?

A

Different methods may lead to different final models, and some variable categories may be excluded

28
Q

What pattern should residuals follow in a residual vs fitted plot?

A

Residuals should sit evenly around 0 with no clear pattern

29
Q

What does a residual vs fitted plot reveal about non-linearity?

A

Residuals being positive for moderate fitted values and negative for small/large values indicates a curve in the relationship

30
Q

What happens to the assumptions as the dataset gets larger?

A

They become less critical; in particular, normality of residuals matters less in large samples

31
Q

What are limitations of dichotomising a continuous variable?

A

It reduces statistical power and limits the ability to include explanatory variables

32
Q

What is heteroskedasticity?

A

Unequal variance of residuals across values of the explanatory variable

33
Q

How do you identify outliers and leverage points?

A

Use scatter plots, stem and leaf plots, or examine residual and qnorm plots

34
Q

How can we get the residuals for each person in Stata?

A

predict chosen_varname, resid

35
Q

What’s the difference between pnorm and qnorm plots?

A
  • pnorm is sensitive to non-normality in the middle range of the data (where it is likely there will be a lot of data)
  • qnorm is sensitive to non-normality at the extremities of the data (where it is likely there will be less data). Non-normal residuals will show on this plot
36
Q

How do we check for normality on a residual vs fitted plot?

A

Residuals should sit evenly about 0
If they are densest elsewhere, this indicates that the residuals are not normal

37
Q

What should you always do when checking the linearity assumption?

A

Plot a scatter plot between outcome and explanatory variable(s) if continuous

38
Q

If we need to transform variables, what should be transformed first?

A

The outcome first, then, if necessary, the explanatory variable(s)
If you are including a baseline value of the outcome as an explanatory variable, it makes sense for them to be on the same scale, so transform both

39
Q

On what kind of data do logarithmic transformations work best?

A

Right (positive) skewed data

40
Q

How can we address outliers and leverage points?

A
  1. First check the data source and whether data have been entered correctly
  2. If data are entered correctly, do not exclude data without good reason. You may want to analyse the data with and without the outlier(s) and/or leverage points to see how the models compare
41
Q

How do you compile an analysis plan?

A
  1. Define research question(s) - check data you have will be able to answer your questions. Define H0 and H1
  2. Check data - see how variables are distributed (histograms, check for outliers, tabulations)
  3. Do analyses by an important categorical variable (if appropriate)
  4. Consider whether you need to do regression analysis - decide which variables to include and check for multicollinearity
  5. Model building
  6. Make sure assumptions for regression analysis have been met
42
Q

What analytical methods can be used for two categorical variables?

A

Chi-squared test if assumptions met; Fisher’s exact test if assumptions not met

43
Q

What analytical methods can be used for a categorical and a continuous variable?

A
  • Two samples t-test if two categories and assumptions met
  • One-way ANOVA if > 2 categories and assumptions met
  • Non-parametric equivalents if assumptions not met
44
Q

What is forwards selection?

A

Starting with no variables in the model and adding variables one by one using pre-specified criteria

45
Q

What is stepwise selection?

A

Using a combination of forwards and backwards selection, with pre-specified criteria for including and excluding variables