Multiple Linear Regression in R Flashcards
Review of Linear Regression and Multiple Linear Regression and R results interpretation
What does multiple variable regression do?
It allows for multiple exposure/risk factors while controlling (adjusting) for confounding
- ) Write out the linear regression equation
- ) Is caloric intake significantly associated with BMI?
- ) Interpret the slope
- ) Interpret R-squared
- ) Predict BMI for someone who consumes 2,500 calories/day
- ) Y(BMI) = 13.8863 + 2.6711(x)
- ) Yes, p < 0.05
- ) For every 1,000-calorie increase in daily intake (a 1-unit increase in kcal, which is measured in thousands of calories), BMI increases by 2.6711 kg/m2 on average
- ) Caloric intake explains 66.85% of the variability in BMI
- ) BMI = 13.8863 + 2.6711(2.5 kcal) = 20.56
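The prediction above can be reproduced in a few lines of code. This is a minimal Python sketch, not the R workflow these cards are based on. The helper name predict_bmi is made up for illustration; the intercept and slope are the ones from the answers above, with caloric intake entered in thousands of calories.

```python
# Coefficients copied from the fitted simple regression in the cards above.
INTERCEPT = 13.8863   # predicted BMI when caloric intake is 0
SLOPE = 2.6711        # change in BMI (kg/m2) per 1,000 calories/day


def predict_bmi(calories_per_day: float) -> float:
    """Predicted BMI for a given daily intake (hypothetical helper)."""
    kcal_thousands = calories_per_day / 1000.0  # the model's x is in thousands
    return INTERCEPT + SLOPE * kcal_thousands


if __name__ == "__main__":
    print(f"{predict_bmi(2500):.2f}")
```

Dividing by 1,000 inside the helper keeps the unit conversion in one place, so the slope can be used exactly as reported.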
How do you calculate the confidence interval of the slope? How do you interpret it?
95% CI = b ± tcrit*SE, (df = n-2)
where
b = the slope from your R results (2.6711)
SE = standard error from your R results (0.1802)
tcrit is for n-2 degrees of freedom (df = 109-2 = 107 from R results)
(Could also use Zcrit = 1.960 because more than 30 observations)
Interpretation of the 95% CI:
The estimated slope is 2.67
We are 95% confident that the true slope is between 2.31 and 3.03
Remember: If the 95% confidence interval for the slope does not contain the null value (0), then the relationship is statistically significant at α = 0.05
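The interval calculation can be sketched in code as follows. This Python example uses the large-sample shortcut mentioned above (Zcrit of about 1.960 instead of tcrit), because the standard library has no t distribution. The function name slope_ci is an assumption made for illustration, and the slope and standard error are the values quoted in the card.

```python
from statistics import NormalDist

b = 2.6711    # slope from the R output quoted above
se = 0.1802   # standard error of the slope from the same output


def slope_ci(b: float, se: float, level: float = 0.95) -> tuple[float, float]:
    """Large-sample CI: b +/- z_crit * SE (z used in place of t)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.960 for 95%
    margin = z_crit * se
    return b - margin, b + margin


lower, upper = slope_ci(b, se)
significant = not (lower <= 0 <= upper)  # True when the CI excludes the null value 0
print(f"95% CI: {lower:.3f} to {upper:.3f}; excludes 0: {significant}")
```

Because the z critical value is a little smaller than the t critical value for 107 degrees of freedom, this interval is slightly narrower than the t-based one, and the endpoints can differ in the second decimal place.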
Multiple Linear Regression Equation
Y = a + b1X1 + b2X2 + b3X3 + . . . + bnXn
where
Y = the continuous outcome
a = the intercept
b1 . . . bn = the slopes, one for each of the now multiple independent (predictor) variables X1 . . . Xn
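Evaluating this equation for one subject is just the intercept plus a sum of slope-times-value terms. The Python sketch below illustrates that, assuming the coefficients from the adjusted BMI model that appears later in these cards. The predict helper and the subject's profile are hypothetical and invented for illustration.

```python
def predict(intercept: float, slopes: list[float], xs: list[float]) -> float:
    """Evaluate Y = a + b1*X1 + ... + bn*Xn for one subject."""
    if len(slopes) != len(xs):
        raise ValueError("need exactly one X value per slope")
    return intercept + sum(b * x for b, x in zip(slopes, xs))


# Coefficients from the adjusted BMI model in these cards:
# kcal (thousands of calories/day), age (years), activity (1 = active, 0 = not).
a = 15.47
slopes = [2.00, 0.11, -1.34]

# Hypothetical subject, invented for illustration: 2,500 calories/day, age 40, active.
subject = [2.5, 40, 1]

bmi_hat = predict(a, slopes, subject)
print(round(bmi_hat, 2))
```

The length check guards against the easy mistake of passing the predictors in the wrong number; the order of the values must still match the order of the slopes.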
In what cases is it appropriate to use (simple and multiple) linear regression?
Linear regression is mainly used when the outcome (dependent variable) is a continuous variable
What kind of independent variables (predictors) can you have in linear regression?
continuous, dichotomous (yes/no), and ordinal/categorical (convert to many dummy variables)
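Dummy coding can be illustrated with a short Python sketch. It assumes the common convention of k-1 indicator variables for a predictor with k levels, with the first level treated as the reference group. The dummy_code helper and the three activity levels are hypothetical, not taken from the BMI example.

```python
def dummy_code(value: str, levels: list[str]) -> dict[str, int]:
    """Return k-1 indicator (0/1) variables; levels[0] is the reference group."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {f"is_{level}": int(value == level) for level in levels[1:]}


# Hypothetical three-level predictor, invented for illustration.
activity_levels = ["low", "moderate", "high"]   # "low" is the reference

print(dummy_code("low", activity_levels))       # {'is_moderate': 0, 'is_high': 0}
print(dummy_code("high", activity_levels))      # {'is_moderate': 0, 'is_high': 1}
```

Each dummy variable then gets its own slope in the model, interpreted as the average difference in the outcome between that level and the reference level.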
- ) Write out the estimated multiple regression equation
- ) Which predictors are significantly associated with BMI?
- ) Interpret each of the regression coefficients.
- ) Are age and activity confounding the association between caloric intake and BMI? (Unadjusted slope = 2.67)
- ) BMI = 15.47 + 2.00(kcalx1000) + 0.11(age) - 1.34(activity)
- ) All three (the starred ones)
- ) The regression coefficients are the slopes. For every 1-unit increase in kcal (1,000 calories), BMI increases by 2.00 kg/m2 after adjusting for the other predictors. For every additional year of age, BMI increases by 0.11 kg/m2 after adjusting for the other predictors. Those who are physically active have a BMI that is lower by 1.34 kg/m2 on average than those who are not active, after adjusting for the other predictors.
- ) Unadjusted slope = 2.67, adjusted slope = 2.00. Apply the 10% rule: (2.67 - 2.00)/2.00 = 0.335, or roughly 33% using these rounded slopes. Yes, the slope for caloric intake changes by more than 10% after adjustment, so it does appear that age and activity jointly confound the association.
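The 10% rule check can be written as a small calculation. This Python sketch assumes the rounded slopes quoted in the card and divides by the adjusted slope, as the card does. The percent_change helper is a made-up name for illustration.

```python
def percent_change(crude: float, adjusted: float) -> float:
    """Percent change in slope, relative to the adjusted slope as in the card above."""
    return (crude - adjusted) / adjusted * 100.0


crude_slope = 2.67      # unadjusted slope for caloric intake
adjusted_slope = 2.00   # slope after adjusting for age and activity

change = percent_change(crude_slope, adjusted_slope)
confounded = abs(change) > 10.0   # the 10% rule of thumb
print(f"{change:.1f}% change; confounding suggested: {confounded}")
```

Because the inputs are rounded to two decimals, the percentage may differ slightly from one computed with the unrounded coefficients in the R output.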
What is a simple model in regression analysis? What are the results called?
Linear regression model with one X variable and the results are often called unadjusted or crude results.
What is a linear regression model with more than one X variable called? What are its results called?
Multivariable regression. Results sometimes referred to as adjusted results.
What is multiple linear regression used for?
To evaluate predictors for a continuously distributed outcome variable.
What three things does multiple variable regression enable you to do?
Control for confounding: each of the coefficients for the independent variables is adjusted for confounding by all other variables in the model.
Make predictions: Predicted values from the model can be interpreted either as estimated means (for subjects with a particular profile) or as predictions for individuals.
Identify the relative importance of the independent variables for the outcome.
What are the steps to interpreting a multiple variable regression analysis?
- ) Determine whether the overall p-value indicates that this particular set of predictors is significantly associated with the outcome.
- ) If so, one can evaluate the relative impact of the individual variables (which are adjusted for the other variables) based on slope, CI for the slope, and p-value for the slope.
- ) What is the multiple R-squared? How much variability does the model explain?
What is the R-squared value?
R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. 0% indicates that the model explains none of the variability of the response data around its mean; 100% indicates that it explains all of that variability.
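One common way to compute R-squared is as 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean. The Python sketch below shows the two ends of the scale under that definition. The r_squared helper and the data are made up for illustration and are not taken from the BMI example.

```python
def r_squared(observed: list[float], fitted: list[float]) -> float:
    """R^2 = 1 - SS_residual / SS_total."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1.0 - ss_residual / ss_total


# Made-up data for illustration only (not from the BMI example).
y = [20.0, 22.0, 25.0, 27.0]
perfect_fit = y[:]                            # fitted values equal the data
mean_only = [sum(y) / len(y)] * len(y)        # model that always predicts the mean

print(r_squared(y, perfect_fit))   # 1.0
print(r_squared(y, mean_only))     # 0.0
```

A model whose fitted values are no better than the overall mean scores 0, and one that reproduces every observation scores 1; real models fall in between.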