Week 6: Linear Regression Flashcards
What is the purpose of linear regression?
To describe the relationship between a continuous outcome and one or more predictors
What does the slope (β1) in a regression equation represent?
The average change in the outcome variable (y) for each one-unit increase in the predictor variable (x)
What does the intercept (β0) in a regression equation represent?
The predicted value of the outcome variable (y) when the predictor variable (x) is zero
Write the general equation for simple linear regression
y = β0 + β1x
The equation of the straight line that best describes how the outcome (y) increases/decreases with the exposure (x)
How are regression parameters (β0, β1) estimated?
The line is fitted so that the vertical distances between the observed points and the line are as small as possible; these distances are called residuals
Formally, by minimising the sum of squared residuals (the differences between observed and predicted values)
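A minimal Python sketch of least-squares fitting for one predictor. The data and function names are illustrative assumptions, not taken from the course material:

```python
# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_simple_ols(x, y):
    """Return (b0, b1), the intercept and slope minimising the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared differences between observed and predicted y."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = fit_simple_ols(x, y)
rss = sum_squared_residuals(x, y, b0, b1)
print(b0, b1, rss)
```

Any other choice of intercept or slope gives a larger sum of squared residuals for the same data.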
What are H0 and H1 when testing the slope (β1)?
H0: β1 = 0 (no linear relationship between x and y)
H1: β1 ≠ 0 (a linear relationship between x and y)
How is the t-statistic for the slope calculated?
t = β1 / SE(β1), where SE(β1) is the standard error of the slope
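A minimal Python sketch of the t-statistic for the slope, using made-up data. It assumes the usual simple-regression formula for the standard error, the square root of (residual variance / Sxx), with residual variance = RSS / (n - 2):

```python
import math

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx            # estimated slope
b0 = y_bar - b1 * x_bar   # estimated intercept

# Residual variance uses n - 2 degrees of freedom (two estimated parameters).
rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
residual_variance = rss / (n - 2)

se_b1 = math.sqrt(residual_variance / sxx)  # standard error of the slope
t_stat = b1 / se_b1                         # compared against t with n - 2 df
print(se_b1, t_stat)
```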
What does a significant p-value for β1 indicate?
Evidence against H0, suggesting an association between x and y
What is a 95% CI for β1?
Small sample sizes: β1 ± t* × SE(β1), where t* is the critical value from the t-distribution
Large sample sizes: β1 ± 1.96 × SE(β1), using the two-sided 5% point of the normal distribution in place of t*
The CI gives the range of values within which the true slope of the line plausibly lies
If a 95% CI for β1 does not include 0, what does it mean?
It indicates a statistically significant relationship between x and y
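A minimal Python sketch of the confidence interval and the check against 0. The slope and standard error values are hypothetical, and the default critical value of 1.96 assumes a large sample; for a small sample, pass in t* from t-tables with n - 2 degrees of freedom:

```python
def slope_ci(b1, se_b1, critical_value=1.96):
    """Return (lower, upper) limits of the CI for the slope."""
    half_width = critical_value * se_b1
    return b1 - half_width, b1 + half_width


def excludes_zero(ci):
    """True if the whole interval lies on one side of 0."""
    lower, upper = ci
    return lower > 0 or upper < 0


# Hypothetical estimates for illustration only.
precise = slope_ci(0.5, 0.1)    # narrow interval
imprecise = slope_ci(0.5, 0.3)  # wide interval, same slope

print(precise, excludes_zero(precise))
print(imprecise, excludes_zero(imprecise))
```

The same slope can be significant or not depending on its standard error: the first interval lies entirely above 0, the second spans 0.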
How do you predict y for a given x?
Use the regression equation ŷ = β0 + β1x
Why should predictions not extrapolate beyond the range of x?
The relationship may not be linear outside the observed data range
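A minimal Python sketch of prediction that refuses to extrapolate. The coefficients, the observed range of x, and the function name are hypothetical:

```python
def predict_within_range(x_new, b0, b1, x_min, x_max):
    """Return the predicted outcome, but only inside the observed range of x."""
    if not (x_min <= x_new <= x_max):
        raise ValueError(
            f"x = {x_new} is outside the observed range [{x_min}, {x_max}]"
        )
    return b0 + b1 * x_new


# Hypothetical fitted values: intercept 0.05, slope 1.99, x observed from 1 to 5.
y_hat = predict_within_range(3.0, b0=0.05, b1=1.99, x_min=1.0, x_max=5.0)
print(y_hat)

try:
    predict_within_range(10.0, b0=0.05, b1=1.99, x_min=1.0, x_max=5.0)
except ValueError as err:
    print("Refused:", err)
```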
Why might β0 (intercept) not always be meaningful?
If x = 0 is outside the observed data range, β0 may not provide useful information
How are binary categorical variables incorporated into regression?
By coding them as 0 and 1, representing the two categories
What does β1 represent in a regression with a binary predictor?
The difference in the mean outcome between the two categories (category coded 1 minus category coded 0)
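A minimal Python sketch showing that, with a 0/1 predictor, the fitted intercept is the mean of the group coded 0 and the slope is the difference in group means. The values are made up for illustration:

```python
from statistics import mean

# Made-up outcome values for two groups, coded 0 and 1.
group = [0, 0, 0, 0, 1, 1, 1, 1]
outcome = [1.4, 1.5, 1.6, 1.7, 1.6, 1.7, 1.8, 1.9]

# Least-squares fit of outcome = b0 + b1 * group.
g_bar = mean(group)
o_bar = mean(outcome)
sxy = sum((g - g_bar) * (o - o_bar) for g, o in zip(group, outcome))
sxx = sum((g - g_bar) ** 2 for g in group)
b1 = sxy / sxx
b0 = o_bar - b1 * g_bar

# Group means computed directly, for comparison.
mean0 = mean(o for g, o in zip(group, outcome) if g == 0)
mean1 = mean(o for g, o in zip(group, outcome) if g == 1)

print(b0, mean0)          # intercept vs mean of group 0
print(b1, mean1 - mean0)  # slope vs difference in means
```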
What is the purpose of multiple linear regression?
To examine the effect of multiple predictors on the outcome simultaneously
How do you write the equation for multiple regression with two predictors?
y = β0 + β1x1 + β2x2, where x1 and x2 are predictors
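A minimal Python sketch of fitting two predictors by least squares, solving the 2 × 2 normal equations on centred variables. The data are made up so that y = 1 + 2·x1 + 3·x2 exactly, and the function name is hypothetical:

```python
def fit_two_predictors(x1, x2, y):
    """Return (b0, b1, b2) for y = b0 + b1*x1 + b2*x2 by least squares."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n

    # Sums of squares and cross-products about the means.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))

    det = s11 * s22 - s12 ** 2  # zero if x1 and x2 are perfectly collinear
    if det == 0:
        raise ValueError("predictors are perfectly collinear")

    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Made-up data constructed as y = 1 + 2*x1 + 3*x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [9.0, 8.0, 19.0, 18.0, 29.0]

b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(b0, b1, b2)
```

Each slope is the change in y per unit of its own predictor with the other predictor held fixed.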
What is the adjusted R² in a regression?
A measure of model fit that adjusts for the number of predictors
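A minimal Python sketch of the standard adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The input values are hypothetical:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical example: the same R-squared with more predictors is penalised more.
three_predictors = adjusted_r2(0.50, n=50, p=3)
four_predictors = adjusted_r2(0.50, n=50, p=4)
print(three_predictors, four_predictors)
```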
What is centring a predictor variable?
Subtracting the mean of x from each value to create a new variable x_cent = x - x̄
The intercept would then be at the mean outcome
Why centre a predictor variable?
To make the intercept (β0) interpretable as the value of y at the mean of x
E.g., it is not meaningful to predict y based on a body weight of 0
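A minimal Python sketch of centring, using made-up data. Centring leaves the slope unchanged and moves the intercept to the predicted y at the mean of x, which for a least-squares line equals the mean of y:

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar = sum(x) / len(x)
x_cent = [xi - x_bar for xi in x]  # centred predictor: x - mean(x)

b0_raw, b1_raw = fit_simple_ols(x, y)
b0_cent, b1_cent = fit_simple_ols(x_cent, y)

print(b0_raw, b1_raw)    # intercept is the predicted y at x = 0
print(b0_cent, b1_cent)  # intercept is the predicted y at x = mean(x)
```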
Name key assumptions of linear regression
- Linearity of relationship
- Homoscedasticity (constant variance of residuals)
- Normality of residuals
- Independence of observations
On a scatter plot, which variable goes on which axis?
Outcome on y-axis, exposure on x-axis
What do the different regression lines represent?
Flat line (-) = no association
Upward-sloping line (/) = positive association
Downward-sloping line (\) = negative association
What does _cons represent in a regression output?
β0 (intercept)
How would you calculate y if the estimated intercept (β0) is 0.0857 and the estimated slope (β1) is 0.0436? Assume x = 60
What would you conclude about the association between x and y?
y = 0.0857 + 0.0436 × 60 = 2.70
Each one-unit increase in x is associated with a 0.0436 litre increase in y
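The same arithmetic as a short Python check, using the coefficients from the card (variable names are illustrative):

```python
b0 = 0.0857  # intercept from the card
b1 = 0.0436  # slope from the card
x = 60

y_hat = b0 + b1 * x
print(round(y_hat, 2))

# Moving x up by one unit changes the prediction by exactly the slope.
difference = (b0 + b1 * (x + 1)) - y_hat
print(round(difference, 4))
```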
What are β0 and β1 subject to?
Sampling variation (β0 and β1 are estimates of the population values of the intercept and slope)
How is precision measured?
By standard errors, from which confidence intervals (CIs) are calculated
Outline the logic behind fitting a linear regression model if x is a categorical variable using the example of lung function (FEV1) with sex (binary) as predictor:
- FEV1 = β0 + β1(sex)
- Assign a value to each category (females = 0, males = 1)
- Estimate mean FEV1 by sex
- If sex = 0, FEV1 = β0 + β1 × 0 = β0
- If sex = 1, FEV1 = β0 + β1 × 1 = β0 + β1
- Fit regression model
- Coefficient for _cons (β0) = mean FEV1 for girls (sex = 0) = 1.54
- For boys (sex = 1), mean FEV1 = 1.54 + 0.12 × 1 = 1.66
- Boys have a larger mean FEV1 than girls; β1 = 0.12 is the mean difference (see its associated p-value)
Note: An unpaired t-test (assuming equal variances) would give the same results
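A minimal Python sketch of why the two approaches agree: the t-statistic for the slope on a 0/1 predictor equals the pooled (equal-variance) unpaired t-statistic. The values are made up for illustration and are not the FEV data from the card:

```python
import math
from statistics import mean, variance

# Made-up outcome values for the two groups.
group0 = [1.4, 1.5, 1.6, 1.7]  # coded 0
group1 = [1.6, 1.7, 1.8, 1.9]  # coded 1
n0, n1 = len(group0), len(group1)

# Unpaired t-test with pooled variance.
pooled_var = ((n0 - 1) * variance(group0) + (n1 - 1) * variance(group1)) / (n0 + n1 - 2)
t_pooled = (mean(group1) - mean(group0)) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))

# Regression of the outcome on the 0/1 indicator.
x = [0] * n0 + [1] * n1
y = group0 + group1
n = len(y)
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
t_regression = b1 / math.sqrt(rss / (n - 2) / sxx)

print(t_pooled, t_regression)
```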