Week 6: Linear Regression Flashcards

1
Q

What is the purpose of linear regression?

A

To describe the relationship between a continuous outcome and one or more predictors

2
Q

What does the slope (β1) in a regression equation represent?

A

The average change in the outcome variable (y) for each one-unit increase in the predictor variable (x)

3
Q

What does the intercept (β0) in a regression equation represent?

A

The predicted value of the outcome variable (y) when the predictor variable (x) is zero

4
Q

Write the general equation for simple linear regression

A

y = β0 + β1x
The equation of the straight line that best describes how the outcome (y) increases/decreases with the exposure (x)

5
Q

How are regression parameters (β0, β1) estimated?

A

By least squares: the line is fitted so that the sum of squared residuals is as small as possible
A residual is the vertical distance between an observed value and the value predicted by the line
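
As a rough illustration of least-squares fitting, the sketch below uses the standard closed-form estimates for one predictor. The function names and the data are made up for the example; they are not from the course material.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1), the intercept and slope that minimise the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared vertical distances between observed and predicted y."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)
ssr_fitted = sum_squared_residuals(x, y, b0, b1)
# Any other line, e.g. one with a slightly different slope, has a larger sum
ssr_other = sum_squared_residuals(x, y, b0, b1 + 0.1)
```

Comparing `ssr_fitted` with `ssr_other` shows that moving away from the fitted line increases the sum of squared residuals.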

6
Q

What are H0 and H1 when testing the slope (β1)?

A

H0: β1 = 0 (no linear relationship between x and y)
H1: β1 ≠ 0 (a linear relationship between x and y)

7
Q

How is the t-statistic for the slope calculated?

A

t = β1 / SE(β1), where SE(β1) is the standard error of the slope
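
A minimal sketch of this calculation, assuming the usual formula for the standard error of the slope in simple linear regression (residual variance on n − 2 degrees of freedom, divided by the sum of squares of x). The data and function name are hypothetical.

```python
import math


def slope_t_statistic(x, y):
    """Return (b1, se_b1, t) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual variance uses n - 2 because two parameters were estimated
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = ssr / (n - 2)
    se_b1 = math.sqrt(residual_variance / sxx)
    return b1, se_b1, b1 / se_b1


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, se_b1, t = slope_t_statistic(x, y)
```

The resulting `t` would be compared with a t-distribution on n − 2 degrees of freedom to obtain the p-value.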

8
Q

What does a significant p-value for β1 indicate?

A

Evidence against H0, suggesting an association between x and y

9
Q

What is a 95% CI for β1?

A

Small sample sizes: β1 ± t* × SE(β1), where t* is the critical value from the t-distribution
Large sample sizes: β1 ± 1.96 × SE(β1), using the two-sided 5% point of the normal distribution (1.96) in place of t*
The CI gives the range of values within which the population slope plausibly lies
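
The large-sample version can be sketched as follows. The slope and standard error passed in are hypothetical values chosen for illustration, and the function name is an assumption.

```python
from statistics import NormalDist


def slope_ci_large_sample(b1, se_b1, level=0.95):
    """Large-sample confidence interval for a slope using the normal distribution."""
    # For level = 0.95 this is the 97.5th percentile, approximately 1.96
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return b1 - z * se_b1, b1 + z * se_b1


# Hypothetical slope and standard error
lower, upper = slope_ci_large_sample(0.0436, 0.005)

# If the interval excludes 0, the slope differs from 0 at the 5% level
excludes_zero = lower > 0 or upper < 0
```

For a small sample, `z` would be replaced by the critical value from the t-distribution.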

10
Q

If a 95% CI for β1 does not include 0, what does it mean?

A

It indicates a statistically significant relationship between x and y at the 5% level

11
Q

How do you predict y for a given x?

A

Use the regression equation ŷ = β0 + β1x

12
Q

Why should predictions not extrapolate beyond the range of x?

A

The relationship may not be linear outside the observed data range
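
One way to guard against extrapolation in practice is to check a new x against the observed range before predicting. This is only a sketch; the function name, coefficients, and data are hypothetical.

```python
def predict_within_range(b0, b1, x_new, x_observed):
    """Predict y at x_new, refusing to extrapolate beyond the observed x values."""
    lowest, highest = min(x_observed), max(x_observed)
    if not lowest <= x_new <= highest:
        raise ValueError(
            f"x = {x_new} is outside the observed range [{lowest}, {highest}]"
        )
    return b0 + b1 * x_new


# Hypothetical observed x values and fitted coefficients
x_observed = [40, 55, 60, 72, 85]
b0, b1 = 0.5, 0.04

inside = predict_within_range(b0, b1, 60, x_observed)

try:
    predict_within_range(b0, b1, 120, x_observed)
    extrapolation_blocked = False
except ValueError:
    extrapolation_blocked = True
```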

13
Q

Why might β0 (intercept) not always be meaningful?

A

If x = 0 is outside the observed data range, β0 may not provide useful information

14
Q

How are binary categorical variables incorporated into regression?

A

By coding them as 0 and 1, representing the two categories

15
Q

What does β1 represent in a regression with a binary predictor?

A

The difference in the mean outcome between the two categories (category coded 1 minus category coded 0)

16
Q

What is the purpose of multiple linear regression?

A

To examine the effect of multiple predictors on the outcome simultaneously

17
Q

How do you write the equation for multiple regression with two predictors?

A

y = β0 + β1x1 + β2x2, where x1 and x2 are predictors

18
Q

What is the adjusted R² in a regression?

A

A measure of model fit (the proportion of variation in the outcome explained by the model) that is penalised for the number of predictors
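
The adjustment can be sketched with the standard formula 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. The function name and the numbers below are hypothetical.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared: R-squared penalised for the number of predictors."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Same hypothetical R-squared and sample size, different numbers of predictors
one_predictor = adjusted_r_squared(0.50, 21, 1)
two_predictors = adjusted_r_squared(0.50, 21, 2)
```

With the unadjusted R² held fixed, adding a predictor lowers the adjusted value, so `two_predictors` is smaller than `one_predictor`.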

19
Q

What is centring a predictor variable?

A

Subtracting the mean of x from each value to create a new variable x_cent = x - x̄
The intercept then equals the predicted outcome at the mean of x, which is the mean outcome

20
Q

Why centre a predictor variable?

A

To make the intercept (β0) interpretable as the value of y at the mean of x
E.g., it is not meaningful to predict y based on a body weight of 0
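
The effect of centring can be checked numerically. In this sketch the body weights and outcomes are hypothetical, and `fit_simple_ols` is an illustrative helper, not a library function.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) from a least-squares fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical body weights (kg) and outcomes
x = [50, 60, 70, 80, 90]
y = [2.2, 2.8, 3.1, 3.6, 4.0]

# Centre the predictor: x_cent = x - mean(x)
x_bar = sum(x) / len(x)
x_cent = [xi - x_bar for xi in x]

b0_raw, b1_raw = fit_simple_ols(x, y)         # intercept refers to a weight of 0 kg
b0_cent, b1_cent = fit_simple_ols(x_cent, y)  # intercept refers to the mean weight
y_mean = sum(y) / len(y)
```

The slope is unchanged by centring, while the centred intercept `b0_cent` equals `y_mean`.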

21
Q

Name key assumptions of linear regression

A
  1. Linearity of relationship
  2. Homoscedasticity (constant variance of residuals)
  3. Normality of residuals
  4. Independence of observations
22
Q

On a scatter plot, which variable goes on which axis?

A

Outcome on y-axis, exposure on x-axis

23
Q

What do the different regression lines represent?

A
  • Horizontal line (slope = 0) = no association
  • Upward-sloping line (/) = positive association
  • Downward-sloping line (\) = negative association
24
Q

What does _cons represent in a regression output?

A

β0 (intercept)

25
Q

How would you calculate y if the intercept (β0) is 0.0857 and the slope (β1) is 0.0436? Assume x = 60
What would you conclude about the association between x and y?

A

y = 0.0857 + 0.0436 × 60 = 2.70
Each one-unit increase in x is associated with a 0.0436 litre increase in y (a positive association)
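
The arithmetic can be checked with a short sketch using the two coefficients from this card; the function name `predict` is just an illustrative choice.

```python
b0 = 0.0857  # intercept from the card
b1 = 0.0436  # slope from the card


def predict(x):
    """Predicted y from the fitted line y = b0 + b1 * x."""
    return b0 + b1 * x


y_hat_60 = predict(60)
y_hat_61 = predict(61)

print(f"{y_hat_60:.2f}")             # prints 2.70
print(f"{y_hat_61 - y_hat_60:.4f}")  # prints 0.0436, the slope
```

The second line of output illustrates the interpretation of the slope: moving x up by one unit changes the prediction by β1.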

26
Q

What are β0 and β1 subject to?

A

Sampling variation (β0 and β1 are estimates of the population values of the intercept and slope)

27
Q

How is the precision of the estimates measured?

A

By standard errors, which are used to construct confidence intervals (CIs)

28
Q

Outline the logic behind fitting a linear regression model if x is a categorical variable using the example of lung function (FEV) with sex (binary) as predictor:

A
  1. FEV1 = β0 + β1(sex)
  2. Assign a value to each category (girls = 0, boys = 1)
  3. Estimate mean FEV1 by sex
  4. If sex = 0, FEV1 = β0 + β1 × 0 = β0
  5. If sex = 1, FEV1 = β0 + β1 × 1 = β0 + β1
  6. Fit regression model
  7. Coefficient for _cons (β0) = mean FEV1 for girls = 1.54
  8. Coefficient for sex (β1) = 0.12, so for boys, mean FEV1 = 1.54 + 0.12 × 1 = 1.66
  9. Boys have a larger mean FEV1 than girls (see mean difference and associated p-value)
    Note: An unpaired t-test (assuming equal variances) would give the same results
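
The steps above can be sketched numerically. The six FEV1 values below are invented so that the group means match the card (1.54 for girls, 1.66 for boys); they are not the course data, and `fit_simple_ols` is an illustrative helper.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) from a least-squares fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data: sex coded 0 = girl, 1 = boy
sex = [0, 0, 0, 1, 1, 1]
fev1 = [1.44, 1.54, 1.64, 1.56, 1.66, 1.76]

b0, b1 = fit_simple_ols(sex, fev1)

# Group means computed directly, for comparison with the coefficients
girls = [f for s, f in zip(sex, fev1) if s == 0]
boys = [f for s, f in zip(sex, fev1) if s == 1]
mean_girls = sum(girls) / len(girls)
mean_boys = sum(boys) / len(boys)
```

Here `b0` matches `mean_girls` and `b1` matches `mean_boys - mean_girls`, which is why the regression and the comparison of two means agree.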