Week 6: Linear Regression Flashcards by Amelia Jasinski

What is the purpose of linear regression?

To describe the relationship between a continuous outcome and one or more predictors

What does the slope (β1) in a regression equation represent?

The average change in the outcome variable (y) for each one-unit increase in the predictor variable (x)

What does the intercept (β0) in a regression equation represent?

The predicted (mean) value of the outcome variable (y) when the predictor variable (x) is zero

Write the general equation for simple linear regression

y = β0 + β1x
The equation of the straight line that best describes how the outcome (y) increases/decreases with the exposure (x)

How are regression parameters (β0, β1) estimated?

By least squares: the fitted line is the one that minimises the sum of squared residuals
A residual is the vertical distance between an observed value of y and the value predicted by the line
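
As a rough illustration of least squares, the sketch below computes the intercept and slope from the standard closed-form formulas and checks that nudging the slope makes the sum of squared residuals larger. The function names and the data are hypothetical and are not part of the flashcards.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) minimising the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: cross-products of x and y about their means, over the
    # sum of squares of x about its mean.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared differences between observed and predicted y."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)
best = sum_squared_residuals(x, y, b0, b1)
# Any other line, e.g. one with a slightly different slope, fits worse.
worse = sum_squared_residuals(x, y, b0, b1 + 0.1)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSR = {best:.4f}, perturbed SSR = {worse:.4f}")
```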

What are H0 and H1 in testing the slope (β1)?

H0: β1 = 0 (no linear relationship between x and y)
H1: β1 ≠ 0 (a linear relationship between x and y)

How is the t-statistic for the slope calculated?

t = β1 / SE(β1), where SE(β1) is the standard error of the estimated slope
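
A minimal sketch of this calculation for simple linear regression follows. The card does not give the formula for the standard error; the one used here is the usual textbook expression (residual variance divided by the sum of squares of x about its mean), and the function name and data are hypothetical.

```python
import math


def slope_t_statistic(x, y):
    """Return (b1, se_b1, t) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual variance divides by n - 2 because two parameters
    # (intercept and slope) were estimated from the data.
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = ssr / (n - 2)
    se_b1 = math.sqrt(residual_variance / sxx)
    return b1, se_b1, b1 / se_b1


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b1, se_b1, t = slope_t_statistic(x, y)
print(f"slope = {b1:.3f}, SE = {se_b1:.4f}, t = {t:.2f}")
```

The t-statistic is then compared with a t-distribution on n - 2 degrees of freedom to obtain the p-value.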

What does a significant p-value for β1 indicate?

Evidence against H0, suggesting an association between x and y

What is a 95% CI for β1?

Small sample sizes: β1 ± t* × SE(β1), where t* is the critical value from the t-distribution (n - 2 degrees of freedom in simple linear regression)
Large sample sizes: typically use the two-sided 5% point of the normal distribution (1.96) in place of t*
The CI gives the range of values within which the true (population) slope plausibly lies
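
The large-sample version can be sketched with the Python standard library alone. The function name, estimate and standard error below are hypothetical. For small samples the multiplier would come from the t-distribution instead, which the standard library does not provide.

```python
from statistics import NormalDist


def slope_confidence_interval(b1, se_b1, level=0.95):
    """Large-sample CI for the slope using the normal distribution."""
    # For level = 0.95 this multiplier is the familiar 1.96.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return b1 - z * se_b1, b1 + z * se_b1


# Hypothetical estimate and standard error, for illustration only.
lower, upper = slope_confidence_interval(b1=0.5, se_b1=0.1)
# If the interval excludes 0, the slope differs significantly from 0
# at the 5% level.
excludes_zero = lower > 0 or upper < 0
print(f"95% CI: ({lower:.3f}, {upper:.3f}); excludes zero: {excludes_zero}")
```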

If a 95% CI for β1 does not include 0, what does it mean?

It indicates a statistically significant relationship between x and y at the 5% level (p < 0.05)

How do you predict y for a given x?

Substitute the value of x into the fitted regression equation ŷ = β0 + β1x

Why should predictions not extrapolate beyond the range of x?

The relationship may not be linear outside the observed data range
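
One way to make this caution concrete is a prediction helper that refuses to extrapolate. This is only a sketch: the function, the coefficients and the observed range are all hypothetical.

```python
def predict(b0, b1, x_new, x_min, x_max):
    """Predict y at x_new, but only inside the observed range of x."""
    if not (x_min <= x_new <= x_max):
        # Outside the data we have no evidence the line still holds.
        raise ValueError(
            f"x = {x_new} is outside the observed range [{x_min}, {x_max}]"
        )
    return b0 + b1 * x_new


# Hypothetical coefficients and observed range, for illustration only.
y_hat = predict(b0=1.0, b1=2.0, x_new=3.0, x_min=0.0, x_max=10.0)
print(y_hat)
```

Calling the same function with x_new = 11.0 would raise a ValueError rather than return a prediction.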

Why might β0 (intercept) not always be meaningful?

If x = 0 is outside the observed data range, β0 may not provide useful information

How are binary categorical variables incorporated into regression?

By coding them as 0 and 1, representing the two categories

What does β1 represent in a regression with a binary predictor?

The difference in mean outcome between the category coded 1 and the category coded 0 (the reference category)
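
A short sketch with made-up data shows why: when x takes only the values 0 and 1, the least-squares intercept equals the mean of the group coded 0 and the slope equals the difference between the two group means. The variable names and values are hypothetical.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) from the closed-form least-squares formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data: group is coded 0 or 1.
group = [0, 0, 0, 1, 1, 1]
outcome = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]

b0, b1 = fit_simple_ols(group, outcome)

# Group means computed directly, for comparison.
mean_0 = sum(y for g, y in zip(group, outcome) if g == 0) / group.count(0)
mean_1 = sum(y for g, y in zip(group, outcome) if g == 1) / group.count(1)
print(f"b0 = {b0:.2f} (mean of group 0 = {mean_0:.2f})")
print(f"b1 = {b1:.2f} (difference in means = {mean_1 - mean_0:.2f})")
```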

What is the purpose of multiple linear regression?

To examine the effect of multiple predictors on the outcome simultaneously

How do you write the equation for multiple regression with two predictors?

y = β0 + β1x1 + β2x2, where x1 and x2 are predictors
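
To show how the three coefficients could be estimated, the sketch below builds the least-squares normal equations and solves them by Gauss-Jordan elimination in plain Python. In practice a statistics package would do this; the function name and data are hypothetical, and the code assumes the two predictors are not perfectly collinear.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2."""
    # Design matrix rows: a constant 1 for the intercept, then x1 and x2.
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    # Normal equations (X'X) beta = X'y as a 3 x 4 augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return tuple(aug[i][3] / aug[i][i] for i in range(3))


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [4.0, 3.0, 11.0, 10.0, 18.0]

b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Because the data lie exactly on the plane, the estimates recover the coefficients used to generate them; with real data there would be residual scatter.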

What is the adjusted R² in a regression?

A measure of model fit that adjusts for the number of predictors
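
The card does not state the formula; the usual definition is adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. A minimal sketch with hypothetical inputs:

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Hypothetical values: the same R-squared is penalised more heavily
# when it was achieved with more predictors.
few = adjusted_r_squared(r_squared=0.80, n=50, p=3)
many = adjusted_r_squared(r_squared=0.80, n=50, p=10)
print(f"3 predictors: {few:.4f}, 10 predictors: {many:.4f}")
```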

What is centring a predictor variable?

Subtracting the mean of x from each value to create a new variable: x_cent = x - x̄
The intercept then becomes the predicted outcome at the mean of x (in simple linear regression, the mean of y); the slope is unchanged
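
The effect of centring can be checked numerically. In this sketch (hypothetical names and data) the same outcome is regressed on the raw and on the centred predictor: the slope is identical, and the centred intercept equals the mean of y.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) from the closed-form least-squares formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data, for illustration only.
x = [50.0, 60.0, 70.0, 80.0]
y = [2.0, 2.5, 2.9, 3.4]

x_mean = sum(x) / len(x)
x_cent = [xi - x_mean for xi in x]  # x_cent = x - x_bar

b0_raw, b1_raw = fit_simple_ols(x, y)
b0_cent, b1_cent = fit_simple_ols(x_cent, y)
y_mean = sum(y) / len(y)

print(f"raw:     b0 = {b0_raw:.3f}, b1 = {b1_raw:.4f}")
print(f"centred: b0 = {b0_cent:.3f}, b1 = {b1_cent:.4f}, mean of y = {y_mean:.3f}")
```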

Why centre a predictor variable?

To make the intercept (β0) interpretable as the value of y at the mean of x
E.g., it is not meaningful to predict y based on a body weight of 0

Name key assumptions of linear regression

Linearity of relationship
Homoscedasticity (constant variance of residuals)
Normality of residuals
Independence of observations

On a scatter plot, which variable goes on which axis?

Outcome on y-axis, exposure on x-axis

What do the different regression lines represent?

Horizontal (flat) line = no association
Upward-sloping line (/) = positive association
Downward-sloping line (\) = negative association

What does _cons represent in a regression output?

β0 (intercept)

How would you calculate y if the estimate of β0 (intercept) is 0.0857 and the estimate of β1 (slope) is 0.0436? Assume x = 60. What would you conclude about the association between x and y?

y = 0.0857 + 0.0436 × 60 = 2.70
Each one-unit increase in x is associated with a 0.0436 litre increase in y, so the association is positive

What are β0 and β1 subject to?

Sampling variation (the fitted β0 and β1 are estimates of the population intercept and slope)

How is the precision of the estimates measured?

By their standard errors, from which confidence intervals are calculated

Outline the logic behind fitting a linear regression model if x is a categorical variable using the example of lung function (FEV) with sex (binary) as predictor:

1. Write the model: FEV1 = β0 + β1(sex)
2. Assign a value to each category (females = 0, males = 1)
3. Estimate mean FEV1 by sex
4. If sex = 0, FEV1 = β0 + β1 × 0 = β0
5. If sex = 1, FEV1 = β0 + β1 × 1 = β0 + β1
6. Fit the regression model
7. Coefficient for _cons (β0) = mean FEV1 for girls = 1.54
8. For boys, mean FEV1 = 1.54 + 0.12 × 1 = 1.66
9. Boys have a larger mean FEV1 than girls (see the mean difference and its associated p-value)
Note: an unpaired t-test (assuming equal variances) would give the same result
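
The equivalence with the unpaired t-test can be checked numerically. The sketch below uses made-up FEV1 values (not the data behind the 1.54 and 0.12 on the card) and hypothetical function names; it compares the t-statistic for the regression slope with the equal-variance two-sample t-statistic.

```python
import math


def slope_t(x, y):
    """t-statistic for the slope in a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b1 / math.sqrt(ssr / (n - 2) / sxx)


def pooled_t(group0, group1):
    """Unpaired two-sample t-statistic assuming equal variances."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = sum(group0) / n0, sum(group1) / n1
    ss0 = sum((v - m0) ** 2 for v in group0)
    ss1 = sum((v - m1) ** 2 for v in group1)
    pooled_var = (ss0 + ss1) / (n0 + n1 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))


# Hypothetical FEV1 values, for illustration only.
girls = [1.4, 1.5, 1.6, 1.7]  # coded sex = 0
boys = [1.5, 1.7, 1.8, 1.6]   # coded sex = 1

sex = [0] * len(girls) + [1] * len(boys)
fev1 = girls + boys

t_regression = slope_t(sex, fev1)
t_ttest = pooled_t(girls, boys)
print(f"regression t = {t_regression:.4f}, t-test t = {t_ttest:.4f}")
```

Both statistics are referred to a t-distribution on n - 2 degrees of freedom, so the p-values also agree.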

(28 cards)