Week 6: Linear Regression Flashcards
What is the purpose of linear regression?
To describe the relationship between a continuous outcome and one or more predictors
What does the slope (β1) in a regression equation represent?
The change in the outcome variable (y) for each unit increase in the predictor variable (x)
What does the intercept (β0) in a regression equation represent?
The value of the outcome variable (y) when the predictor variable (x) is zero
Write the general equation for simple linear regression
y = β0 + β1x
The equation of the straight line that best describes how the outcome (y) increases/decreases with the exposure (x)
How are regression parameters (β0, β1) estimated?
The line is fitted with the shortest distance between points and line. Distances are called residuals
By minimising the sum of squared residuals, the differences between observed and predicted values
What is H0 and H1 in testing the slope (β1)?
β1 = 0 (no relationship between x and y)
How is the t-statistic for the slope calculated?
t = β1 / SE of β1 - where SE is the standard error of the slope
What does a significant p-value for β1 indicate?
Evidence against H0, suggesting an association between x and y
What is a 95% CI for β1?
Small sample sizes: β1 +- t* x SE, where t* is the critical value from the t-distribution
Large sample sizes: typically use 5% point of normal distribution (1.96) instead of t-distribution
The CI details the values between which the slope of the line could lie
If a 95% CI for β1 does not include 0, what does it mean?
It indicates a statistically significant relationship between x and y
How do you predict y for a given x?
Use the regression equation 𝑦^ = β0 + β1x
Why should predictions not extrapolate beyond the range of x?
The relationship may not be linear outside the observed data range
Why might β0 (intercept) not always be meaningful?
If x = 0 is outside the observed data range, β0 may not provide useful information
How are binary categorical variables incorporated into regression?
By coding them as 0 and 1, representing the two categories
What does β1 represent in a regression with a binary predictor?
The difference in the outcome between the two categories