Week 9 - Regression models Flashcards
Regression
Analysis of the relationship between two variables, typically visualised on a scatterplot
The regression of Y on x is the conditional mean:
E(Y | x) = µ(x)
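A minimal sketch of this idea, assuming numpy and simulated data (the intercept 2 and slope 3 are illustrative, not from the notes): at each x, µ(x) can be estimated by averaging the Y values observed there.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate Y = 2 + 3x + noise at a few repeated x values
x = np.repeat([1.0, 2.0, 3.0], 200)
y = 2 + 3 * x + rng.normal(0, 1, size=x.size)

# The regression of Y on x is the conditional mean E(Y | x):
# estimate it at each x by averaging the Y values observed there
for xv in np.unique(x):
    print(f"x = {xv}: sample mean of Y = {y[x == xv].mean():.2f}, "
          f"true mu(x) = {2 + 3 * xv:.2f}")
```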
Simple linear regression
Form of a straight line:
E(Y | x) = β0 + β1x
We also assume constant variance:
var(Y | x) = σ^2
There are 3 parameters:
- β0
- β1
- σ^2
Describes how the response variable, y, changes linearly, on average, with an explanatory variable, x (a fitting sketch follows below)
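A minimal fitting sketch, assuming numpy and simulated data (the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data from E(Y | x) = beta0 + beta1 * x with constant variance sigma^2
beta0, beta1, sigma = 1.0, 0.5, 2.0
x = rng.uniform(0, 10, size=100)
y = beta0 + beta1 * x + rng.normal(0, sigma, size=x.size)

# Fit the straight-line model; polyfit returns [slope, intercept]
b1_hat, b0_hat = np.polyfit(x, y, deg=1)
print(f"intercept estimate: {b0_hat:.2f}, slope estimate: {b1_hat:.2f}")
```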
Estimation goals for regression
We wish to estimate the slope (β1), the intercept (β0), and the error variance (σ^2), obtain their standard errors, and construct confidence intervals for these quantities
Least squares estimation
We find the β0 and β1 that minimise the sum of squared deviations, H(β0, β1) = Σ(yi − β0 − β1xi)^2
This gives the least squares estimators
Method is called ordinary least squares (OLS)
The difference between an actual (observed) value and its predicted (fitted) value is called a residual
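A sketch of OLS via the closed-form estimators, assuming numpy and simulated data; the n − 2 divisor in the σ^2 estimate reflects the two parameters estimated from the data.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=50)
y = 3.0 + 1.5 * x + rng.normal(0, 1.0, size=x.size)

xbar, ybar = x.mean(), y.mean()

# Closed-form OLS estimators: minimise H(b0, b1) = sum (yi - b0 - b1*xi)^2
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar

# Residuals: actual minus predicted (fitted) values
fitted = b0 + b1 * x
resid = y - fitted

# Unbiased estimate of sigma^2 uses n - 2 degrees of freedom,
# since two parameters (b0 and b1) were estimated from the data
sigma2_hat = np.sum(resid ** 2) / (x.size - 2)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, sigma^2 estimate = {sigma2_hat:.2f}")
```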
Properties of the estimators (β0, β1, σ^2)
- All of the estimators are unbiased
Coefficient of determination (R^2)
It quantifies the proportion of variation in the response variable (the Yi's) that is explained by the regression model
“This model explains about <50%> of the variation in the data”
R^2 ranges from 0 to 1
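A minimal R^2 computation, assuming numpy and simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 100)
y = 1 + 0.8 * x + rng.normal(0, 1, 100)
b1, b0 = np.polyfit(x, y, deg=1)
fitted = b0 + b1 * x

sse = np.sum((y - fitted) ** 2)       # unexplained (residual) variation
sst = np.sum((y - y.mean()) ** 2)     # total variation about the mean
r2 = 1 - sse / sst                    # proportion of variation explained
print(f"R^2 = {r2:.3f}")              # between 0 and 1
```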
Maximum likelihood estimation on regression
Assuming a normal distribution
The β0 and β1 that maximise the likelihood (equivalently, minimise the negative log-likelihood) are the same as those that minimise the sum of squared deviations, H(β0, β1)
The OLS estimates are the same as the MLEs
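A numerical check of this equivalence on simulated data, assuming scipy is available: the MLE found by minimising the negative log-likelihood under the normal model should match the OLS fit from polyfit.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 80)
y = 2.0 + 1.2 * x + rng.normal(0, 1.5, 80)

def neg_log_lik(params):
    b0, b1, log_sigma = params
    sigma = np.exp(log_sigma)  # parameterise on the log scale to keep sigma > 0
    return -np.sum(norm.logpdf(y, loc=b0 + b1 * x, scale=sigma))

mle = minimize(neg_log_lik, x0=[0.0, 0.0, 0.0]).x
ols = np.polyfit(x, y, deg=1)  # [slope, intercept]

print("MLE (b0, b1):", mle[0], mle[1])
print("OLS (b0, b1):", ols[1], ols[0])  # should agree with the MLE
```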
Predicting a future value
A point prediction is given directly by the fitted regression line, evaluated at the new x value
Prediction interval
Estimate where future observations are likely to fall, given a certain level of confidence
Similar to a CI, but it estimates a random quantity, Y, rather than a fixed quantity, µ(x)
Will be wider than the corresponding confidence interval, since it must also account for the variability of the new observation itself (see the interval sketch below)
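A sketch computing both intervals at a new x, assuming numpy and scipy and simulated data; note the extra "1 +" term that makes the prediction interval wider than the CI.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 40)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, 40)

n = x.size
xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * xbar
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))  # residual std. error

x_new = 5.0
y_hat = b0 + b1 * x_new                # point prediction from the fitted line
tcrit = t.ppf(0.975, df=n - 2)         # 95% level, n - 2 degrees of freedom

# CI for the mean mu(x_new) vs PI for a new observation Y at x_new:
# the PI has an extra "1 +" for the variability of the new Y itself
se_mean = s * np.sqrt(1 / n + (x_new - xbar) ** 2 / Sxx)
se_pred = s * np.sqrt(1 + 1 / n + (x_new - xbar) ** 2 / Sxx)

print("95% CI for mu(x):", y_hat - tcrit * se_mean, y_hat + tcrit * se_mean)
print("95% PI for new Y:", y_hat - tcrit * se_pred, y_hat + tcrit * se_pred)
```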
Assumptions of linear regression
Linear model for the mean
Equal variance for all observations (homoscedasticity)
Normally distributed residuals
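A minimal diagnostic sketch for these assumptions, assuming matplotlib and simulated data: residuals vs fitted values checks linearity and equal variance, and a histogram of the residuals checks normality.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 100)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, 100)
b1, b0 = np.polyfit(x, y, deg=1)
resid = y - (b0 + b1 * x)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residuals vs fitted: look for no pattern (linearity) and constant spread
ax1.scatter(b0 + b1 * x, resid)
ax1.axhline(0, color="grey")
ax1.set_xlabel("fitted values")
ax1.set_ylabel("residuals")

# Histogram of residuals: look for a roughly normal shape
ax2.hist(resid, bins=15)
ax2.set_xlabel("residuals")

plt.show()
```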
Descriptive statistics (4 moments)
- Centre
- Spread
- Skew
- Kurtosis/outliers
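A sketch computing these four summaries, assuming scipy and simulated data:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(7)
data = rng.normal(size=500)

print("centre (mean):     ", np.mean(data))
print("spread (std. dev.):", np.std(data, ddof=1))
print("skew:              ", skew(data))
print("excess kurtosis:   ", kurtosis(data))  # 0 for a normal distribution
```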
Characteristics of a regression line
- Direction: Positive or Negative
- Strength: Strong or Weak (R^2 value)
- Form: Linear or Non-linear