Linear regression Flashcards
Linear regression
predicts a response variable from values of an explanatory variable by fitting a straight-line model to the data
(different from correlation: correlation treats both variables equally and measures the strength of association, whereas regression measures how steeply the response variable changes on average with changes in the explanatory variable)
Overall:
- Estimate the slope and intercept using equations based on least squares
- Calculate the mean square error (S^2)
- Calculate SEs and CIs using S^2 and t values
- Use a t test to see if the slope or intercept is significantly different from 0
- Use an ANOVA table to see if a significant amount of variation is explained by the model.
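The steps above can be sketched end-to-end on made-up data (the x and y values below are assumptions for illustration, not from the cards); a minimal sketch using numpy:

```python
import numpy as np

# Made-up illustrative data (not from the flashcards)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.1])
n = len(x)

# 1. Least-squares estimates: slope b and intercept a
sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / sxx
a = y.mean() - b * x.mean()

# 2. Mean square error S^2 from the residuals
y_hat = a + b * x
sse = np.sum((y - y_hat) ** 2)
s2 = sse / (n - 2)

# 3. Standard error of the slope
se_b = np.sqrt(s2 / sxx)

# 4. t statistic for H0: slope = 0 (compare to t with n-2 df)
t_b = b / se_b

# 5. ANOVA: F = MSR / S^2, with 1 and n-2 df
ssr = np.sum((y_hat - y.mean()) ** 2)
f_stat = (ssr / 1) / s2

print(b, a, t_b, f_stat)
```

For simple linear regression the two tests agree: the F statistic equals the square of the slope's t statistic.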
Fitting the line
y = a + bx OR y = mx + c (+ error term)
Use least-squares equations to estimate the slope and intercept.
The fitted equation can be used to predict Y hat values from X values (the aim is to minimise the error in these predictions)
Can only make reliable predictions w/in the range of X used to fit the line – cannot extrapolate (the relationship may not be linear outside that range; we don’t know)
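The no-extrapolation rule can be built into a small prediction helper (the slope, intercept, and fitted X range below are hypothetical values, not from the cards):

```python
# Hypothetical fitted line: y_hat = a + b*x (values are assumptions)
a, b = 0.19, 1.97
x_min, x_max = 1.0, 6.0  # range of X the line was fitted on

def predict(x):
    """Return y-hat for x, refusing to extrapolate beyond the fitted range."""
    if not (x_min <= x <= x_max):
        raise ValueError("x is outside the fitted range; prediction unreliable")
    return a + b * x

print(predict(3.0))  # inside the fitted range, so allowed
```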
Assumptions
- At each X value there is a pop’n of possible Y values whose mean lies on the true regression line – this is the assumption that the relationship is linear
- At each value of X, the distribution of possible Y values is normal
- The variance of the Y values is the same at all values of X
- At each value of X, the Y measurements represent a random sample from the pop’n of possible Y values
Residuals
measure vertical deviation of Y from the least squares regression line
Residual = observed value – predicted value = Y − Ŷ
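A quick numerical illustration (the data below are made up so the least-squares line comes out to ŷ = 2x):

```python
import numpy as np

# Made-up data chosen so the least-squares line is y_hat = 0 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.9, 6.2, 7.9])

# Least-squares slope and intercept
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

# Residual = observed - predicted
y_hat = a + b * x
residuals = y - y_hat

print(residuals)        # approximately [0, -0.1, 0.2, -0.1]
print(residuals.sum())  # least-squares residuals sum to ~0
```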
Testing model significance
Error sum of squares (SSE/ESS): sum of the squared differences between observed and predicted values, or calculated as SST − SSR
Error mean square (MSE/S^2): SSE / (n − 2)
T test approach:
Estimating the SEs of the slope and intercept requires S^2
Can calculate CIs for the intercept and slope using the SE and a t value from the t distribution (with n − 2 df)
Do the CIs overlap 0?
Use a t test to see if the slope and intercept are significantly different from 0.
-> Tend to be significant if the estimate is at least about twice its SE (i.e. |t| ≈ 2 or more)
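A sketch of the t-test step on toy data (the data are assumptions; 4.303 is the standard-table t critical value for 95% confidence with n − 2 = 2 df):

```python
import numpy as np

# Made-up data (assumption for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.9, 6.2, 7.9])
n = len(x)

# Least-squares fit
sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / sxx
a = y.mean() - b * x.mean()

# S^2 and standard errors of slope and intercept
s2 = np.sum((y - (a + b * x)) ** 2) / (n - 2)
se_b = np.sqrt(s2 / sxx)
se_a = np.sqrt(s2 * (1 / n + x.mean() ** 2 / sxx))

# t statistics: estimate / SE, compared to the t distribution with n-2 df
t_b = b / se_b

# 95% CI for the slope; 4.303 is the t critical value for 2 df (from tables)
t_crit = 4.303
ci_b = (b - t_crit * se_b, b + t_crit * se_b)
print(t_b, ci_b)
```

Here the CI for the slope does not overlap 0, matching the "estimate more than twice its SE" rule of thumb.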
ANOVA approach
- Split the variation into variation explained by the regression line and error variation.
- Create an ANOVA table (source: SS: df: MS):
Regression variation: SSR: 1: MSR
Error variation: SSE (= SST − SSR): n − 2: S^2
Total variation: SST: n − 1:
- Test with F = MSR / S^2 against the F distribution on 1 and n − 2 df.
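The ANOVA table can be sketched numerically (toy data assumed for illustration):

```python
import numpy as np

# Made-up data (assumption for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.9, 6.2, 7.9])
n = len(x)

# Least-squares fit
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = a + b * x

# Sums of squares: SST = SSR + SSE
sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)
sse = sst - ssr

# Mean squares and the F ratio (compare to F with 1 and n-2 df)
msr = ssr / 1
s2 = sse / (n - 2)
f_ratio = msr / s2

print(f"Regression  SS={ssr:.2f}  df=1  MS={msr:.2f}")
print(f"Error       SS={sse:.2f}  df={n - 2}  MS={s2:.2f}")
print(f"Total       SS={sst:.2f}  df={n - 1}")
print(f"F = {f_ratio:.1f}")
```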