SRM Chapter 2 Flashcards
SLR
- Simple Linear Regression
- Relationship between two numeric variables
- Parametric
MLR
- Multiple Linear Regression
- Multiple predictors (x’s) used to predict the dependent variable (y).
- Parametric
Residuals
- ei = yi - yi-hat
- One residual for each observation i
- Want the residuals to be small overall
- Ordinary least squares does this by minimizing the sum of the squared residuals (see the sketch below)
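A minimal numeric sketch (made-up data, not from the source) of the closed-form OLS estimates that minimize the sum of squared residuals:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical predictor values
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical responses

# Closed-form least-squares estimates for SLR
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x          # fitted values
e = y - y_hat                # residuals
sse = np.sum(e ** 2)         # the quantity OLS minimizes
print(b0, b1, sse)
```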
Partitioning of Variability
- SST = SSR + SSE (total variability = explained + unexplained)
Parameter Estimates
- b0 and b1: estimates of B0 and B1, obtained by ordinary least squares
R-squared
- Coefficient of determination
- Proportion of variability in the response explained by the predictors
- R-squared = SSR/SST = 1 - SSE/SST
- Between 0 and 1 (can be read as a percentage)
- Want this to be high
Adjusted R-Squared
- Adjustment to R-squared for MLR that accounts for the number of predictors (see the sketch below)
- Does not have to range from 0 to 1
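A quick sketch (hypothetical sums of squares, sample size, and predictor count, not from the source) showing how R-squared and adjusted R-squared are computed; adjusted R-squared applies the standard penalty for the number of predictors p:

```python
n, p = 30, 3            # hypothetical sample size and number of predictors
sst, sse = 100.0, 40.0  # hypothetical total and error sums of squares
ssr = sst - sse

r2 = ssr / sst                                    # = 1 - SSE/SST
adj_r2 = 1 - (sse / (n - p - 1)) / (sst / (n - 1))
print(r2, adj_r2)       # adjusted R-squared penalizes extra predictors
```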
B0
- Intercept parameter
- Free parameter
- Expected value of y when x = 0
B1
- Slope parameter
- Free parameter
- For every unit increase in x, the expected value of y increases by B1
SLR Model Assumptions (6)
- Yi = B0 + B1Xi + ei (linear function plus error)
- The xi's are non-random
- Expected value of ei is 0
  -> so the expected value of Yi is B0 + B1Xi
- Variance of ei is sigma-squared, constant across all observations (homoscedasticity)
  -> because E[ei] = 0, the variance of Yi is also sigma-squared
- The ei's are independent across observations
- The ei's are normally distributed
Homoscedasticity
- Variance (sigma-squared) is constant across all observations
b0
- Estimate of B0 to get y-hat
b1
- Estimate of B1 to get y-hat
Method to estimate b0 and b1
- Ordinary least squares/method of least squares
Ordinary Least Squares
- Determines the estimates b0 and b1
- Chooses them to minimize the sum of squared residuals
- The resulting estimators are unbiased (bias = 0)
MSE
- Mean squared error
- Estimate of sigma-squared
- MSE = SSE / (n - 2) for SLR
- Unbiased, so bias is 0
- A smaller MSE indicates a better fit
RSE
- Residual standard error
- Aka residual standard deviation
- RSE = sqrt(MSE) (see the sketch below)
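A small sketch (made-up residuals, not from the source) of MSE = SSE/(n-2) for SLR and RSE = sqrt(MSE):

```python
import numpy as np

e = np.array([0.3, -0.5, 0.4, 0.1, -0.3])  # hypothetical residuals y - y_hat
n = len(e)

mse = np.sum(e ** 2) / (n - 2)   # unbiased estimate of sigma-squared
rse = np.sqrt(mse)               # residual standard error
print(mse, rse)
```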
Design Matrix
- X
- First column is all 1's (for the intercept); the remaining columns hold the predictor values
- Dimensions: n rows by (p + 1) columns
Hat Matrix
- H
- Aka projection matrix
- H = X(X'X)^(-1)X'
- H times the vector of actual responses gives the fitted values: y-hat = H*y (see the sketch below)
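A sketch with hypothetical data showing the hat matrix H = X(X'X)^(-1)X' built from the design matrix, and that H*y reproduces the fitted values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

X = np.column_stack([np.ones_like(x), x])   # design matrix: column of 1's, then x
H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat / projection matrix
y_hat = H @ y                               # fitted values via the hat matrix

b = np.linalg.inv(X.T @ X) @ X.T @ y        # b = (X'X)^(-1) X'y
print(np.allclose(y_hat, X @ b))            # True: same fitted values either way
```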
b Matrix
- Column vector (2x1 for SLR) of the estimates b0 and b1
- b = (X'X)^(-1)X'y
y Matrix
- Column vector (n x 1) of the actual observed values of y
SSR
- Regression sum of squares
- Amount of variability in y explained by the predictors
- Sum of squared differences between the fitted values and the mean of y
SSE
- Error sum of squares
- Aka sum of squared residuals
- Amount of variability in y that cannot be explained by the predictors
SST
- Total sum of squares
- Total variability (both explained and unexplained)
- SST = SSR + SSE (verified numerically in the sketch below)
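A sketch (hypothetical data, OLS fit via numpy) verifying the partition SST = SSR + SSE numerically:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1, b0 = np.polyfit(x, y, 1)           # OLS slope and intercept
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variability
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained by the regression
sse = np.sum((y - y_hat) ** 2)         # unexplained (squared residuals)
print(np.isclose(sst, ssr + sse))      # True for an OLS fit with an intercept
```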
Positive Residual
- Actual observation > (larger than) predicted observation
Negative Residual
- Actual observation < (smaller than) predicted observation
Null model
- Y = B0 + e
- No predictors (x’s)
- No relationship between y and x’s
Do you want R-squared and Adjusted R-squared to be high or low?
- High
- Means more of the variance in y can be explained by the predictor(s).
- Want this to be as high as possible so that the unexplained variance is minimized.
Is R-squared or Adjusted R-squared better for comparing MLR models? Why?
- Adjusted R-squared
- Because R-squared never decreases as predictors are added, a larger R-squared doesn't necessarily mean a better model
- But Adjusted R-squared accounts for the number of predictors so it is a better method of comparison between models.
Two-tailed t Test (Hypothesis Test): What are we testing, and why?
- Test to see whether the slope parameter is 0 (B1 = 0).
- H0: B1 = 0
- H1: B1 ≠ 0
- If H0 is true, there is no linear relationship between x and y
- So we want to reject H0, which makes it plausible that there is a linear relationship between x and y
Test Decision (Two-Tailed t Test)
- For significance level α, reject H0 if:
- |t-statistic| ≥ t_(α/2, n-2), or
- p-value ≤ α (see the sketch below for computing the t-statistic and p-value)
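A sketch (hypothetical data, assuming scipy is available) of the two-tailed test of H0: B1 = 0, computing the t-statistic b1/SE(b1) and its p-value with n - 2 degrees of freedom:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)                          # OLS fit
y_hat = b0 + b1 * x
mse = np.sum((y - y_hat) ** 2) / (n - 2)              # estimate of sigma-squared
se_b1 = np.sqrt(mse / np.sum((x - x.mean()) ** 2))    # standard error of b1

t_stat = b1 / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)       # two-tailed p-value
print(t_stat, p_value)   # reject H0 at level α if p_value <= α
```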
One-Tailed t Test (Hypothesis Test): What are we testing and why?
- Same setup as the two-tailed test, but the rejection region is in only one tail
- Used when we want to show the slope is specifically positive (right tail) or specifically negative (left tail)
When do we use a right-tailed t test?
- When the alternative hypothesis is B1 > 0 (testing for a positive slope)
When do we use a left-tailed t test?
- When the alternative hypothesis is B1 < 0 (testing for a negative slope)
Confidence vs Prediction Interval
- Confidence: range for the mean response at a given predictor value
- Prediction: range for the response of a new individual observation
- Prediction ≥ Confidence (the prediction interval is always at least as wide as the confidence interval)
Confidence Interval
- Range that estimates the MEAN response
- Narrowest when the chosen predictor value equals the sample mean of the predictor
Prediction Interval
- Range that estimates a NEW observation’s response
- Narrowest when the chosen predictor value equals the sample mean of the predictor
Why is the prediction interval at least as wide as the confidence interval?
- The prediction interval accounts for the variance of the error term in addition to the variance of the estimated mean response
- Have to cast a wider net to predict a single new response than to estimate the mean response (see the sketch below)
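A sketch (hypothetical data and a hypothetical new predictor value x0) comparing the standard errors behind the confidence and prediction intervals; the extra "1" inside the prediction-interval formula is the variance of the new observation's error term:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)
sxx = np.sum((x - x.mean()) ** 2)

x0 = 3.5                                                         # hypothetical new x value
se_mean = np.sqrt(mse * (1 / n + (x0 - x.mean()) ** 2 / sxx))    # confidence interval SE
se_pred = np.sqrt(mse * (1 + 1 / n + (x0 - x.mean()) ** 2 / sxx))  # prediction interval SE
print(se_mean < se_pred)   # True: the prediction interval is wider
```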
Regression Coefficients
- B0, B1,…,Bp (Bj’s).
- B0 is still the intercept
- B1,…,Bp are called regression coefficients rather than slopes; each Bj gives the change in the expected response per unit increase in xj, holding the other predictors fixed
Added assumption for MLR
- No predictor xj may be a linear combination of the other predictors
- If xj were such a linear combination, it would add no additional information about the relationship between the x's and y, and the OLS estimates would not be uniquely determined (see the sketch below)
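A sketch (hypothetical predictors) of why an exact linear combination is ruled out: the matrix X'X becomes singular, so (X'X)^(-1) does not exist and the OLS estimates are not unique:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0])
x3 = 2 * x1 + x2                     # exact linear combination of x1 and x2

X = np.column_stack([np.ones(4), x1, x2, x3])   # design matrix with 4 columns
print(np.linalg.matrix_rank(X.T @ X))           # 3 < 4 columns, so X'X is not invertible
```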
Nested Models
- Models where the smaller model's predictors are a subset of the larger model's predictors
- Each model in the sequence contains all the predictors of the previous model, plus more
Nested MLRs: p
- p is the number of predictors; a larger p means a more flexible model