For the Midterm Flashcards
What is the variable Y called?
The response variable
What are the X variables called?
The predictor, or explanatory, variables.
What is multiple linear regression?
It relates one numerical characteristic, the response variable, to one or more predictor or explanatory variables.
In multiple linear regression, what do we typically assume about epsilon?
That the epsilon_i are independent, have mean zero, are homoscedastic (same variability), and, if the sample size is small to moderate, that they are approximately normal in distribution.
What are regression methods used for?
1) Identify and characterize the relationships between the response and predictor/explanatory variables.
2) Estimate or predict the value of the response variable for combinations of the predictor/explanatory variables.
What is the objective of time series analysis?
To identify patterns and trends, and to predict future observations.
What does UCLM mean?
Upper confidence limit for regression line
What does LCLM mean?
Lower confidence limit for regression line
What does UCL mean?
Upper prediction limit
What does LCL mean?
Lower prediction limit
What is the simple linear regression model?
Y_i = B_0 + B_1*x_i + epsilon_i
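Putting this card together with the error-assumption card above, one common way to write the full simple linear regression model is (the normality part matters most when the sample size is small to moderate):

\[
Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2), \qquad i = 1, \dots, n
\]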
How do we denote the ith residual?
e_i
How are residuals found?
y_i - y_hat_i
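As a minimal sketch of how the fit, the predicted values, and the residuals e_i = y_i - y_hat_i fit together, assuming Python with numpy and statsmodels (the toy data are made up for illustration):

import numpy as np
import statsmodels.api as sm

# toy data, made up for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

X = sm.add_constant(x)        # adds the column of 1s for the intercept beta_0
fit = sm.OLS(y, X).fit()      # least-squares fit of y = beta_0 + beta_1*x + epsilon

y_hat = fit.fittedvalues      # predicted values y_hat_i
e = y - y_hat                 # residuals e_i = y_i - y_hat_i
print(fit.params)             # [beta_0_hat, beta_1_hat]
print(e)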
What is the prediction interval?
The interval that a new observation of the response is likely to fall in, for given values of the predictor(s).
What is the confidence interval?
The interval we’re confident contains the mean (expected) response, i.e., the regression line, at given values of the predictor(s).
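A minimal sketch of getting both kinds of intervals from a fitted model, assuming Python with statsmodels (alpha = 0.05 is an assumed choice, and the column names follow statsmodels' summary_frame output; same made-up toy data as above):

import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

frame = fit.get_prediction(X).summary_frame(alpha=0.05)
# mean_ci_lower / mean_ci_upper: confidence interval for the mean response (LCLM / UCLM)
# obs_ci_lower  / obs_ci_upper : prediction interval for a new observation  (LCL / UCL)
print(frame[["mean", "mean_ci_lower", "mean_ci_upper", "obs_ci_lower", "obs_ci_upper"]])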
What sort of hypothesis test do we run on the slope in multiple linear regression?
We test whether it’s equal to 0 or not! If we reject H_0, then the slope is not 0.
What does the analysis of variance do?
It tests whether all the slopes are equal to zero (H_0) against the alternative that at least one slope is not zero.
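A minimal sketch of where the slope t-tests and the overall ANOVA F-test show up, assuming Python with statsmodels (simulated data, made up for illustration):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                   # two predictor variables
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=50)  # only the first slope is truly nonzero
fit = sm.OLS(y, sm.add_constant(X)).fit()

print(fit.tvalues, fit.pvalues)   # t-tests of H_0: each individual coefficient = 0
print(fit.fvalue, fit.f_pvalue)   # overall F-test of H_0: all slopes = 0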
What does the SSE tell us?
How much of the total variation in the response is left unexplained by the model, i.e., attributed to random error.
What does SSE denote?
Sum of Squares due to Error
What does SSR denote?
Sum of Squares due to the Model (or our regression)
What does the SSR tell us?
How much of the total variation in the response is explained by the model (the regression).
What does R-square tell us?
The proportion of the total variation in the response that is explained by the model: R-square = SSR/SST = 1 - SSE/SST, where SST = SSR + SSE is the total sum of squares. (The closer to 1, the more of the variation the model explains.)
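A minimal sketch of SSE, SSR, SST, and R-square computed by hand, assuming Python with numpy and statsmodels (same made-up toy data as above):

import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
fit = sm.OLS(y, sm.add_constant(x)).fit()

y_hat = fit.fittedvalues
sse = np.sum((y - y_hat) ** 2)          # Sum of Squares due to Error
ssr = np.sum((y_hat - y.mean()) ** 2)   # Sum of Squares due to the Regression
sst = np.sum((y - y.mean()) ** 2)       # Total Sum of Squares (= SSE + SSR)
r_square = ssr / sst                    # same as 1 - sse/sst and fit.rsquared
print(sse, ssr, sst, r_square)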
Why do we care about partial regressions?
They help us see the relationship between Y and a single X, after adjusting for the other predictor variables.
What do we use partial plots to determine?
If there is a linear relationship between Y and each individual X (predictor variable)
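A minimal sketch of producing partial regression (added-variable) plots, assuming Python with statsmodels and matplotlib; plot_partregress_grid draws one plot per predictor (simulated data, made up for illustration):

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=50)
fit = sm.OLS(y, sm.add_constant(X)).fit()

# one added-variable (partial regression) plot per predictor
fig = sm.graphics.plot_partregress_grid(fit)
plt.show()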
What is collinearity?
When one or more predictor variables are close to being a linear combination of the other predictor variables.
What are symptoms of collinearity?
- The regression coefficients have illogical signs (- or +).
- The regression coefficients are huge in magnitude, and have even larger standard errors.
- The individual coefficients are not significant on their own, but are significant when tested as a group with other coefficients.
How can we diagnose collinearity?
- A correlation matrix (look for pairs of predictor variables that appear to have a strong linear relationship).
- Regress each predictor variable on all the other predictor variables:
  • Look for high values of R-square.
  • A Variance Inflation Factor ≥ 10 suggests collinearity (see the sketch after this list).
- Look at the condition indices (large values, and jumps in the values, indicate collinearity).
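A minimal sketch of the correlation-matrix and VIF diagnostics, assuming Python with pandas and statsmodels (the predictors are simulated so that x2 is nearly a linear combination of x1; the names are made up for illustration):

import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + 0.05 * rng.normal(size=100)        # nearly a linear combination of x1
x3 = rng.normal(size=100)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

print(X.corr())                               # correlation matrix: x1 and x2 look collinear
exog = np.column_stack([np.ones(len(X)), X])  # add an intercept column before computing VIFs
vifs = [variance_inflation_factor(exog, i) for i in range(1, exog.shape[1])]
print(vifs)                                   # VIF >= 10 suggests collinearity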
How can we fix collinearity?
- Backward elimination
- Forward Selection
- Stepwise selection (a mix of forward and backward steps)
How does backward elimination work?
It starts with all the predictor variables in the model, then repeatedly removes the variable with the largest p-value greater than alpha (usually .05) and refits, until every remaining variable has a p-value below alpha.
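A minimal sketch of backward elimination driven by p-values, assuming Python with pandas and statsmodels; this is one simple way to implement it, not necessarily how the course's software does it:

import numpy as np
import pandas as pd
import statsmodels.api as sm

def backward_elimination(X, y, alpha=0.05):
    """Drop the predictor with the largest p-value above alpha, refit, and repeat."""
    cols = list(X.columns)
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[cols])).fit()
        pvals = fit.pvalues.drop("const")   # slope p-values only
        worst = pvals.idxmax()
        if pvals[worst] > alpha:
            cols.remove(worst)              # eliminate the least significant variable
        else:
            break                           # every remaining p-value <= alpha
    return cols

rng = np.random.default_rng(3)
X = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x1", "x2", "x3"])
y = 1.0 + 2.0 * X["x1"] + rng.normal(size=100)
print(backward_elimination(X, y))   # typically keeps just ["x1"]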
How does forward selection work?
It starts from an empty model and repeatedly adds the variable that 1) has the highest (partial) correlation with the response, given the variables already in the model, and 2) has a p-value less than alpha.
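A minimal sketch of forward selection in the same spirit, assuming Python with pandas and statsmodels; here the candidate with the smallest p-value is added, which (adding one variable at a time) corresponds to the candidate with the highest partial correlation with the response:

import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_selection(X, y, alpha=0.05):
    """Add the most significant remaining candidate, as long as its p-value is below alpha."""
    chosen, remaining = [], list(X.columns)
    while remaining:
        # fit one candidate model per remaining variable and record the candidate's p-value
        pvals = {}
        for col in remaining:
            fit = sm.OLS(y, sm.add_constant(X[chosen + [col]])).fit()
            pvals[col] = fit.pvalues[col]
        best = min(pvals, key=pvals.get)
        if pvals[best] < alpha:
            chosen.append(best)
            remaining.remove(best)
        else:
            break                           # no remaining candidate is significant
    return chosen

rng = np.random.default_rng(4)
X = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x1", "x2", "x3"])
y = 1.0 + 2.0 * X["x1"] - 1.5 * X["x3"] + rng.normal(size=100)
print(forward_selection(X, y))   # typically selects ["x1", "x3"]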