Unit 1: Simple Linear Regression Flashcards
To understand the primary terminology and core content of Simple Linear Regression
Why do we use simple linear regression?
To model a response variable Y as a linear function of a single predictor variable X
What is Covariance (SXY)?
Covariance describes the joint behavior of two random variables (X and Y).
Its sign indicates the direction of the linear relationship, but its magnitude does not tell us the strength, because it depends on the units of X and Y.
What is the correlation coefficient (R) and what does it tell us?
The correlation coefficient (R) measures the strength and direction of the linear relationship between two quantitative variables and falls between -1 and 1. Because it is the covariance divided by the product of the standard deviations of X and Y, it is unitless: values near -1 or 1 indicate a strong linear relationship, and values near 0 indicate a weak one.
What is the coefficient of determination (R²)? What can it tell you about the linear relationship?
The coefficient of determination (R²) = SSM/SST = 1 - SSE/SST.
It is the proportion of the variability in Y explained by the linear association with X. It falls between 0 and 1.
It can tell you the strength of the relationship but not the direction: in SLR, R² is the square of the correlation coefficient R, so the sign is lost.
If the covariance of two variables = 0, what can you say about the independence of the variables?
You cannot conclude that the variables are independent just because the covariance is 0; you can only conclude that there is no linear relationship between them. If two variables are KNOWN to be independent, then the covariance equals 0, but a covariance of 0 does not imply independence.
What is Fisher’s Z Transformation?
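It is a variance-stabilizing transformation of the sample correlation R: z = ½ ln[(1 + R)/(1 - R)], which is approximately normal with variance 1/(n - 3). It allows you to construct a confidence interval for any ρ (rho) and to test the null hypothesis H0: ρ = ρ0 for any ρ0, including ρ0 ≠ 0 (the t-test for correlation only handles ρ0 = 0). It is more accurate than working with R directly when ρ is near the boundaries (-1 or 1), where the sampling distribution of R is skewed.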
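A sketch of a Fisher Z confidence interval and test, assuming the normal approximation z ~ N(atanh(ρ), 1/(n - 3)). The helper names `fisher_z_ci` and `fisher_z_test` and the example values (R = 0.85, n = 30) are hypothetical. `math.atanh(r)` equals ½ ln[(1 + r)/(1 - r)], and `tanh` back-transforms to the correlation scale.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

def fisher_z_ci(r, n, level=0.95):
    """Approximate confidence interval for rho via Fisher's Z transformation."""
    z = atanh(r)                                 # 0.5 * ln((1 + r) / (1 - r))
    se = 1 / sqrt(n - 3)                         # approximate standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    lo, hi = z - z_crit * se, z + z_crit * se    # symmetric on the z scale
    return tanh(lo), tanh(hi)                    # asymmetric on the rho scale

def fisher_z_test(r, n, rho0):
    """Two-sided test of H0: rho = rho0 (rho0 may be nonzero)."""
    z_stat = (atanh(r) - atanh(rho0)) * sqrt(n - 3)
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_value

print(fisher_z_ci(0.85, 30))         # roughly (0.706, 0.927)
print(fisher_z_test(0.85, 30, 0.5))
```

Note how the interval is longer below 0.85 than above it: near the boundary, the back-transformation accounts for the skewed sampling distribution of R.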
Residuals
Residuals are the estimated errors: residual = observed Y - fitted (predicted) Y.
What are the hypotheses for the overall F test for SLR?
H0: B1 = 0 (the slope of X is 0, so the intercept-only model is adequate)
H1: B1 ≠ 0 (the slope of X is not 0, so the model with X fits significantly better)
What are the assumptions for SLR?
Linearity
Independence
Normality of errors
Errors are homoskedastic
What do violations of SLR assumptions look like?
Curved pattern in the residual plot (nonlinearity)
Fanning (funnel) shape in the residual plot, i.e., heteroskedasticity of the residuals
What do we do when assumptions are violated?
For minor deviations, proceed with the analysis: for large n, inference is robust to minor departures from the assumptions.
For major violations, consider variable transformations or adding higher-order polynomial terms.
For clear trends in the residuals that another variable could explain, consider adding predictors (multiple linear regression, MLR).
For heteroskedasticity, consider advanced regression techniques (e.g., weighted least squares).
What causes the Coefficient of Determination (R²) to increase?
For a fixed total variability (SST):
Increase in SSM
Increase in MSM
Decrease in SSE
Decrease in residual variance (σ²)
Stronger linear relationship between X and Y
What causes the Coefficient of Determination (R²) to decrease?
For a fixed total variability (SST):
Decrease in SSM
Decrease in MSM
Increase in SSE
Increase in residual variance (σ²)
Weaker linear relationship between X and Y
What are outliers?
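Outliers are observations that lie far from the rest of the data. Unusual points include high-leverage points (X values far from the mean of X) and influential points (points whose removal substantially changes the fitted line).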
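Two standard diagnostics not named in the card can make "leverage" and "influence" concrete. In SLR, the leverage of point i is hᵢ = 1/n + (xᵢ - x̄)²/Sxx, and Cook's distance is Dᵢ = eᵢ²/(p·MSE) · hᵢ/(1 - hᵢ)² with p = 2 coefficients. The sketch below uses hypothetical data in which the last point has an extreme X value and falls well below the trend of the others; the helper name `leverage_and_cooks_distance` is also hypothetical.

```python
def leverage_and_cooks_distance(x, y):
    """Leverage h_i and Cook's distance D_i for each point in an SLR fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    p = 2  # number of estimated coefficients (B0 and B1)
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage depends only on how far x_i is from the mean of X
    h = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
    # Cook's distance combines the size of the residual with the leverage
    d = [e ** 2 / (p * mse) * hi / (1 - hi) ** 2 for e, hi in zip(residuals, h)]
    return h, d

# Hypothetical data: the last point has an X value far from the others
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.0]

h, d = leverage_and_cooks_distance(x, y)
for xi, hi, di in zip(x, h, d):
    print(f"x={xi:>2}  leverage={hi:.3f}  cooks_d={di:.3f}")
```

The leverages always sum to the number of coefficients (2 in SLR), so a single point holding most of that total, as x = 20 does here, is a warning sign.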
Why do we use the method of least squares?
“Closed form” solution
When the errors are normally distributed, the estimates (B0 & B1) are identical to the Maximum Likelihood Estimates (MLE)
The estimates are unbiased and have the smallest variance among all linear unbiased estimators (Gauss-Markov theorem)
What three tests are identical in SLR?
- T-test for correlation (H0: p(rho)=0)
- T-test for slope (H0:B1=0)
- F-test for overall model fit (H0: Y=B0 + E)