Unit 1: Simple Linear Regression Flashcards

Question 1

Q

Why do we use simple linear regression?

Answer

A

To model a response variable Y as a linear function of a single predictor variable X.

Question 2

Q

What is Covariance (SXY)?

Answer

A

Covariance describes the joint behavior of two random variables (X and Y).
The sign indicates the direction of the linear relationship, but the magnitude does not tell us the strength because it depends on the units of X and Y.
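
As a small illustration of the unit dependence, the sketch below computes the sample covariance of the same data twice, once with X in meters and once in centimeters. The helper name `sample_covariance` and the numbers are made up for this example.

```python
def sample_covariance(x, y):
    """Sample covariance SXY with the usual n - 1 denominator."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical data: heights in meters and weights in kilograms.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weights_kg = [50.0, 58.0, 66.0, 77.0, 84.0]

cov_m = sample_covariance(heights_m, weights_kg)

# Same people, heights converted to centimeters.
heights_cm = [h * 100 for h in heights_m]
cov_cm = sample_covariance(heights_cm, weights_kg)

print(cov_m, cov_cm)
```

The sign is positive in both cases, but the covariance grows by a factor of 100 when only the unit of X changes, so its size alone says nothing about strength.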

Question 3

Q

What is the correlation coefficient (R) and what does it tell us?

Answer

A

The correlation coefficient (R) measures the linear relationship between two quantitative variables and falls between -1 and 1. The sign of R gives the direction of the linear relationship, and the absolute value gives its strength (values near 0 indicate little or no linear relationship).
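
A minimal sketch of the computation, assuming the usual Pearson formula R = SXY / sqrt(SXX * SYY). The helper name `pearson_r` and the data are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with a strong increasing trend.
x = [1, 2, 3, 4, 5]
y_up = [2.1, 3.9, 6.2, 7.8, 10.1]
y_down = [-v for v in y_up]          # same strength, opposite direction

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
r_rescaled = pearson_r([100 * v for v in x], y_up)  # change the units of X

print(r_up, r_down, r_rescaled)
```

Unlike the covariance, R is unchanged when X is rescaled, and flipping the direction of the trend only flips its sign.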

Question 4

Q

What is the coefficient of determination (R2)? What can it tell you about the linear relationship?

Answer

A

The coefficient of determination (R2) = SSM/SST.

It is the proportion of the variability in y explained by the linear association with x. It falls between 0 and 1.
It can tell you the strength of the relationship but not the direction.
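
The sketch below computes R2 = SSM/SST for a least-squares line, under the assumption that SSM is the model sum of squares, SSE the error sum of squares, and SST = SSM + SSE. The function name `r_squared` and the data are made up for illustration.

```python
def r_squared(x, y):
    """Return (R2, SSM, SSE, SST) for the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx                    # slope
    b0 = mean_y - b1 * mean_x         # intercept
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - mean_y) ** 2 for yi in y)
    ssm = sum((fi - mean_y) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ssm / sst, ssm, sse, sst

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r2_up, ssm, sse, sst = r_squared(x, y)
r2_down, _, _, _ = r_squared(x, [-v for v in y])   # reversed direction

print(r2_up, r2_down)
```

Reversing the direction of the trend leaves R2 unchanged, which is why it reports strength but not direction.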

Question 5

Q

If the covariance of two variables = 0, what can you say about the independence of the variables?

Answer

A

You cannot conclude that the variables are independent just because the covariance is 0. You can only conclude that there is no linear relationship between them. If two variables are KNOWN to be independent, then the covariance equals 0, but you cannot assume independence when the covariance is 0.
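
A small counterexample, using made-up values: Y is completely determined by X (Y = X^2), yet the sample covariance is 0 because the relationship is not linear.

```python
def sample_covariance(x, y):
    """Sample covariance with the usual n - 1 denominator."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

# X is symmetric around 0 and Y depends on X exactly.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

cov_xy = sample_covariance(x, y)
print(cov_xy)
```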

Question 6

Q

What is Fisher’s Z Transformation?

Answer

A

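It is a variance-stabilizing transformation, z = arctanh(r) = 0.5 * ln((1 + r) / (1 - r)), that allows you to construct a confidence interval for rho. It can also be used to test the null hypothesis rho = rho0 (population rho = hypothesized value) for any rho0 not equal to 0.
It is most useful when rho is near the boundaries (-1 or 1), where the sampling distribution of r is strongly skewed.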
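
A minimal sketch of the interval and the test, assuming the standard large-sample result that z = arctanh(r) is approximately normal with standard error 1 / sqrt(n - 3). The function names and the inputs (r = 0.9, n = 30, rho0 = 0.7) are hypothetical.

```python
import math
from statistics import NormalDist

def fisher_z_interval(r, n, confidence=0.95):
    """Approximate confidence interval for rho via Fisher's z."""
    z = math.atanh(r)                       # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)             # approximate standard error of z
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Build the interval on the z scale, then transform back with tanh.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def fisher_z_statistic(r, n, rho0):
    """Approximate standard-normal test statistic for H0: rho = rho0."""
    return (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)

lower, upper = fisher_z_interval(0.9, 30)
z_stat = fisher_z_statistic(0.9, 30, 0.7)

print(lower, upper, z_stat)
```

Because the interval is built on the z scale and transformed back, it is not symmetric around r and always stays inside (-1, 1).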

Question 7

Q

What are residuals?

Answer

A

Residual = estimated error = observed Y - predicted (fitted) Y
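
As a sketch, the residuals of a least-squares line can be computed by subtracting each fitted value from the observed value. The data and variable names below are made up for illustration.

```python
# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept.
b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
      / sum((xi - mean_x) ** 2 for xi in x))
b0 = mean_y - b1 * mean_x

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(residuals)
print(sum(residuals))   # essentially 0, up to floating-point rounding
```

For a least-squares fit with an intercept, the residuals always sum to zero, which is a quick sanity check on the calculation.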

Question 8

Q

What are the hypotheses for the overall F test for SLR?

Answer

A

H0: B1 = 0 (the slope of X equals 0; the intercept-only model is adequate)
H1: B1 =/= 0 (the slope of X is not equal to 0; the model with X fits better)

Question 9

Q

What are the assumptions for SLR?

Answer

A

Linearity of the relationship between X and Y
Independence of the errors
Normality of the errors
Homoskedasticity (constant variance) of the errors

Question 10

Q

What do violations of SLR assumptions look like?

Answer

A

Curved shape in the residual plot (violates linearity)
Fanning shape in the residual plot (violates constant variance)
Heteroskedasticity of the residuals

Question 11

Q

What do we do when assumptions are violated?

Answer

A

For minor violations, proceed with the analysis, because inference is robust to minor deviations from the assumptions when n is large.

For major violations, consider variable transformations or adding higher-order polynomial terms.

For clear trends, consider adding predictors (MLR).

For heteroskedasticity, consider advanced regression techniques.

Question 12

Q

What causes the Coefficient of Determination (R2) to increase?

Answer

A

Increase in SSM
Increase in MSM
Decrease in SSE
Decrease in Residual Variance (sigma^2)
Stronger Linear relationship between X and Y

Question 13

Q

What causes the Coefficient of Determination (R2) to decrease?

Answer

A

Decrease in SSM
Decrease in MSM
Increase in SSE
Increase in Residual Variance (sigma^2)
Weaker linear relationship between X and Y

Question 14

Q

What are outliers?

Answer

A

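Outliers are observations that lie far from the rest of the data. They include high-leverage points (extreme X values) and influential points (points whose removal noticeably changes the fitted line).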

Question 15

Q

Why do we use method of least squares?

Answer

A

It has a “closed form” solution
The estimates (B0 & B1) are identical to the Maximum Likelihood Estimates (MLE) when the errors are normal
The estimates are unbiased and have the smallest possible variance among linear unbiased estimators
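
The closed-form solution can be sketched as B1 = SXY / SXX and B0 = mean(Y) - B1 * mean(X). The function names and data below are made up; the comparison at the end simply checks that nudging either estimate makes the sum of squared errors larger.

```python
def least_squares(x, y):
    """Closed-form least-squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def sum_squared_errors(x, y, b0, b1):
    """Sum of squared vertical distances from the points to the line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(x, y)
best = sum_squared_errors(x, y, b0, b1)

# Any other line has a larger sum of squared errors.
worse_intercept = sum_squared_errors(x, y, b0 + 0.1, b1)
worse_slope = sum_squared_errors(x, y, b0, b1 - 0.1)

print(b0, b1, best, worse_intercept, worse_slope)
```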

Question 16

Q

What three tests are identical in SLR?

Answer

A

T-test for correlation (H0: rho = 0)
T-test for slope (H0: B1 = 0)
F-test for overall model fit (H0: Y = B0 + E)
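
A numerical check of this equivalence, using made-up data: the t statistic for the slope equals the t statistic for the correlation, and its square equals the overall F statistic. The formulas assumed here are t = B1 / SE(B1) with SE(B1) = sqrt(MSE / SXX), t = r * sqrt(n - 2) / sqrt(1 - r^2), and F = MSM / MSE.

```python
import math

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
r = sxy / math.sqrt(sxx * syy)

fitted = [b0 + b1 * xi for xi in x]
ssm = sum((fi - mean_y) ** 2 for fi in fitted)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
msm = ssm / 1            # one model degree of freedom in SLR
mse = sse / (n - 2)      # n - 2 error degrees of freedom

t_slope = b1 / math.sqrt(mse / sxx)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
f_stat = msm / mse

print(t_slope, t_corr, f_stat)
```

Since a squared t statistic with n - 2 degrees of freedom follows an F distribution with 1 and n - 2 degrees of freedom, the three tests also give the same p-value.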


To understand the primary terminology and core content of Simple Linear Regression