Reading 1: Linear Regression Flashcards
Correlation Equation
Covariance of X and Y / (sample SD of X)(Sample SD of Y)
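As a quick numerical check, here is a minimal NumPy sketch (the x and y arrays are made-up illustration data, not from the reading) that computes the correlation from the sample covariance and sample standard deviations:

```python
import numpy as np

# Hypothetical paired observations, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Sample covariance and sample standard deviations (ddof=1 -> n-1 denominator)
cov_xy = np.cov(x, y, ddof=1)[0, 1]
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

print(r)                        # correlation from the definition above
print(np.corrcoef(x, y)[0, 1])  # same value from NumPy's built-in
```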
Slope coefficient
Cov(X,Y) / Var(X), i.e., the covariance of X and Y divided by the sample variance (squared sample standard deviation) of X
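The same kind of sketch (same made-up data) for the slope and intercept, checked against NumPy's own least-squares fit:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # slope = Cov(X,Y) / Var(X)
b0 = y.mean() - b1 * x.mean()                        # intercept through the point of means

print(b1, b0)
print(np.polyfit(x, y, 1))  # [slope, intercept] from least squares; should match
```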
Sum of squared errors (SSE)
- The sum of the squared vertical distances between the estimated and actual Y-values is referred to as the sum of squared errors (SSE).
- The regression line is the line that minimizes the SSE. This explains why simple linear regression is frequently referred to as ordinary least squares (OLS) regression, and why the values determined by the estimated regression equation are called least squares estimates.
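To illustrate the "least squares" idea, a short sketch (made-up data) showing that nudging the OLS coefficients away from their estimated values only increases the SSE:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def sse(b0, b1):
    """Sum of squared vertical distances between actual and fitted Y-values."""
    return np.sum((y - (b0 + b1 * x)) ** 2)

b1_ols = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0_ols = y.mean() - b1_ols * x.mean()

print(sse(b0_ols, b1_ols))        # minimum SSE at the OLS estimates
print(sse(b0_ols, b1_ols + 0.1))  # larger
print(sse(b0_ols + 0.1, b1_ols))  # larger
```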
Linear regression assumptions
- A linear relationship exists between the dependent and the independent variable.
- The variance of the residual term is constant for all observations (homoskedasticity).
- The residual term is independently distributed; that is, the residual for one observation is not correlated with that of another observation (meaning that the paired X and Y observations are independent of each other).
- The residual term is normally distributed.
Heteroskedasticity
- occurs when the variance of the residuals differs across observations.
- For example, the model residuals may be more widely dispersed around higher values of X than around lower values of X. If these observations were chronological, this would suggest that model accuracy has declined over time.
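A minimal simulation sketch (the data-generating process is invented purely to illustrate the idea) in which the residual spread grows with X:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = np.linspace(1, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 0.3 * x)  # error standard deviation grows with X

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Residual dispersion around low vs. high X values
print(resid[x < 5].std(ddof=1))   # smaller spread
print(resid[x >= 5].std(ddof=1))  # larger spread -> heteroskedasticity
```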
Serial Correlation (Independence)
- If the observations (X and Y pairs) are not independent, then the residuals from the model will exhibit serial correlation.
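One common informal check, not spelled out on this card, is the Durbin-Watson statistic computed from the residuals; values near 2 suggest no serial correlation, while values well below 2 suggest positive serial correlation. A rough sketch with simulated residual series:

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum of squared changes in residuals / sum of squared residuals."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(1)

# Hypothetical residuals with positive serial correlation (AR(1)-style)
e = np.zeros(200)
for t in range(1, 200):
    e[t] = 0.8 * e[t - 1] + rng.normal()

print(durbin_watson(e))                      # well below 2
print(durbin_watson(rng.normal(size=200)))   # close to 2 (independent residuals)
```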
Total sum of squares (SST)
- measures the total variation in the dependent variable.
- SST is equal to the sum of the squared differences between the actual Y-values and the mean of Y.
- total variation = explained variation + unexplained variation
Regression sum of squares (RSS)
- measures the variation in the dependent variable that is explained by the independent variable.
- RSS is the sum of the squared distances between the predicted Y-values and the mean of Y.
Sum of squared errors (SSE)
- measures the unexplained variation in the dependent variable. It’s also known as the sum of squared residuals or the residual sum of squares.
- SSE is the sum of the squared vertical distances between the actual Y-values and the predicted Y-values on the regression line.
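A short sketch (re-using the same made-up data so the block stands alone) verifying the decomposition SST = RSS + SSE:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variation
rss = np.sum((y_hat - y.mean()) ** 2)  # explained variation
sse = np.sum((y - y_hat) ** 2)         # unexplained variation

print(sst, rss + sse)  # total = explained + unexplained
```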
Regression (explained)
- df = 1
- Sum of squares = RSS
- Mean sum of squares
MSR = RSS/k = RSS/1 = RSS
Error (unexplained)
- df = n-2
- Sum of squares = SSE
- Mean squared error
MSE = SSE/(n-2)
Total
- df = n-1
- Sum of squares = SST
Standard Error of Estimate
- The standard deviation of the regression's residuals. The lower the SEE, the better the model fit.
- SEE = √MSE = √(SSE/(n-2))
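A sketch computing the SEE on the same made-up data (two estimated coefficients, so SSE is divided by n - 2):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

n = len(y)
mse = np.sum(resid ** 2) / (n - 2)  # SSE / (n - 2)
see = np.sqrt(mse)                  # standard error of estimate

print(see)
```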
Coefficient of Determination (R²)
- Defined as the percentage of the total variation in the dependent variable explained by the independent variable.
- R² = RSS / SST
- For simple linear regression (i.e., with one independent variable), the coefficient of determination, R², may be computed by simply squaring the correlation coefficient, r. In other words, R² = r² for a regression with one independent variable.
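A sketch (made-up data) confirming that R² = RSS/SST matches the squared correlation when there is a single independent variable:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)
rss = np.sum((y_hat - y.mean()) ** 2)

r2_from_anova = rss / sst                     # R² = RSS / SST
r2_from_corr = np.corrcoef(x, y)[0, 1] ** 2   # r², same value with one regressor

print(r2_from_anova, r2_from_corr)
```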
F statistic
- F-test assesses how well a set of independent variables, as a group, explains the variation in the dependent variable.
- F-statistic is used to test whether at least one independent variable in a set of independent variables explains a significant portion of the variation of the dependent variable.
F = MSR/MSE = (RSS/k) / (SSE/(n-k-1))
- Important: This is always a one-tailed test!
- df(numerator) = k = 1
- df(denominator) = n − k − 1 = n − 2
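A sketch computing the F-statistic and its one-tailed p-value with SciPy (made-up data; k = 1 for simple regression):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

n, k = len(y), 1
rss = np.sum((y_hat - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)

msr = rss / k            # mean regression sum of squares
mse = sse / (n - k - 1)  # mean squared error
f_stat = msr / mse

# One-tailed p-value from the F-distribution with (k, n - k - 1) df
p_value = stats.f.sf(f_stat, k, n - k - 1)
print(f_stat, p_value)
```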