Reading 1: Linear Regression Flashcards

1
Q

Correlation Equation

A

r = Cov(X,Y) / ((sample SD of X) * (sample SD of Y))

2
Q

Slope coefficient

A

Cov(X,Y) / (variance of X), i.e., the covariance divided by the squared standard deviation of the independent variable X
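
As a rough illustration of the correlation and slope formulas on the first two cards, the following sketch computes both by hand in plain Python. The five data points and all variable names are made up for the example and do not come from the reading.

```python
from math import sqrt

# Hypothetical sample data (not from the reading).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance and sample variances use n - 1 in the denominator.
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
var_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)

# Correlation: covariance scaled by both sample standard deviations.
r = cov_xy / (sqrt(var_x) * sqrt(var_y))

# Slope coefficient: covariance scaled by the variance of X only.
slope = cov_xy / var_x

print(round(r, 4), round(slope, 4))
```

Both quantities share the same numerator, so they always have the same sign; only the scaling differs.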

3
Q

Sum of squared errors (SSE)

A
  • the sum of the squared vertical distances between the estimated and actual Y-values is referred to as the sum of squared errors (SSE).
  • the regression line is the line that minimizes the SSE. This explains why simple linear regression is frequently referred to as ordinary least squares (OLS) regression, and why the values determined by the estimated regression equation are called least squares estimates.
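
A small sketch of the "least squares" idea: fit the line, then nudge each coefficient and check that the SSE only gets worse. The data are hypothetical, and the intercept is computed with the standard formula b0 = mean(Y) - b1 * mean(X), which the card itself does not state.

```python
# Hypothetical sample data (not from the reading).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n


def sse(b0, b1):
    """Sum of squared vertical distances between actual and fitted Y."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Least squares estimates of the slope and intercept.
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
b0 = mean_y - b1 * mean_x

best = sse(b0, b1)

# Nudging either coefficient away from the OLS values raises the SSE.
nudged = [
    sse(b0 + 0.1, b1),
    sse(b0 - 0.1, b1),
    sse(b0, b1 + 0.1),
    sse(b0, b1 - 0.1),
]
print(best, min(nudged))
```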
4
Q

Linear regression assumptions

A
  1. A linear relationship exists between the dependent and the independent variables.
  2. The variance of the residual term is constant for all observations (homoskedasticity).
  3. The residual term is independently distributed; that is, the residual for one observation is not correlated with that of another observation (meaning that the paired X and Y observations are independent of each other).
  4. The residual term is normally distributed.
5
Q

Heteroskedasticity

A
  • occurs when the variance of the residuals differs across observations.
  • For example, the model residuals may be more widely dispersed around higher values of X than around lower values of X. If the observations were chronological, this pattern would suggest that the model's accuracy has declined over time.
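
An informal way to see this pattern numerically is to compare the spread of the residuals for low and high values of X. The sketch below is an eyeball check rather than a formal test; the data are constructed (not random, not from the reading) so that the error size grows with X.

```python
# Constructed data: error magnitude grows with X and alternates in sign.
x = list(range(1, 11))
errors = [0.5 * xi * (-1) ** xi for xi in x]
y = [2 + 3 * xi + ei for xi, ei in zip(x, errors)]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n

# Ordinary least squares fit.
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
b0 = mean_y - b1 * mean_x
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Mean squared residual in the lower and upper halves of X.
half = n // 2
spread_low = sum(e ** 2 for e in residuals[:half]) / half
spread_high = sum(e ** 2 for e in residuals[half:]) / (n - half)

print(spread_low, spread_high)
```

A much larger spread in the upper half is consistent with heteroskedasticity, although a proper diagnosis would rely on a residual plot or a formal test.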
6
Q

Serial Correlation (Independence)

A

-If the observations (X and Y pairs) are not independent, then the residuals from the model will exhibit serial correlation.

7
Q

Total sum of squares (SST)

A
  • measures the total variation in the dependent variable.
  • SST is equal to the sum of the squared differences between the actual Y-values and the mean of Y.
  • total variation = explained variation + unexplained variation
8
Q

Regression sum of squares (RSS)

A
  • measures the variation in the dependent variable that is explained by the independent variable.
  • RSS is the sum of the squared distances between the predicted Y-values and the mean of Y.
9
Q

Sum of squared errors (SSE)

A
  • measures the unexplained variation in the dependent variable. It’s also known as the sum of squared residuals or the residual sum of squares.
  • SSE is the sum of the squared vertical distances between the actual Y-values and the predicted Y-values on the regression line.
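
The three sums of squares can be checked against each other with a short sketch. The data are hypothetical; the point is only that SST = RSS + SSE holds for an OLS fit with an intercept.

```python
# Hypothetical sample data (not from the reading).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# OLS fit and predicted Y-values.
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
b0 = mean_y - b1 * mean_x
y_hat = [b0 + b1 * xi for xi in x]

# Total variation: actual Y versus the mean of Y.
sst = sum((yi - mean_y) ** 2 for yi in y)
# Explained variation: predicted Y versus the mean of Y.
rss = sum((yh - mean_y) ** 2 for yh in y_hat)
# Unexplained variation: actual Y versus predicted Y.
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

print(sst, rss, sse)
```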
10
Q

Regression (explained)

A

-df = k = 1
-Sum of squares = RSS
-Mean sum of squares:
MSR = RSS/k = RSS/1 = RSS

11
Q

Error (unexplained)

A

-df = n-2 (that is, n-k-1 with k = 1)
-Sum of squares = SSE
-Mean squared error:
MSE = SSE/(n-2)

12
Q

Total

A

df: n-1
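
One way to lay out the three ANOVA rows (regression, error, total) for a simple regression, assuming k = 1 independent variable so that the error row has n - k - 1 = n - 2 degrees of freedom. The sums of squares are hypothetical inputs, and the dictionary layout is just one convenient choice.

```python
# Hypothetical inputs (not from the reading).
n = 5      # number of observations
k = 1      # number of independent variables
rss = 3.6  # regression (explained) sum of squares
sse = 2.4  # error (unexplained) sum of squares

anova = {
    "Regression": {"df": k, "ss": rss, "ms": rss / k},
    "Error": {"df": n - k - 1, "ss": sse, "ms": sse / (n - k - 1)},
    # No mean square is reported for the total row.
    "Total": {"df": n - 1, "ss": rss + sse, "ms": None},
}

for source, row in anova.items():
    print(source, row["df"], row["ss"], row["ms"])
```

The regression and error degrees of freedom add up to the total degrees of freedom, just as RSS and SSE add up to SST.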

13
Q

Standard Error of Estimate

A
  • the standard deviation of the regression's residuals. The lower the SEE, the better the model fit.
  • SEE =√MSE
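
A minimal sketch of the SEE calculation for a simple regression, assuming the residuals are already available. The residual values are hypothetical.

```python
from math import sqrt

# Hypothetical residuals (actual Y minus predicted Y) from a simple regression.
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
n = len(residuals)

sse = sum(e ** 2 for e in residuals)
# With one independent variable, the error degrees of freedom are n - 2.
mse = sse / (n - 2)
see = sqrt(mse)

print(round(see, 4))
```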
14
Q

Coefficient of determination (R^2)

A
  • defined as the percentage of the total variation in the dependent variable explained by the independent variable.
  • R^2 = RSS / SST
  • For simple linear regression (i.e., with one independent variable), the coefficient of determination, R^2, may be computed by simply squaring the correlation coefficient, r. In other words, R^2 = r^2 for a regression with one independent variable.
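
The equality between RSS / SST and the squared correlation can be checked numerically. This is a sketch on hypothetical data, computing the two quantities by separate routes.

```python
from math import sqrt

# Hypothetical sample data (not from the reading).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

# Route 1: correlation coefficient (the n - 1 terms cancel out).
r = sxy / sqrt(sxx * syy)

# Route 2: explained variation over total variation.
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
rss = sum((b0 + b1 * xi - mean_y) ** 2 for xi in x)
sst = syy
r_squared = rss / sst

print(round(r_squared, 4), round(r ** 2, 4))
```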
15
Q

F statistic

A
  • F-test assesses how well a set of independent variables, as a group, explains the variation in the dependent variable.
  • F-statistic is used to test whether at least one independent variable in a set of independent variables explains a significant portion of the variation of the dependent variable.

F = MSR/MSE = (RSS/k)/(SSE/(n-k-1))

  • Important: This is always a one-tailed test!
  • df numerator = k = 1
  • df denominator = n − k − 1 = n − 2
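
The F-statistic formula translates directly into a small helper. The function name and the input values are hypothetical; the critical value for the one-tailed test would still have to come from an F table with (k, n - k - 1) degrees of freedom, which is not shown here.

```python
def f_statistic(rss, sse, n, k=1):
    """F = MSR / MSE for a regression with k independent variables."""
    msr = rss / k            # mean regression sum of squares
    mse = sse / (n - k - 1)  # mean squared error
    return msr / mse


# Hypothetical inputs (not from the reading).
f_stat = f_statistic(rss=3.6, sse=2.4, n=5)
df_numerator = 1
df_denominator = 5 - 1 - 1

print(round(f_stat, 4), df_numerator, df_denominator)
```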
16
Q

Interpreting p-values

A

-The p-value is the smallest level of significance at which the null hypothesis can be rejected. When testing the hypothesis that the coefficient is equal to zero versus not equal to zero, an alternative to comparing the test statistic with a critical value is to compare the p-value to the significance level:

-If the p-value is less than the significance level, the null hypothesis can be rejected.
-If the p-value is greater than the significance level, the null hypothesis cannot be rejected.
17
Q

T-test for correlation coefficient

A

t = (r * (n-2)^.5) / ((1-r^2)^.5), with n-2 degrees of freedom
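
A sketch of the test statistic as a function, assuming the usual form with the square root of (1 - r^2) in the denominator. The function name and the sample inputs are hypothetical.

```python
from math import sqrt


def correlation_t_stat(r, n):
    """t-statistic for H0: population correlation = 0, with n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# Hypothetical inputs: r^2 = 0.6 from a sample of 5 observations.
t_stat = correlation_t_stat(r=sqrt(0.6), n=5)
print(round(t_stat, 4))
```

The statistic is then compared with a critical t-value for n - 2 degrees of freedom, taken from a t table.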

18
Q

Standard Error of forecast for confidence interval

A

S2f = SEE^2 * [1 + (1/n) + ((X - Xmean)^2 / ((n-1) * variance of X))]
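
A worked sketch of the forecast standard error and the resulting interval, assuming the denominator uses (n - 1) times the sample variance of X. All inputs are hypothetical, including the fitted line Y = 2.2 + 0.6X; the critical value 3.182 is the two-tailed 5% t-table value for 3 degrees of freedom.

```python
from math import sqrt


def forecast_std_error(see, n, x_new, mean_x, var_x):
    """Standard error of the forecast for a simple linear regression."""
    variance_f = see ** 2 * (
        1 + 1 / n + (x_new - mean_x) ** 2 / ((n - 1) * var_x)
    )
    return sqrt(variance_f)


# Hypothetical regression summary (not from the reading).
n = 5
see = sqrt(0.8)
mean_x, var_x = 3.0, 2.5
b0, b1 = 2.2, 0.6

x_new = 6
y_forecast = b0 + b1 * x_new
s_f = forecast_std_error(see, n, x_new, mean_x, var_x)

# Confidence interval: forecast +/- critical t * standard error of forecast.
t_critical = 3.182  # t table, df = n - 2 = 3, 5% two-tailed
lower = y_forecast - t_critical * s_f
upper = y_forecast + t_critical * s_f

print(round(s_f, 4), round(lower, 4), round(upper, 4))
```

The interval widens as the forecast value of X moves away from the mean of X, because the last term in the brackets grows.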

19
Q

Log-Lin Model

A
  • If the dependent variable is logarithmic while the independent variable is linear.
  • The slope coefficient is interpreted as relative change in the dependent variable for an absolute change in the independent variable.

ln(Yi) = b0 + b1Xi + error term

20
Q

Lin-Log Model

A
  • If the dependent variable is linear while the independent variable is logarithmic.
  • The slope coefficient is interpreted as absolute change in the dependent variable for a relative change in the independent variable.

Yi = b0 + b1*ln(Xi) + error term

21
Q

Log-Log Model

A
  • If both the dependent variable and the independent variable are logarithmic.
  • The slope coefficient is interpreted as the relative change in the dependent variable for a relative change in the independent variable.

ln(Yi) = b0 + b1*ln(Xi) + error term
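
The three functional forms differ only in which variable is log-transformed before an ordinary linear fit. The sketch below fits the log-log form on constructed data generated from Y = 3 * X^2 (a hypothetical example), so the estimated slope should recover the exponent 2. The helper fit_line is an illustrative name, not something from the reading.

```python
from math import log


def fit_line(u, v):
    """OLS intercept and slope of v on u."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    slope = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v)) / sum(
        (ui - mean_u) ** 2 for ui in u
    )
    return mean_v - slope * mean_u, slope


# Constructed data following Y = 3 * X^2 exactly (no error term).
x = [1, 2, 4, 8]
y = [3 * xi ** 2 for xi in x]

ln_x = [log(xi) for xi in x]
ln_y = [log(yi) for yi in y]

# Log-log: ln(Y) regressed on ln(X).
b0, b1 = fit_line(ln_x, ln_y)
print(round(b0, 4), round(b1, 4))

# The other two forms reuse the same helper with a different transform:
log_lin = fit_line(x, ln_y)  # ln(Y) on X
lin_log = fit_line(ln_x, y)  # Y on ln(X)
```

In the log-log case the slope reads as the relative change in Y for a relative change in X, so a slope of 2 means a 1% change in X goes with roughly a 2% change in Y.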