Quant Methods #9 - Correlation and Regression Flashcards
covariance (def) of two random variables
LOS 9.a
a statistical measure of the degree to which two variables move together.
- covariance captures the linear relationship between two variables
- positive covariance : variables tend to move together
- negative covariance: variables tend to move in opposite directions
covXY =
LOS 9.a
covXY = Σ(i=1 to n) (Xi - Xmean)(Yi - Ymean) / (n-1)
where:
- n = sample size
- Xi = ith observation on variable X
- Xmean = mean of X1 to n
- Yi = ith observation on variable Y
- Ymean = mean of Y1 to n
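The sum above can be sketched directly in Python (the data values are hypothetical, purely for illustration):

```python
def sample_covariance(x, y):
    # Sample covariance from the definition: sum of cross-deviations / (n - 1)
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    return sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / (n - 1)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(sample_covariance(x, y))  # positive: x and y move together
```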
Why is covariance not very meaningful but correlation coefficient is?
LOS 9.a
Covariance:
- extremely sensitive to the scale of the two variables
- range is -infinity to +infinity
- presented in units that are the product of the units of X and Y, which are hard to interpret
Correlation coefficient converts covariance into a standardized measure that is easier to interpret.
sample correlation coefficient, rXY = ?
LOS 9.a
rXY = covXY / (sX sY)
- sX = sample standard deviation of X
- sY = sample standard deviation of Y
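A minimal Python sketch of the standardization (hypothetical data; perfectly linear, so r comes out at its upper bound):

```python
import statistics

def sample_correlation(x, y):
    # r = sample covariance divided by the product of sample standard deviations
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / (n - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(sample_correlation(x, y))  # exactly linear data -> r = 1.0
```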
Interpret for rXY (sample correlation coefficient):
r = +1
0 < r < 1
r = 0
-1 < r < 0
r = -1
LOS 9.a
perfect positive linear correlation
positive linear relationship
no linear relationship
negative linear relationship
perfect negative linear correlation
Note that for r = 1 and r = -1 the data points lie exactly on a line, but the slope is not necessarily +1 or -1.
What are the limitations to correlation analysis?
LOS 9.b
- Outliers - can significantly influence computed correlation to give false relationship (or lack thereof)
- Spurious Correlation - appearance of a linear relationship when the data are correlated purely by chance, e.g. stock prices vs. snowfall amounts
- Nonlinear Relationships - does not capture strong nonlinear relationships
How does one test for significance of the population correlation, ρ (rho), of two variables (from the sample correlation results)?
LOS 9.c
Test whether the population correlation between the two variables equals zero, using the following null and alternative hypotheses for a two-tailed test with n-2 degrees of freedom (df):
H0: ρ = 0 versus Ha: ρ != 0
test statistic t = r * sqrt(n-2) / sqrt(1 - r^2)
Then compare computed t with the critical t-value for the appropriate degrees of freedom and level of significance. For a two-tailed test, the decision rule is stated as:
Reject H0 if +tcritical < t or t < -tcritical
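The test statistic is easy to compute by hand or in code; this sketch uses hypothetical numbers (r = 0.5 from n = 30 observations):

```python
import math

def corr_t_stat(r, n):
    # t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t = corr_t_stat(0.5, 30)
print(round(t, 3))  # compare with the two-tailed critical t for df = 28
```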
Distinguish between the dependent and independent variables of a linear regression
LOS 9.d
- dependent variable - its variation is explained by the independent variable (e.g. “Y” values), aka explained, endogenous, or predicted variable
- independent variable - explains the variation of the dependent variable (e.g. “X” values), aka explanatory, exogenous, or predicting variable.
Describe the six assumptions underlying linear regression
LOS 9.e
except for #1, it’s all about the residuals!
For X (independent) and Y (dependent) variables:
- linear relationship exists between X and Y
- X is uncorrelated with residuals, e
- The expected value of the residual term is zero: ê = 0, also noted as E(e) = 0
- The variance of the residual term is constant for all observations: E(ei2) = σe2
- The residual term is independently distributed, i.e. residual for each observation is uncorrelated with all others: E(eiej) = 0, j != i
- The residual term is normally distributed
Interpret the linear regression coefficients
LOS 9.e
For linear relationship:
Yi = b0 + b1Xi + ei, i=1…n
the regression line equation is:
^Yi = ^b0 + ^b1Xi , i=1…n ( ^ means “hat” or “estimated”)
- ^b1 = covXY / sX2 ; “slope = cov / variance”; stock’s ß
- ^b0 = Ymean - ^b1 Xmean ; y-intercept; stock’s alpha
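The two formulas above can be sketched in Python (data values are hypothetical and chosen to lie exactly on y = 1 + 2x, so the estimates are easy to check):

```python
def ols_coefficients(x, y):
    # slope b1 = cov(X, Y) / var(X); intercept b0 = Ymean - b1 * Xmean
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - x_mean) ** 2 for a in x) / (n - 1)
    b1 = cov / var_x
    b0 = y_mean - b1 * x_mean
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]  # exactly y = 1 + 2x
print(ols_coefficients(x, y))  # (1.0, 2.0)
```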
Standard error of estimate (def.)
LOS 9.f
Standard error of estimate (SEE) is the standard deviation of the error terms in the regression.
also called:
standard error of the residual
standard error of the regression
SEE measures the degree of variability of the actual Y-values relative to the estimated Y-values from a regression equation.
The SEE gauges the “fit” of the regression line.
The smaller the standard error, the better the fit.
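A minimal sketch, assuming the usual formula SEE = sqrt(SSE / (n - 2)), where SSE is the sum of squared residuals (the data below are hypothetical and lie exactly on the fitted line, so the residuals, and hence SEE, are zero):

```python
import math

def standard_error_of_estimate(x, y, b0, b1):
    # SEE = sqrt( sum of squared residuals / (n - 2) )
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))

x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]  # exactly y = 1 + 2x
print(standard_error_of_estimate(x, y, 1.0, 2.0))  # 0.0: a perfect fit
```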
Coefficient of Determination (def.) for simple linear regression
LOS 9.f
Coefficient of determination (R2) is the percentage of the total variation in the dependent variable (Y) explained by the independent variable (X).
For simple linear regression (not for multi-variate regression),
R2 = r2, where
r = sample correlation coefficient
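This equality can be verified numerically: compute R2 as 1 - SSE/SST from a fitted line and compare it with the squared sample correlation (data values are hypothetical):

```python
def r_squared(x, y):
    # R^2 = 1 - SSE/SST for the fitted simple regression line
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((a - xm) ** 2 for a in x)
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = ym - b1 * xm
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - ym) ** 2 for b in y)
    return 1 - sse / sst

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.9, 6.2, 8.1, 9.8]
print(r_squared(x, y))  # equals r^2, the squared sample correlation
```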
Regression coefficient (^b1) confidence interval (equation)
LOS 9.f
^b1 +/- (tc x s^b1), where
tc = critical two-tailed t-value for the selected confidence level for df = n-2
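A sketch of the interval with hypothetical numbers (^b1 = 0.76, s^b1 = 0.33, tc = 2.101 for a 95% confidence level with df = 18):

```python
def slope_confidence_interval(b1_hat, s_b1, t_crit):
    # CI = b1_hat +/- t_crit * s(b1_hat)
    return b1_hat - t_crit * s_b1, b1_hat + t_crit * s_b1

low, high = slope_confidence_interval(0.76, 0.33, 2.101)
print(round(low, 3), round(high, 3))
```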
Test for significance about a population value of a regression coefficient (e.g. b1)
LOS 9.g
Use two-tailed t-test with df = n-2:
tb1 = (^b1 - b1) / s^b1, where
b1 = the hypothesized value.
H0: b1 = 0; Ha: b1 != 0
reject H0 if t < -tc or tc < t, which means that b1 is significantly different from the hypothesized value
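A minimal sketch of the test statistic with hypothetical numbers (^b1 = 0.76, s^b1 = 0.33, testing against the usual H0: b1 = 0):

```python
def slope_t_stat(b1_hat, b1_hyp, s_b1):
    # t = (estimated slope - hypothesized slope) / standard error of the slope
    return (b1_hat - b1_hyp) / s_b1

t = slope_t_stat(0.76, 0.0, 0.33)
print(round(t, 2))  # compare |t| with the critical t for df = n - 2
```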
For a simple linear regression, how does one predict the value of the dependent variable (Y)?
LOS 9.h
^Y = ^b0 + ^b1Xp, where
^Y = predicted value of the dependent variable
Xp = forecasted value of the independent variable
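Prediction is just plugging the forecast Xp into the estimated regression line; a one-line sketch with hypothetical estimates (^b0 = 1.0, ^b1 = 2.0, Xp = 6.0):

```python
def predict_y(b0_hat, b1_hat, x_p):
    # predicted Y = intercept estimate + slope estimate * forecast X value
    return b0_hat + b1_hat * x_p

print(predict_y(1.0, 2.0, 6.0))  # 13.0
```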