Quant Methods #9 - Correlation and Regression Flashcards

1
Q

covariance (def) of two random variables

A

LOS 9.a

a statistical measure of the degree to which two variables move together.

  • covariance captures the linear relationship between two variables
  • positive covariance : variables tend to move together
  • negative covariance: variables tend to move in opposite directions
2
Q

covXY =

A

LOS 9.a

covXY = Σ(i=1 to n) (Xi - Xmean)(Yi - Ymean) / (n-1)

where:

  • n = sample size
  • Xi = ith observation on variable X
  • Xmean = mean of X1 to n
  • Yi = ith observation on variable Y
  • Ymean = mean of Y1 to n
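The formula above can be sketched in Python; the data values below are assumed for illustration only:

```python
def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations of X and Y, divided by n - 1."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    return sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical series; a positive result means the variables tend to move together.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(sample_covariance(x, y))  # ≈ 3.33, positive
```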
3
Q

Why is covariance not very meaningful but correlation coefficient is?

A

LOS 9.a

Covariance:

  • extremely sensitive to the scale of the two variables
  • range is -infinity to +infinity
  • expressed in the product of the units of the two variables (e.g. %-squared), which is hard to interpret

Correlation coefficient converts covariance into a standardized measure that is easier to interpret.

4
Q

sample correlation coefficient, rXY = ?

A

LOS 9.a

rXY = covXY / (sX sY)

  • sX = sample standard deviation of X
  • sY = sample standard deviation of Y
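As a minimal sketch of this standardization (the data is assumed; any points lying exactly on an upward-sloping line give r = +1, whatever the slope):

```python
import statistics

def sample_correlation(x, y):
    """r_XY = cov_XY / (s_X * s_Y): a standardized measure in [-1, +1]."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / (n - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

# Assumed data on the line y = 2x + 1: perfect positive linear correlation.
print(sample_correlation([1, 2, 3, 4], [3, 5, 7, 9]))  # ≈ 1.0
```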
5
Q

Interpret for rXY (sample correlation coefficient):

r = +1

0 < r < 1

r = 0

-1 < r < 0

r = -1

A

LOS 9.a

perfect positive linear correlation

positive linear relationship

no linear relationship

negative linear relationship

perfect negative linear correlation

Note that for r = 1 and r = -1 the data points lie exactly on a line, but the slope is not necessarily +1 or -1.

6
Q

What are the limitations of correlation analysis?

A

LOS 9.b

  • Outliers - can significantly influence the computed correlation, suggesting a relationship (or the lack of one) that does not really exist
  • Spurious correlation - the appearance of a linear relationship when the variables are correlated purely by chance, e.g. stock prices vs. snowfall amounts
  • Nonlinear relationships - correlation does not capture strong nonlinear relationships
7
Q

How does one test the significance of the population correlation, ρ (rho), between two variables (from the sample correlation results)?

A

LOS 9.c

Test whether the population correlation between the two variables is equal to zero, using the following null and alternative hypotheses for a two-tailed test with n-2 degrees of freedom (df):

H0: ρ = 0 versus Ha: ρ != 0

test statistic t = r * sqrt(n-2) / sqrt(1 - r2)

Then compare computed t with the critical t-value for the appropriate degrees of freedom and level of significance. For a two-tailed test, the decision rule is stated as:

Reject H0 if +tcritical < t or t < -tcritical
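A sketch of the test statistic; the r and n values are assumed, and the quoted critical value is from a standard t-table:

```python
import math

def correlation_t_stat(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Assumed example: r = 0.5 from a sample of n = 30 (df = 28).
t = correlation_t_stat(0.5, 30)
print(round(t, 3))  # ≈ 3.055, which exceeds the 5% two-tailed critical value of about 2.048
```

Since t falls above +tcritical, we would reject H0 and conclude the population correlation is significantly different from zero.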

8
Q

Distinguish between the dependent and independent variables of a linear regression

A

LOS 9.d

  • dependent variable - its variation is explained by the independent variable (e.g. “Y” values), aka explained, endogenous, or predicted variable
  • independent variable - explains the variation of the dependent variable (e.g. “X” values), aka explanatory, exogenous, or predicting variable.
9
Q

Describe the six assumptions underlying linear regression

A

LOS 9.e

except for #1, it’s all about the residuals!

For X (independent) and Y (dependent) variables:

  1. linear relationship exists between X and Y
  2. X is uncorrelated with residuals, e
  3. The expected value of the residual term is zero:
    ê = 0, also noted as E(e) = 0
  4. The variance of the residual term is constant for all observations: E(ei2) = σe2
  5. The residual term is independently distributed, i.e. residual for each observation is uncorrelated with all others: E(eiej) = 0, j != i
  6. The residual term is normally distributed
10
Q

Interpret the linear regression coefficients

A

LOS 9.e

For linear relationship:

Yi = b0 + b1Xi + ei, i=1…n

the regression line equation is:

^Yi = ^b0 + ^b1Xi , i=1…n (^ denotes “hat,” i.e. estimated)

  • ^b1 = covXY / sX2 ; “slope = cov / variance”; the stock’s β
  • ^b0 = Ymean - ^b1 Xmean ; y-intercept; the stock’s alpha
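These two estimators can be sketched directly from the formulas; the data below is assumed and lies exactly on y = 1 + 2x, so the estimates recover that intercept and slope:

```python
def ols_simple(x, y):
    """Estimate ^b1 = cov_XY / s_X^2 and ^b0 = Ymean - ^b1 * Xmean."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - x_mean) ** 2 for a in x) / (n - 1)
    b1 = cov / var_x          # slope = covariance / variance of X
    b0 = y_mean - b1 * x_mean  # intercept passes through (Xmean, Ymean)
    return b0, b1

b0, b1 = ols_simple([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # ≈ 1.0, 2.0
```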
11
Q

Standard error of estimate (def.)

A

LOS 9.f

Standard error of estimate (SEE) is the standard deviation of the error terms in the regression.

also called:

standard error of the residual

standard error of the regression

SEE measures the degree of variability of the actual Y-values relative to the estimated Y-values from a regression equation.

The SEE gauges the “fit” of the regression line.

The smaller the standard error, the better the fit.

12
Q

Coefficient of Determination (def.) for simple linear regression

A

LOS 9.f

Coefficient of determination (R2) is the percentage of the total variation in the dependent variable (Y) explained by the independent variable (X).

For simple linear regression (not for multi-variate regression),

R2 = r2, where

r = sample correlation coefficient

13
Q

Regression coefficient (^b1) confidence interval (equation)

A

LOS 9.f

^b1 +/- (tc x s^b1), where

tc = critical two-tailed t-value for the selected confidence level for df = n-2

14
Q

Test for significance about a population value of a regression coefficient (e.g. b1)

A

LOS 9.g

Use two-tailed t-test with df = n-2:

tb1 = (^b1 - b1) / s^b1, where

b1 = the hypothesized value.

H0: b1 = 0; Ha: b1 != 0

reject H0 if t < -tc or tc < t, which means that b1 is significantly different from the hypothesized value

15
Q

For a simple linear regression, how does one predict the value of the dependent variable (Y)?

A

LOS 9.h

^Y = ^b0 + ^b1Xp, where

^Y = predicted value of the dependent variable

Xp = forecasted value of the independent variable
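As a one-line sketch (the coefficient estimates and forecast below are assumed, purely for illustration):

```python
b0_hat, b1_hat = 1.5, 0.5   # assumed estimated intercept and slope
x_p = 4.0                   # assumed forecasted value of the independent variable X
y_hat = b0_hat + b1_hat * x_p
print(y_hat)  # 3.5
```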

16
Q

what is the confidence interval for a predicted value of a dependent variable (^Y)?

A

LOS 9.i

^Y +/- (tc x sf), where

tc = two-tailed critical t-value at the desired significance level with df = n-2

sf = standard error of the forecast (will likely be provided)

sf2 = SEE2 [1 + 1/n + (X - Xbar)2 / ((n-1)sx2)], where

X = value of independent variable for which forecast was made
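A sketch of sf2 with assumed inputs; note that forecasting at the mean of X minimizes the forecast variance, since the last term drops to zero:

```python
def forecast_variance(see, n, x_forecast, x_mean, s_x):
    """sf^2 = SEE^2 * (1 + 1/n + (X - Xmean)^2 / ((n - 1) * s_x^2))"""
    return see ** 2 * (1 + 1 / n + (x_forecast - x_mean) ** 2 / ((n - 1) * s_x ** 2))

# Assumed values: SEE = 2, n = 10, forecasting exactly at the mean of X.
print(forecast_variance(see=2.0, n=10, x_forecast=5.0, x_mean=5.0, s_x=1.5))  # 4.4
```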

17
Q

What does acronym ANOVA stand for and definition?

A

ANOVA = “analysis of variance”

ANOVA is a statistical procedure for analyzing the total variability of the dependent variable.

18
Q

Write out the ANOVA table for simple linear regression

A

LOS 9.j

Source of Variation      df               Sum of Squares     Mean Sum of Squares
Regression (explained)   k = 1            RSS                MSR = RSS/k = RSS
Error (unexplained)      n - k - 1 = n-2  SSE                MSE = SSE/(n-2)
Total                    n - 1            SST = RSS + SSE

19
Q

Calculate R2 for simple linear regression from an ANOVA table

A

LOS 9.j

R2 = (SST - SSE) / SST = RSS / SST

20
Q

Calculate SEE for simple linear regression from an ANOVA table

A

LOS 9.j

SEE = sqrt(MSE) = sqrt(SSE / (n-2))

Recall that SEE is the standard deviation of the regression error terms

21
Q

Calculate the F-statistic of a simple linear regression

A

LOS 9.j

F = MSR / MSE = (RSS/k) / (SSE/(n-k-1)), where

MSR = mean regression sum of squares

MSE = mean squared error

NOTE: This is always a 1-tailed test!
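The ANOVA quantities and the statistics built from them can be sketched together; the data below is assumed for illustration:

```python
def anova_simple(x, y):
    """ANOVA pieces for a simple linear regression (k = 1): R2, SEE, and F."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
          / sum((a - x_mean) ** 2 for a in x))
    b0 = y_mean - b1 * x_mean
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained
    sst = sum((b - y_mean) ** 2 for b in y)                    # total
    rss = sst - sse                                            # explained
    mse = sse / (n - 2)
    return {"R2": rss / sst, "SEE": mse ** 0.5, "F": rss / mse}  # MSR = RSS/1 = RSS

print(anova_simple([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
# For this assumed data: R2 = 0.6, SEE ≈ 0.894, F = 4.5
```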

22
Q

What is the purpose of the F-test?

A

LOS 9.j

The F-test is used to tell whether at least one independent variable explains a significant portion of the variation of the dependent variable (i.e. does at least one bi explain the variation of Y?):

F = explained variance / unexplained variance
= MSR / MSE = (RSS/k) / (SSE/(n-k-1))

NOTE: the F-statistic tests all independent variables as a group.

NOTE: for a simple linear regression, the F-test tells us the same thing as the t-test of the slope. In fact, for a simple linear regression, F = tb12

23
Q

What are the limitations of regression analysis?

A

LOS 9.k

  1. Parameter instability. Linear relationships can change over time, i.e. the estimated equation for one time period may not be relevant for another.
  2. Efficient markets theory. The usefulness of a regression model will be limited if other market participants also act on the evidence of the same model.
  3. Regression analysis assumptions do not hold. There are six assumptions; for example, if the data is heteroskedastic (non-constant variance of the error terms) or exhibits autocorrelation (error terms are not independent), the regression results may be invalid.