Quant Methods #10 - Multiple Regression & Issues in Regression Analysis Flashcards
variance and standard deviation equations and relationship to each other
LOS 10.a
Variance: σX2 = Σ(i=1 to n) (Xi - Xmean)2 / (n-1)
standard deviation is the square root of variance:
σX = sqrt(σX2)
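A quick numpy check of the two formulas and their relationship (the data values are made up for illustration):

```python
import numpy as np

x = np.array([4.0, 7.0, 5.0, 9.0, 6.0])

# sample variance: squared deviations from the mean, divided by (n - 1)
var_x = np.sum((x - x.mean()) ** 2) / (len(x) - 1)

# standard deviation is the square root of variance
std_x = np.sqrt(var_x)

# agrees with numpy's built-in sample statistics (ddof=1)
assert np.isclose(var_x, x.var(ddof=1))
assert np.isclose(std_x, x.std(ddof=1))
print(var_x, std_x)
```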
Fill in the terms to this ANOVA table:
Source df SS MS
Regression ? ? ?
Error ? ? ?
Total ? ?
LOS 10.i
Source df SS MS
Regression k RSS MSR
Error n-k-1 SSE MSE
Total n-1 SST
NOTE: MSR = RSS / k; MSE = SSE / (n-k-1); R2 = RSS / SST; SEE = sqrt(MSE) ≈ sforecast for large n
Construct equations for MSE, MSR, R2, F, and SEE to show their relationship with terms in the ANOVA table:
Source df SS MS
Regression k RSS MSR
Error n-k-1 SSE MSE
Total n-1 SST
LOS 10.i
mean squared error: MSE = SSE / (n-k-1)
mean regression sum of squares: MSR = RSS / k
coefficient of determination: R2 = RSS / SST
F-statistic: F = MSR / MSE
standard error of estimate: SEE = sqrt(MSE)
standard error of forecast (large n): sforecast ≈ SEE
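A minimal sketch tying these terms together: fit a small simulated regression with numpy so the ANOVA identities hold exactly (the data and the choice of k = 2 are made-up assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 2                                   # 30 observations, 2 X's (assumed)
X = rng.normal(size=(n, k))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.8, size=n)

Xd = np.column_stack([np.ones(n), X])          # add intercept column
b_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None) # OLS coefficient estimates
y_hat = Xd @ b_hat

sst = np.sum((y - y.mean()) ** 2)              # total SS, df = n-1
rss = np.sum((y_hat - y.mean()) ** 2)          # regression SS, df = k
sse = np.sum((y - y_hat) ** 2)                 # error SS, df = n-k-1

msr = rss / k                                  # MSR = RSS / k
mse = sse / (n - k - 1)                        # MSE = SSE / (n-k-1)
r2 = rss / sst                                 # R2 = RSS / SST
f_stat = msr / mse                             # F = MSR / MSE
see = np.sqrt(mse)                             # SEE = sqrt(MSE)

assert np.isclose(sst, rss + sse)              # SST = RSS + SSE for OLS w/ intercept
```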
Compute the residual ^e (“e-hat”) for observation “i” from the observation data and the estimated multiple regression
LOS 10.a
^ei = Yi - ^Yi = Yi - (^b0 + ^b1X1i + ^b2X2i + … + ^bkXki)
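A minimal sketch of this residual calculation, with made-up coefficient estimates and observation values (k = 2 assumed):

```python
import numpy as np

b_hat = np.array([1.2, 0.8, -0.3])    # ^b0, ^b1, ^b2 (assumed estimates)
x_i   = np.array([2.0, 5.0])          # X1i, X2i for observation i
y_i   = 1.9                           # observed Yi

y_hat_i = b_hat[0] + b_hat[1:] @ x_i  # ^Yi = ^b0 + ^b1*X1i + ^b2*X2i
e_hat_i = y_i - y_hat_i               # ^ei = Yi - ^Yi
print(e_hat_i)
```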
T-statistic used for testing regression coefficients for statistical significance
LOS 10.c, LOS 10.d
t = (^bj - bj) / s^b,j
df = n - k - 1
where:
- ^bj = coefficient to be tested
- bj = hypothesized value under H0 (typically 0)
- s^b,j = estimated standard error of ^bj
- n = number of observations
- k = number of independent variables
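A minimal sketch of the coefficient t-test using scipy; the inputs (^bj = 0.40, s^b,j = 0.18, n = 46, k = 4) are made-up illustrations, chosen so the p-value comes out near the 0.032 example on the p-value card:

```python
from scipy import stats

b_hat_j = 0.40    # ^bj, estimated coefficient (assumed)
b_j     = 0.0     # hypothesized value under H0
s_bj    = 0.18    # estimated standard error of ^bj (assumed)
n, k    = 46, 4   # observations and independent variables (assumed)

t = (b_hat_j - b_j) / s_bj
df = n - k - 1
p_value = 2 * (1 - stats.t.cdf(abs(t), df))    # two-tailed p-value

t_crit = stats.t.ppf(0.975, df)                # 5% two-tailed critical value
print(t, p_value, abs(t) > t_crit)
```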
Interpret estimated regression coefficients
LOS 10.b
intercept term - value of dependent variable when all independent variables are zero.
(partial) slope coefficients - estimated change in the dependent variable for a one-unit change in that independent variable, holding all other independent variables constant.
Interpret the p-value of an estimated regression coefficient
LOS 10.b
The p-value is the smallest level of significance for which the null hypothesis can be rejected.
Comparing p-value to the significance level:
- If p-value < significance level, H0 can be rejected
- If p-value > significance level, H0 cannot be rejected
Example: if ^b1 = 0.40 and its p-value = 0.032, at 1% significance level:
- p (0.032) > 0.01, so we cannot reject H0; ^b1 is not statistically different from 0 at the 1% level of significance.
- However, we can conclude that ^b1 is statistically different from 0 at any significance level greater than 3.2%.
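The decision rule is a single comparison; a tiny sketch (the helper name is my own):

```python
def reject_h0(p_value: float, alpha: float) -> bool:
    """H0 is rejected iff p-value < significance level."""
    return p_value < alpha

print(reject_h0(0.032, 0.01))   # False: not significant at the 1% level
print(reject_h0(0.032, 0.05))   # True: significant at the 5% level
```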
heteroskedasticity
LOS 10.k
- arises when residual variance is non-constant
- 2 types of heteroskedasticity:
- Type 1: “unconditional” - residuals not related to X’s; causes no major problems
- Type 2: “conditional” - residuals are related to X’s; this is the problematic type!
- Impact / effect of conditional heteroskedasticity:
- std errors (sb’s) are unreliable estimates
- coefficient estimates (b’s) are not affected
- t-stats are too high (sb’s too small)
- F-test unreliable
detecting heteroskedasticity
LOS 10.k
- scatter diagrams: plot residuals vs each X & time
- Breusch-Pagan test: regress squared residuals on “X” variables to test significance of Rresid2
- H0: no heteroskedasticity
- Chi-square test: BP = Rresid2 * n (w/ df = k)
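A minimal sketch of the Breusch-Pagan test via statsmodels' het_breuschpagan, on simulated data with conditional heteroskedasticity built in (all data values are made up):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(1)
n = 200
X = sm.add_constant(rng.normal(size=(n, 2)))            # constant + 2 X's
# error variance grows with |X1| -> conditional heteroskedasticity
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n) * (1 + np.abs(X[:, 1]))

resid = sm.OLS(y, X).fit().resid

# LM statistic is n * R2 from the auxiliary regression of squared
# residuals on the X's; chi-square with df = k under H0
bp_stat, bp_pvalue, _, _ = het_breuschpagan(resid, X)
print(bp_stat, bp_pvalue)                               # small p -> reject H0
```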
correcting heteroskedasticity
LOS 10.k
- 1st Method: White-corrected (“robust”) std errors; makes std errors higher, t-stats lower, and conclusions more accurate
- 2nd Method: use “generalized least squares” - modifying original equation to eliminate heteroskedasticity
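A minimal sketch of the first method with statsmodels, where cov_type="HC0" requests White heteroskedasticity-consistent standard errors (data simulated the same way as the Breusch-Pagan sketch above):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
X = sm.add_constant(rng.normal(size=(n, 2)))
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n) * (1 + np.abs(X[:, 1]))

model = sm.OLS(y, X)
plain  = model.fit()                 # conventional standard errors
robust = model.fit(cov_type="HC0")   # White-corrected ("robust") standard errors

print(plain.bse)
print(robust.bse)   # typically larger -> lower t-stats, sounder conclusions
```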
serial correlation
LOS 10.k
- positive autocorrelation: each residual trends in same direction as previous term; common in financial data
- impact: std errors too small, so t-stats too high
detecting serial correlation
LOS 10.k
- scatter plot: visually inspect error terms
- Durbin-Watson statistic
- formal test of error term correlation
- for large samples: DW ≈ 2(1 - r), where r = correlation of residuals from one observation to the next
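A minimal sketch computing DW with statsmodels' durbin_watson on residuals from simulated AR(1) errors, checking the DW ≈ 2(1 - r) approximation (all data values are made up):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)

e = np.zeros(n)
for t in range(1, n):                 # AR(1) errors: positive autocorrelation
    e[t] = 0.7 * e[t - 1] + rng.normal()

y = 1.0 + 2.0 * x + e
resid = sm.OLS(y, sm.add_constant(x)).fit().resid

dw = durbin_watson(resid)
r = np.corrcoef(resid[:-1], resid[1:])[0, 1]
print(dw, 2 * (1 - r))                # DW ≈ 2(1 - r) for large samples
```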
interpreting Durbin-Watson values
LOS 10.k
- for DW ≈ 2(1-r):
- no autocorrelation (r = 0): DW = 2
- positive autocorrelation (r = 1): DW = 0 (common)
- negative autocorrelation (r = -1): DW = 4 (uncommon)
- How close to “2” does DW have to be to conclude “no autocorrelation”? Look at ranges in DW tables
- table gives critical values “dl” and “du”
- H0 = no positive serial correlation
- 0 < DW < dl: reject H0, conclude positive autocorrelation
- dl ≤ DW ≤ du: inconclusive
- DW > du: fail to reject H0 (no evidence of positive autocorrelation)
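A small sketch of this decision rule; the dl and du values passed in are placeholders, not real table values — look up the actual critical values for your n and k:

```python
def dw_decision(dw: float, dl: float, du: float) -> str:
    """Test H0: no positive serial correlation."""
    if dw < dl:
        return "reject H0: positive serial correlation"
    if dw <= du:
        return "inconclusive"
    return "fail to reject H0"

print(dw_decision(0.95, dl=1.35, du=1.59))   # placeholder critical values
```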
correcting serial correlation
LOS 10.k
preferred method: “Hansen Method”
- adjust standard errors upwards and then recalculate t-stats
- also corrects for conditional heteroskedasticity
- result: t-stats decline, chance of Type I error (false positive) declines
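A minimal sketch of this kind of correction. statsmodels does not expose a "Hansen" option; its HAC (Newey-West) covariance is a closely related adjustment that is robust to both serial correlation and heteroskedasticity. The lag choice is an assumption:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.normal()     # AR(1) errors
y = 1.0 + 2.0 * x + e

model = sm.OLS(y, sm.add_constant(x))
plain = model.fit()
hac   = model.fit(cov_type="HAC", cov_kwds={"maxlags": 4})  # lag choice assumed

print(plain.bse)
print(hac.bse)   # standard errors adjusted upward -> t-stats decline
```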
multicollinearity
LOS 10.l
multicollinearity - two or more “X’s” are correlated with each other
- effects: inflates std errors; reduces t-stats; increases chance of Type II errors (false negatives)
- i.e. t-stats look artificially small, so variables look unimportant
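A minimal sketch for spotting multicollinearity: a correlation matrix of the X's, plus the variance inflation factor (VIF), a common diagnostic not named on this card (simulated data):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # x2 nearly duplicates x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

print(np.corrcoef(X, rowvar=False))       # x1-x2 correlation near 1

Xd = sm.add_constant(X)
for j in range(1, Xd.shape[1]):           # skip the constant column
    print(variance_inflation_factor(Xd, j))   # VIF >> 10 flags trouble
```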