Multiple Regression Flashcards

1
Q

We can use multiple regression models to:

A

1 - Identify relationships between variables
2 - Forecast variables
3 - Test existing theories

2
Q

The general multiple linear regression model is:

A

Yi = b0 + b1X1i + b2X2i + … + bkXki + εi

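As a supplement to this card: a minimal sketch of estimating such a model in Python with statsmodels. The data, coefficient values, and variable names are simulated for illustration, not from the card.

```python
# Minimal sketch: fit Yi = b0 + b1X1i + b2X2i + b3X3i + ei on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 100
X = rng.normal(size=(n, 3))                       # three hypothetical regressors
y = 1.5 + X @ np.array([0.8, -0.4, 0.2]) + rng.normal(scale=0.5, size=n)

X_const = sm.add_constant(X)                      # adds the intercept term b0
results = sm.OLS(y, X_const).fit()
print(results.summary())                          # coefficients, t-stats, p-values, R2
```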
3
Q

The residual, εi, is

A

the difference between the observed value, Yi, and the predicted value from the regression, Ŷi

4
Q

The p-value is

A

the smallest level of significance for which the null hypothesis can be rejected

5
Q

If the p-value is less than the significance level

A

the null hypothesis can be rejected

6
Q

If the p-value is greater than the significance level

A

the null hypothesis cannot be rejected.

7
Q

intercept term

A

is the value of the dependent variable when the independent variables are all equal to zero.

8
Q

Assumptions underlying a multiple regression model include

A
  • a linear relationship exists between X and Y.
  • residuals are normally distributed.
  • variance of the error terms is constant across observations.
  • residuals are not correlated with one another.
  • independent variables are not random, and no exact linear relation exists between any of them.
9
Q

R2

A

evaluates the overall effectiveness of the entire set of independent variables in explaining the dependent variable

R2 = (total variation - unexplained variation) / total variation

R2 = (SST - SSE) / SST

R2 = explained variation / total variation

R2 = RSS / SST

10
Q

Adjusted R2

A

R2a = 1 - [((n - 1) / (n - k - 1)) × (1 - R2)]

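A minimal sketch of the R2 and adjusted R2 formulas from this card and the previous one; the sums of squares and the n and k values are hypothetical.

```python
# Minimal sketch of the R2 and adjusted R2 formulas (numbers are hypothetical).
def r_squared(sst: float, sse: float) -> float:
    """R2 = (SST - SSE) / SST."""
    return (sst - sse) / sst

def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """R2a = 1 - [((n - 1) / (n - k - 1)) * (1 - R2)]."""
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r2)

r2 = r_squared(sst=500.0, sse=120.0)      # 0.76
print(adjusted_r_squared(r2, n=60, k=4))  # ~0.74; adjusted R2 never exceeds R2
```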
11
Q

AIC

A

Akaike information criterion; a lower AIC indicates a better model when the goal is a better forecast (prediction)

12
Q

BIC

A

Bayesian information criterion; a lower BIC indicates a better model when the goal is assessing goodness of fit. BIC imposes a heavier penalty than AIC for adding variables.

13
Q

nested models

A

one model, called the full or unrestricted model, contains the complete set of independent variables; the other, the restricted model, uses only a subset of them

14
Q

Restricted model

A

uses only a subset of the independent variables of the full (unrestricted) model

15
Q

F-statistic

A

F = [(SSER - SSEU) / q] / [SSEU / (n - k - 1)]

where q = the number of restrictions (excluded variables)

To test all k slope coefficients jointly: F = (RSSU / k) / [SSEU / (n - k - 1)]

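A minimal sketch of the nested-model F-test with hypothetical sums of squares; given two fitted statsmodels results, `compare_f_test` performs the same comparison.

```python
# Minimal sketch of the nested-model F-test (all numbers are hypothetical).
from scipy.stats import f

def nested_f(sse_r: float, sse_u: float, q: int, n: int, k: int) -> float:
    """F = [(SSE_R - SSE_U) / q] / [SSE_U / (n - k - 1)]."""
    return ((sse_r - sse_u) / q) / (sse_u / (n - k - 1))

F = nested_f(sse_r=140.0, sse_u=120.0, q=2, n=60, k=5)
F_crit = f.ppf(0.95, dfn=2, dfd=60 - 5 - 1)   # one-tailed 5% critical value
print(F, F_crit, F > F_crit)                  # reject H0 if F > Fc
```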
16
Q

reject H0 if

A

F (test-statistic) > Fc (critical value)

17
Q

F-test evaluates whether

A

the relative decrease in SSE due to the inclusion of q additional variables is statistically justified.

18
Q

Regression model specification

A

selection of the explanatory (independent) variables to be included in a model

19
Q

Examples of Misspecification of Functional Form

A

Misspecification #1: Omitting a Variable

Misspecification #2: Variable Should Be Transformed

Misspecification #3: Inappropriate Scaling of the Variable

Misspecification #4: Incorrectly Pooling Data

20
Q

Omission of important independent variable(s) effect

A

Biased and inconsistent regression parameters.

May also result in serial correlation or heteroskedasticity in the residuals.

21
Q

Inappropriate variable form effect

A

heteroskedasticity

22
Q

Inappropriate variable scaling effect

A

heteroskedasticity or multicollinearity

23
Q

Data improperly pooled effect

A

heteroskedasticity or serial correlation

24
Q

Omission of important independent variable(s)

A

one or more variables that should have been included are omitted.

25
Q

Inappropriate variable form

A

The relationship between the dependent and independent variables may be non-linear.

26
Q

Inappropriate variable scaling

A

Variables may need to be transformed

27
Q

Data improperly pooled

A

Sample has periods of dissimilar economic environments

28
Q

Heteroskedasticity

A

variance of the residuals is not the same across all observations

29
Q

Unconditional heteroskedasticity

A

heteroskedasticity is not related to the level of the independent variables

30
Q

Conditional heteroskedasticity

A

related to (i.e., conditional on) the level of the independent variables

31
Q

Effect of Heteroskedasticity on Regression Analysis

A

1 - standard errors are unreliable estimates; with conditional heteroskedasticity they are typically too small, leading to Type I errors.
2 - the F-test is unreliable.

32
Q

Detecting Heteroskedasticity

A

scatter plots
Breusch–Pagan chi-square (χ2) test

33
Q

BP chi-square test statistic=

A

n × R2resid, distributed chi-square (χ2) with k degrees of freedom

n = the number of observations

R2resid = R2 of a second regression of the squared residuals on the independent variables

k = the number of independent variables
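A minimal sketch of the BP statistic on simulated heteroskedastic data; statsmodels' `het_breuschpagan` computes the same n × R2resid value.

```python
# Minimal sketch: Breusch-Pagan statistic = n x R2 of squared residuals on X.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200) * (1 + np.abs(x))  # variance rises with x
X = sm.add_constant(x)
results = sm.OLS(y, X).fit()

aux = sm.OLS(results.resid ** 2, X).fit()       # second regression on squared residuals
bp_manual = len(x) * aux.rsquared               # chi-square with k degrees of freedom
bp_stat, bp_pvalue, _, _ = het_breuschpagan(results.resid, X)
print(bp_manual, bp_stat, bp_pvalue)            # manual and library values agree
```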

34
Q

To correct for conditional heteroskedasticity

A

robust standard errors (White-corrected standard errors) are used to recalculate the t-statistics
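In statsmodels terms, this is a refit with a heteroskedasticity-consistent covariance estimator; a sketch on simulated data ("HC1" is one of several available variants):

```python
# Minimal sketch: White-corrected (robust) standard errors in statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200) * (1 + np.abs(x))  # conditional heteroskedasticity
X = sm.add_constant(x)

robust = sm.OLS(y, X).fit(cov_type="HC1")   # same coefficients, corrected std errors
print(robust.bse, robust.tvalues)
```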

35
Q

Serial correlation / autocorrelation

A

regression residual terms are correlated with one another

36
Q

Effect of Serial Correlation

A

In a model that uses a lagged value of the dependent variable as one of the independent variables, residual autocorrelation causes the slope-coefficient estimates to be inconsistent. If the model has no lagged dependent variable, the slope estimates remain consistent.

37
Q

Effect on Standard Errors (Serial Correlation)

A

Positive serial correlation results in coefficient standard errors that are too small, causing t-statistics (and the F-statistic) to be larger than they should be, which leads to Type I errors.

38
Q

Detecting Serial Correlation

A

Durbin–Watson (DW) statistic
Breusch–Godfrey (BG) test
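A minimal sketch of both tests on simulated data with AR(1) errors, using the statsmodels helpers `durbin_watson` and `acorr_breusch_godfrey`:

```python
# Minimal sketch: detect serial correlation with DW and BG on simulated AR(1) errors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                       # e_t = 0.6 * e_{t-1} + u_t
    e[t] = 0.6 * e[t - 1] + rng.normal()
y = 2.0 + 0.5 * x + e

results = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(results.resid))         # near 2 = none; well below 2 = positive
lm_stat, lm_pvalue, _, _ = acorr_breusch_godfrey(results, nlags=1)
print(lm_stat, lm_pvalue)                   # small p-value -> serial correlation
```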

39
Q

Breusch–Godfrey (BG) test

A

regresses the regression residuals against the original set of independent variables, plus one or more additional variables representing lagged residual(s)

40
Q

Correction for Serial Correlation

A

robust standard errors (serial-correlation-consistent, e.g., Newey–West) are used to recalculate the t-statistics using the original regression coefficients.
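In statsmodels this corresponds to a HAC (Newey–West) covariance estimator; the lag count below is an arbitrary illustration:

```python
# Minimal sketch: Newey-West (HAC) standard errors; coefficients are unchanged.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 2.0 + 0.5 * x + rng.normal(size=200)
X = sm.add_constant(x)

hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(hac.bse)                              # serial-correlation-consistent std errors
```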

41
Q

Multicollinearity

A

two or more of the independent variables in a multiple regression are highly correlated with each other.

42
Q

Effect of Multicollinearity on Model Parameters

A

slope coefficients are imprecise and unreliable

inflates standard errors and lowers t-stats.

43
Q

Effect on Standard Errors (Multicollinearity)

A

Standard errors are too high. This leads to Type II errors.

44
Q

Detection of Multicollinearity

A

variance inflation factor (VIF)

45
Q

variance inflation factor (VIF) formula

A

VIFj = 1 / (1 - R2j)

where R2j = the R2 from regressing the jth independent variable on the remaining independent variables

46
Q

VIF values

A

Values greater than 5 (i.e., R2j > 80%) warrant further investigation; values above 10 (i.e., R2j > 90%) indicate severe multicollinearity.
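A minimal sketch of computing VIFs with statsmodels' `variance_inflation_factor` on deliberately collinear simulated data:

```python
# Minimal sketch: VIFs on simulated data where x2 nearly duplicates x1.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)   # near-copy of x1 -> multicollinearity
x3 = rng.normal(size=100)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

for j in range(1, X.shape[1]):              # column 0 is the constant
    print(j, variance_inflation_factor(X, j))
# expect VIF >> 10 for x1 and x2, and near 1 for x3
```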

47
Q

Correction for Multicollinearity

A

omit one or more of the correlated independent variables
increase sample size

48
Q

We can identify outliers using

A

studentized residuals

49
Q

studentized residuals steps

A

1. Delete observation i and re-estimate the regression on the remaining observations.
2. Use the re-estimated model to predict Yi; the deleted residual is the difference between the observed Yi and this prediction.
3. Divide the deleted residual by its estimated standard deviation to obtain the studentized residual, and compare it with the critical t-value to flag outliers.
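statsmodels exposes externally studentized residuals directly; a sketch with one planted, hypothetical outlier:

```python
# Minimal sketch: flag outliers with externally studentized residuals.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=50)
y[10] += 5.0                                 # plant one outlier

results = sm.OLS(y, sm.add_constant(x)).fit()
studentized = results.get_influence().resid_studentized_external
print(np.where(np.abs(studentized) > 3)[0])  # flags observation 10
```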

50
Q

Detecting Influential Data Points

A

Cook’s distance

51
Q

Cook’s distance

A

composite metric (i.e., it takes into account both the leverage and outliers) for evaluating if a specific observation is influential.

52
Q

Cooks Distance Formula

A

Di = [ei2 / (k × MSE)] × [hii / (1 - hii)2]

ei = residual of observation i

k = number of independent variables

MSE = mean square error of the regression model

hii = leverage value of observation i

53
Q

Cooks Distance (Di values)

A

Di values greater than √(k/n) indicate an influential data point.
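A minimal sketch using statsmodels' influence diagnostics, with one planted high-leverage, large-residual point:

```python
# Minimal sketch: Cook's distance via statsmodels influence diagnostics.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=50)
x[0], y[0] = 6.0, -10.0                      # high leverage and a large residual

results = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d, _ = results.get_influence().cooks_distance
k, n = 1, len(x)
print(np.where(cooks_d > np.sqrt(k / n))[0]) # should flag observation 0
```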

54
Q

Logistic regression (logit) models

A

ln(p / (1 - p)) = b0 + b1X1 + b2X2 + … + ε
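A minimal sketch of fitting a logit model with statsmodels on simulated binary data; the coefficients are hypothetical:

```python
# Minimal sketch: logistic regression on simulated binary data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
X = sm.add_constant(rng.normal(size=(500, 2)))
true_b = np.array([-0.5, 1.2, -0.8])
p = 1 / (1 + np.exp(-(X @ true_b)))          # log-odds are linear in X
y = rng.binomial(1, p)

logit_res = sm.Logit(y, X).fit(disp=0)
print(logit_res.params)                      # slopes are changes in log-odds
```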

55
Q

likelihood ratio (LR)

A

LR = –2 (log likelihood restricted model – log likelihood unrestricted model)

56
Q

likelihood ratio (LR) values

A

Log-likelihood values are negative; values closer to 0 indicate a better-fitting model. The LR statistic itself is nonnegative and is compared with a chi-square critical value whose degrees of freedom equal the number of restricted variables.
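A minimal sketch of the LR test between nested logit models on simulated data; the dropped variable and coefficients are hypothetical:

```python
# Minimal sketch: likelihood ratio test between nested logit models.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(6)
X = sm.add_constant(rng.normal(size=(500, 2)))
p = 1 / (1 + np.exp(-(X @ np.array([-0.5, 1.2, -0.8]))))
y = rng.binomial(1, p)

restricted = sm.Logit(y, X[:, :2]).fit(disp=0)   # drops the second variable
unrestricted = sm.Logit(y, X).fit(disp=0)

LR = -2 * (restricted.llf - unrestricted.llf)    # nonnegative by construction
print(LR, chi2.sf(LR, df=1))                     # df = number of dropped variables
```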

57
Q

high-leverage points

A

extreme observations of the independent or ‘X’ variables

58
Q

Influential data points

A

extreme observations that when excluded cause a significant change in model coefficients

59
Q

Qualitative independent variables (dummy variables)

A

capture the effect of a binary (0 or 1) independent variable; to distinguish among n categories, use n - 1 dummy variables