Study Session 3 - Correlation and Regression Flashcards

Question

What is mean squared error (MSE)?

Answer 1

F = MSR/MSE = (RSS/k) / (SSE/(n-k-1)). Tests how well the independent variables, as a group, explain the variation of the dependent variable. Always a one-tailed test.
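
As a rough illustration of the ratio above, the sketch below computes the F-statistic from ANOVA components. The function name and the input numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def f_statistic(rss, sse, n, k):
    """F = MSR / MSE, with MSR = RSS / k and MSE = SSE / (n - k - 1)."""
    msr = rss / k              # mean regression sum of squares
    mse = sse / (n - k - 1)    # mean squared error
    return msr / mse


# Hypothetical ANOVA values: 43 observations, 2 independent variables.
f_stat = f_statistic(rss=160.0, sse=400.0, n=43, k=2)
print(f_stat)  # MSR = 80, MSE = 10, so F = 8.0
```

The result would then be compared with the one-tailed critical F for k and n-k-1 degrees of freedom.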

Answer 2

R² = (SST - SSE)/SST = RSS/SST, expressed as a percentage
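
A minimal sketch of this ratio, assuming SST and SSE are already known. The helper name and the numbers are made up for illustration.

```python
def r_squared(sst, sse):
    """Share of total variation explained: (SST - SSE) / SST = RSS / SST."""
    return (sst - sse) / sst


# Hypothetical sums of squares: SST = 560, SSE = 400, so RSS = 160.
r2 = r_squared(sst=560.0, sse=400.0)
print(f"{r2:.1%}")
```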

Answer 3

SEE = √MSE = √(SSE/(n-2))
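
The formula above might be sketched as follows. The function name is hypothetical, and the optional k argument is an added assumption that generalizes the n-2 of simple regression to n-k-1.

```python
import math


def standard_error_of_estimate(sse, n, k=1):
    """SEE = sqrt(MSE) = sqrt(SSE / (n - k - 1)); k = 1 gives n - 2."""
    return math.sqrt(sse / (n - k - 1))


# Hypothetical simple regression: SSE = 400 over 102 observations.
see = standard_error_of_estimate(sse=400.0, n=102)
print(see)  # sqrt(400 / 100) = 2.0
```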

Answer 4

Same as testing the slope coefficient being different from 0.

Answer 5

Always one-tailed. If the calculated F exceeds the critical F, reject the null hypothesis.

Answer 6

Linear relationships can change over time, which is referred to as parameter instability. Usefulness in an investment scenario also depends on whether other market participants are using the same approach; if they are, the relationship is less valuable.

Answer 7

The resulting regression line minimizes the sum of the squared residuals.
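
To make the least-squares idea concrete, here is a small sketch for the one-variable case using the standard closed-form slope and intercept. The data and function names are invented for illustration.

```python
def ols_fit(x, y):
    """Closed-form simple regression: returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """SSE for a candidate line y = intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up sample data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_fit(x, y)
best = sum_squared_residuals(x, y, b0, b1)

# Moving either coefficient away from the fitted value raises the SSE.
worse_slope = sum_squared_residuals(x, y, b0, b1 + 0.1)
worse_intercept = sum_squared_residuals(x, y, b0 + 0.1, b1)
print(b0, b1, best, worse_slope, worse_intercept)
```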

Answer 8

The intercept is the value of the dependent variable when all independent variables are set to 0.

Answer 9

Represents how much the dependent variable will change for a one-unit change in that independent variable if all other independent variables are held constant.

Answer 10

Tests whether each of the slope coefficients contributes significantly to explaining the variation of the dependent variable. t = (estimated regression coefficient - hypothesized value) / coefficient standard error, with df = n-k-1
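
A short sketch of the t-statistic above. The function name and the coefficient, standard error and sample size are hypothetical.

```python
def coefficient_t_stat(estimate, hypothesized, std_error, n, k):
    """Returns (t, df) with t = (estimate - hypothesized) / std_error and df = n - k - 1."""
    t = (estimate - hypothesized) / std_error
    df = n - k - 1
    return t, df


# Hypothetical slope of 0.60 with standard error 0.20, tested against 0,
# from a regression with 43 observations and 2 independent variables.
t_stat, df = coefficient_t_stat(0.60, 0.0, 0.20, n=43, k=2)
print(round(t_stat, 4), df)
```

The t-statistic would then be compared with the critical t for df degrees of freedom.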

Answer 11

H0: bj = 0 vs. Ha: bj =/= 0. If a value other than 0 is being tested, the hypothesized value replaces 0 and is incorporated into the calculation of t.

Answer 12

The smallest level of significance at which the null hypothesis can be rejected. If p < α, reject; if p > α, fail to reject.

Answer 13

- A linear relationship exists between the independent and dependent variables.
- The expected value of the error term is 0.
- The variance of the error term is constant for all observations.
- The error term is normally distributed.

Answer 14

Tests whether at least one of the independent variables contributes significantly to explaining the variation of the dependent variable.

Answer 15

H0: b1 = b2 = ... = bk = 0; Ha: at least one bj =/= 0. The F-test is always one-tailed. If F is greater than the critical value, reject H0.

Answer 16

In multiple regression, it's the percentage of variation in the dependent variable explained by the independent variables, collectively.

Answer 17

Because R² almost always increases as variables are added to the model. Commonly known as overestimating the regression.

Answer 18

Adjusts R² for the number of variables in the model. R²a = 1 - [(n-1)/(n-k-1)] × (1-R²). Memory aid: 1 minus the degrees-of-freedom ratio, times what remains unexplained by R².
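
The adjustment can be sketched as below. The function name and sample values are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """R2_adj = 1 - [(n - 1) / (n - k - 1)] * (1 - R2)."""
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r2)


# Hypothetical model: R2 = 0.60 with 30 observations and 5 independent variables.
adj = adjusted_r_squared(r2=0.60, n=30, k=5)
print(round(adj, 4))  # 1 - (29 / 24) * 0.40, about 0.5167
```

With at least one independent variable and R² below 1, the adjusted figure comes out below R².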

Answer 19

When an independent variable is binary in nature, either on or off. It is assigned a value of 1 when the condition is present and 0 otherwise. Its coefficient equals the change in the dependent variable when the condition is present.

Answer 20

When we want to distinguish between n classes, we must use n-1 dummy variables. Otherwise, the assumption of no exact linear relationship among the independent variables is violated. Whatever class is omitted is usually the reference point/intercept for the model.
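
As an illustration of the n-1 rule, the sketch below encodes four quarters with three dummies, treating Q4 as the omitted reference class. The quarterly example and the helper name are assumptions, not taken from the cards.

```python
def encode_dummies(value, classes, reference):
    """One 0/1 dummy per class except the omitted reference class."""
    return [1 if value == c else 0 for c in classes if c != reference]


quarters = ["Q1", "Q2", "Q3", "Q4"]

q2_row = encode_dummies("Q2", quarters, reference="Q4")
q4_row = encode_dummies("Q4", quarters, reference="Q4")
print(q2_row)  # [0, 1, 0]
print(q4_row)  # [0, 0, 0], the reference class is picked up by the intercept
```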

Answer 21

What is it? What is its effect on regression? How do we detect it? How do we correct it?

Answer 22

Assumption: the variance of the residuals is constant across all observations.
Violation: the variance is not the same across observations because some subsamples are more spread out than others.
Unconditional: the variance doesn't increase with the value of the independent variable; not a problem for regression.
Conditional: the variance increases with the value of the independent variable; causes issues.

Answer 23

1. Standard errors are unreliable.
2. The coefficient estimates aren't affected.
3. If the standard errors are too small, t will be too large and the null hypothesis will be rejected too often; the opposite is true if they are too large.
4. The F-test is unreliable.

Answer 24

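Look at a plot of the residuals: is there a point where the spread of the errors suddenly changes?
More common: the Breusch-Pagan chi-square (χ²) test. Test statistic = n × R², with k degrees of freedom, where R² comes from regressing the squared residuals on the independent variables. One-tailed test; compare with the chi-square table.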
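
A minimal sketch of the Breusch-Pagan decision, assuming the R² of the squared-residual regression has already been computed. The function name and inputs are hypothetical; the critical value is passed in by the caller, here the 5% chi-square value for 2 degrees of freedom from a standard table.

```python
def breusch_pagan(n, r2_resid, critical_value):
    """Returns (statistic, reject) where statistic = n * R2 of the residual regression."""
    stat = n * r2_resid
    # One-tailed test: reject the null of no conditional heteroskedasticity
    # only when the statistic exceeds the critical chi-square value.
    return stat, stat > critical_value


# Hypothetical inputs: 60 observations, R2 = 0.12, k = 2 independent variables.
bp_stat, reject = breusch_pagan(n=60, r2_resid=0.12, critical_value=5.991)
print(round(bp_stat, 2), reject)
```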

Answer 25

By using robust standard errors, which are then used to recalculate the t-statistics, or by using generalized least squares, which attempts to eliminate the heteroskedasticity by modifying the original equation.

Answer 26

When the residuals are correlated. Positive SC: when a positive regression error in one time period increases the likelihood of a positive one in the next. Negative SC: when a positive regression error in one time period increases the likelihood of a negative one in the next.

Answer 27

Because the data cluster together, positive serial correlation typically results in coefficient standard errors that are too small. This causes t to be too big, so the null is rejected too often, giving too many Type I errors.

Answer 28

Durbin-Watson (DW) test. DW ≈ 2(1 - r) if the sample is large enough, where r is the correlation between consecutive residuals.
0 to d-lower: reject H0
d-lower to d-upper: inconclusive
Above d-upper: do not reject H0
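
The sketch below computes the DW statistic from a residual series using its standard definition (sum of squared successive differences over the sum of squared residuals) and applies the decision ranges above. The residuals and the d-lower/d-upper bounds are placeholders; real bounds come from a Durbin-Watson table for the given n and k.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def dw_decision(dw, d_lower, d_upper):
    """Decision for the test of positive serial correlation."""
    if dw < d_lower:
        return "reject H0"
    if dw <= d_upper:
        return "inconclusive"
    return "do not reject H0"


# Made-up residuals that stay positive, then stay negative (positive correlation).
residuals = [0.5, 0.6, 0.4, 0.5, -0.3, -0.4, -0.5, -0.2]
dw = durbin_watson(residuals)

# Placeholder bounds, not taken from an actual table.
decision = dw_decision(dw, d_lower=1.10, d_upper=1.54)
print(round(dw, 3), decision)
```

A DW value well below 2 corresponds to a positive r in the approximation DW ≈ 2(1 - r).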

Answer 29

- Adjust the coefficient standard errors using the Hansen method, which also corrects for heteroskedasticity; the adjusted standard errors are then used in hypothesis testing.
- Or improve the specification of the model by explicitly incorporating the time-series nature of the data (seasonality, etc.). Hard.

Answer 30

t-tests indicate that none of the individual coefficients is different from zero.
The F-test is statistically significant.
R² is high.
Together the variables explain a lot, but individually they do not, which means the independent variables are highly correlated.
Low correlation among the independent variables does not mean there isn't multicollinearity.

Answer 31

Most common is to omit one of the correlated variables. Hard to tell which one is to blame sometimes.

Answer 32

Functional form. Explanatory variables (correlated with the error term). Other time-series misspecifications resulting in nonstationarity.

Answer 33

Important variables are omitted.
Variables need to be transformed - using ln instead of linear, or vice versa.
Data is improperly pooled - pooling data that should be kept separate.

Answer 34

A lagged dependent variable is used as an independent variable. A function of the dependent variable is used as an independent variable. Independent variables are measured with error.

Answer 35

Using variables or data from the period being forecast, e.g. forecasting July using data from July.

Answer 36

Using proxy variables. For example, corporate governance quality could be proxied by free float, but free float is not an actual measure of it, so the variable is measured with error, which ruins the regression.

Answer 37

A dummy variable with a value of 1 or 0, used as the dependent variable to predict the likelihood of an event happening or not.

Answer 38

The probit model is based on the normal distribution. The logit model is based on the logistic distribution.
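
To illustrate the difference, the sketch below maps a linear score z to a probability using each distribution's cumulative function. The function names are hypothetical, and the score values are arbitrary.

```python
import math


def logit_probability(z):
    """Logistic CDF, the link used by a logit model."""
    return 1.0 / (1.0 + math.exp(-z))


def probit_probability(z):
    """Standard normal CDF, the link used by a probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Both map any score into (0, 1) and agree at z = 0, but differ in the tails.
for z in (-2.0, 0.0, 2.0):
    print(z, round(logit_probability(z), 4), round(probit_probability(z), 4))
```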

Answer 39

Make different assumptions regarding the independent variables. Result in a linear function, similar to an ordinary regression, which generates a score used to rank an observation. E.g., using financial ratios as the independent variables to predict the qualitative dependent variable of bankruptcy.

Answer 40

1. Is the model correctly specified? Correct it if not.
2. t-test the individual coefficients to check for significance.
3. F-test for model significance. Use a different model if not significant.
4. Check for heteroskedasticity with the chi-square test.
5. Check for serial correlation with the Durbin-Watson test.
6. Check for multicollinearity.
7. Fix any problems found, then use the model.

(65 cards)