Quant Flashcards
Multicollinearity
Multicollinearity refers to the condition when two or more of the independent variables, or linear combinations of the independent variables, in a multiple regression are highly correlated with each other. This condition distorts the standard error of estimate and the coefficient standard errors, leading to problems when conducting t-tests for statistical significance of parameters.
Effect: greater probability that we will incorrectly conclude that a variable is not statistically significant (type II error)
Detecting: if the absolute value of the sample correlation between any two independent variables in the regression is greater than 0.7, multicollinearity is a potential problem.
The classic symptom of multicollinearity is a high R^2 and a significant F-statistic even though the t-statistics on the estimated slope coefficients are not significant.
Correcting: omit one or more of the correlated independent variables; it is not always easy to determine which variables to omit.
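A minimal Python sketch of the correlation check (data and names are hypothetical; the VIF diagnostic at the end is an additional common check, not part of the card):

    import numpy as np
    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    # Hypothetical data: x2 is constructed to be highly correlated with x1.
    rng = np.random.default_rng(0)
    n = 200
    x1 = rng.normal(size=n)
    x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)
    x3 = rng.normal(size=n)
    X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

    # Pairwise correlations: |r| > 0.7 between two regressors flags potential multicollinearity.
    print(X.corr().round(2))

    # Variance inflation factors: values well above ~5-10 are another warning sign.
    X_const = np.column_stack([np.ones(n), X.values])
    for i, name in enumerate(X.columns, start=1):
        print(name, round(variance_inflation_factor(X_const, i), 1))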
What does an F-test test?
An F-test tests whether at least one of the independent variables is significantly different from zero; the null hypothesis is that all of the slope coefficients are equal to zero (none of the independent variables are significant).
Coefficient of determination
% of variability of Y explained by Xs. Higher R^2 means better fit.
R^2 = (SST - SSE) / SST = RSS / SST
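A minimal numeric sketch showing that both forms give the same value (the sums of squares are hypothetical; SST = RSS + SSE holds for OLS with an intercept):

    sst, sse = 100.0, 25.0
    rss = sst - sse            # SST = RSS + SSE for OLS with an intercept
    print((sst - sse) / sst)   # 0.75
    print(rss / sst)           # 0.75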
Standard Error of estimate
SEE = sqrt(MSE) = sqrt(SSE / (n - k - 1))
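A minimal sketch of the SEE formula with hypothetical numbers:

    import numpy as np

    sse = 2.5        # sum of squared errors from a regression
    n, k = 25, 2     # 25 observations, 2 independent variables
    see = np.sqrt(sse / (n - k - 1))   # SEE = sqrt(MSE)
    print(round(see, 4))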
Heteroskedasticity
What is it: Occurs when the variance of the residuals is not the same across all observations in the sample.
Effects on regression analysis (4):
The standard errors are usually unreliable.
The coefficient estimates aren't affected.
The F-test is unreliable.
The t-tests are unreliable: if a standard error is too small, the null hypothesis is rejected too often; if it is too large, it is not rejected often enough.
Detect it: examine scatter plots of the residuals versus one or more of the independent variables (patterns among the observations suggest heteroskedasticity) and use the Breusch-Pagan chi-square test.
Correct: calculate robust standard errors (White-corrected standard errors, also called heteroskedasticity-consistent standard errors).
Or use generalized least squares.
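A minimal Python sketch using statsmodels (data are hypothetical, with the residual variance made to grow with x so the test has something to find):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan

    rng = np.random.default_rng(1)
    n = 500
    x = rng.uniform(1, 10, size=n)
    y = 2 + 0.5 * x + rng.normal(scale=0.3 * x, size=n)   # variance increases with x

    X = sm.add_constant(x)
    ols = sm.OLS(y, X).fit()

    # Breusch-Pagan chi-square test: a small p-value suggests heteroskedasticity.
    lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(ols.resid, X)
    print(round(lm_pvalue, 4))

    # Correction: White / heteroskedasticity-consistent ("robust") standard errors.
    robust = sm.OLS(y, X).fit(cov_type="HC0")
    print(ols.bse.round(4), robust.bse.round(4))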
Two advantages of using simulation in decision making
The two advantages of using simulation in decision making are 1) better input estimation and 2) simulation yields a distribution for expected value rather than a point estimate. Simulations do not 1) yield better estimates of expected value than conventional risk-adjusted value models, nor 2) lead to better decisions.
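A minimal Monte Carlo sketch of the second advantage: the simulation returns a whole distribution of outcomes, not just a point estimate (the valuation model and input distributions are hypothetical):

    import numpy as np

    rng = np.random.default_rng(42)
    n_trials = 10_000
    growth = rng.normal(loc=0.05, scale=0.02, size=n_trials)   # assumed revenue-growth distribution
    margin = rng.normal(loc=0.15, scale=0.03, size=n_trials)   # assumed margin distribution
    value = 1_000 * (1 + growth) * margin                      # toy valuation model

    print(round(value.mean(), 1))                       # expected value (point estimate)
    print(np.percentile(value, [5, 50, 95]).round(1))   # plus the distribution around it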
The general format for a confidence interval is:
estimated coefficient ± (critical t-stat x coefficient standard error)
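A minimal sketch of the interval with hypothetical numbers, using scipy for the critical t-value:

    from scipy import stats

    b1, se_b1 = 0.76, 0.12      # estimated slope coefficient and its standard error
    n, k = 60, 2                # observations and independent variables
    t_crit = stats.t.ppf(0.975, df=n - k - 1)   # two-tailed 95% critical value

    print(b1 - t_crit * se_b1, b1 + t_crit * se_b1)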
What does standard error of the estimate measure
The standard error of the estimate measures the uncertainty in the relationship between the actual and predicted values of the dependent variable. The differences between these values are called the residuals, and the standard error of the estimate helps gauge the fit of the regression line (the smaller the standard error of the estimate, the better the fit).
The root mean squared error (RMSE) criterion is used to compare the accuracy of autoregressive models in forecasting out-of-sample values. To determine which model will more accurately forecast future values, we calculate the square root of the mean squared error. The model with the smallest RMSE is the preferred model.
True.
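A minimal sketch of the RMSE comparison (all forecasts and actual values are hypothetical):

    import numpy as np

    actual  = np.array([1.2, 1.5, 1.1, 1.8, 1.6])
    model_a = np.array([1.1, 1.6, 1.0, 1.7, 1.5])
    model_b = np.array([1.4, 1.3, 1.3, 1.6, 1.9])

    def rmse(forecast):
        return np.sqrt(np.mean((actual - forecast) ** 2))

    # Prefer the model with the smaller out-of-sample RMSE.
    print(round(rmse(model_a), 3), round(rmse(model_b), 3))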
Spurious correlation
The appearance of a causal linear relationship when, in fact, there is no causal relation.
limitation of correlation analysis
does not capture strong nonlinear relationships between variables
Slope coefficient
The estimated slope coefficient b1 for the regression line describes the change in Y for a one-unit change in X.
It is the predicted change in the dependent variable for a one-unit change in the independent variable.
b1 = Cov(X, Y) / σ(X)^2 = Cov(X, Y) / Var(X)
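A minimal sketch of the formula with hypothetical data, cross-checked against numpy's least-squares fit:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # Cov(X, Y) / Var(X)
    print(round(b1, 4))
    print(round(np.polyfit(x, y, 1)[0], 4))               # same slope from least squares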
The coefficient of determination
In a simple regression, the coefficient of determination is calculated as the correlation coefficient squared and ranges from 0 to +1.
It cannot decrease as independent variables are added to the model.
It is the percentage of the total variation in the dependent variable that is explained by the independent variable(s).
F - Statistic
F = MSR / MSE = (RSS / k) / (SSE / (n - k - 1))
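A minimal numeric sketch of the F-statistic (the sums of squares, n, and k are hypothetical):

    rss, sse = 80.0, 20.0
    n, k = 50, 3

    msr = rss / k               # mean square regression
    mse = sse / (n - k - 1)     # mean square error
    print(round(msr / mse, 2))  # compare with the critical F(k, n - k - 1) value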
SST (Total Sum of Squares): measures the total variation in the dependent variable.
RSS (Regression Sum of Squares): measures the variation in the dependent variable that is explained by the independent variable(s).
SSE (Sum of Squared Errors): measures the unexplained variation in the dependent variable.
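A minimal sketch showing the decomposition SST = RSS + SSE for a hypothetical OLS fit:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([2.0, 2.8, 3.9, 4.1, 5.2, 6.3])
    b1, b0 = np.polyfit(x, y, 1)
    y_hat = b0 + b1 * x

    sst = np.sum((y - y.mean()) ** 2)      # total variation
    rss = np.sum((y_hat - y.mean()) ** 2)  # explained variation
    sse = np.sum((y - y_hat) ** 2)         # unexplained variation
    print(round(sst, 4), round(rss + sse, 4))   # the totals match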