Quantitative Methods Flashcards
Formula for Multiple Regression
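A standard way to write it, assuming k independent variables and observations indexed by i (the symbols are assumed notation, not taken from this set):

```latex
Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki} + \varepsilon_i
```

Here b_0 is the intercept, b_1 through b_k are the slope coefficients, and ε_i is the error term.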
Coefficient of Determination
R²
Measure of Goodness of Fit
Sum of Squares Regression / Sum of Squares Total
Adjusted R²
Adjusts R² for the degrees of freedom
Does not automatically increase when variables are added
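A minimal sketch of both measures, assuming n observations and k independent variables. The function names and input numbers are hypothetical, chosen only for illustration:

```python
def r_squared(ss_regression: float, ss_total: float) -> float:
    """R^2 = Sum of Squares Regression / Sum of Squares Total."""
    return ss_regression / ss_total


def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjust R^2 for degrees of freedom (n observations, k independent variables)."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)


# Hypothetical sums of squares for illustration only.
r2 = r_squared(ss_regression=80.0, ss_total=100.0)
adj = adjusted_r_squared(r2, n=30, k=3)
print(r2, adj)  # adjusted R^2 is below R^2 whenever k >= 1 and R^2 < 1
```

Adding a variable raises k, so adjusted R² rises only if the gain in R² outweighs the lost degree of freedom.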
Akaike’s Information Criterion (AIC)
Measure of model parsimony, i.e., a lower value indicates a better-fitting model
Preferred criterion when the model is used for prediction purposes
Schwarz’s Bayesian Information Criterion (BIC or SBC)
Allows us to choose the best model among a set of models
Preferred when goodness of fit is desired
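One common regression form of the two criteria, assuming n observations, k independent variables, and SSE as the sum of squared errors. This SSE-based form is an assumption; other texts state both criteria in terms of the log-likelihood:

```latex
\mathrm{AIC} = n \ln\!\left(\frac{SSE}{n}\right) + 2(k + 1)
\qquad
\mathrm{BIC} = n \ln\!\left(\frac{SSE}{n}\right) + \ln(n)\,(k + 1)
```

Because ln(n) exceeds 2 once n is 8 or more, BIC penalizes extra variables more heavily than AIC in most samples.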
Unrestricted Model
Full model with all independent variables
Restricted Model
Also called a nested model; takes the unrestricted model and excludes one or more variables
F-Distributed Test Statistic When Comparing Restricted & Unrestricted Models
q is the number of restrictions
General Linear F-test
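The statistic is usually written as follows, assuming SSE_R and SSE_U are the sums of squared errors of the restricted and unrestricted models, n is the number of observations, and k is the number of independent variables in the unrestricted model (assumed notation):

```latex
F = \frac{(SSE_R - SSE_U)/q}{SSE_U/(n - k - 1)}
```

It is compared with an F-distribution with q and n − k − 1 degrees of freedom; a large value means the excluded variables jointly add explanatory power.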
Heteroskedasticity
The variance of the residuals differs across observations
Arises from Omitted Variables, Incorrect Functional Form, Extreme Values
Use Breusch-Pagan (BP) Test
Unconditional Heteroskedasticity
Error variance is not correlated with Independent Variables
Not a problem for statistical inference
Conditional Heteroskedasticity
Error Variance is correlated with the independent variables
Inflated t-statistics (standard errors are underestimated)
Use Breusch-Pagan (BP) Test
Breusch-Pagan (BP) Test
Used to test for Heteroskedasticity:
1. Run Regression
2. Run another regression of the squared residuals from step 1 (as the dependent variable) on the original independent variables
3. Use the Chi-Squared statistic (n × R² from step 2, with k degrees of freedom) to test the Null Hypothesis that there is no Heteroskedasticity
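The three steps can be sketched in plain Python for the single-regressor case. The helper names and the data are hypothetical, and a real application would use a statistics package rather than this hand-rolled fit:

```python
def simple_ols(x, y):
    """Fit y = b0 + b1*x by least squares; return (residuals, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum(e ** 2 for e in residuals)
    return residuals, 1.0 - sse / sst


def breusch_pagan_stat(x, y):
    """BP statistic = n * R^2 of the squared-residual regression."""
    residuals, _ = simple_ols(x, y)               # step 1: original regression
    squared = [e ** 2 for e in residuals]
    _, aux_r2 = simple_ols(x, squared)            # step 2: auxiliary regression
    return len(x) * aux_r2                        # step 3: chi-squared statistic


# Made-up data whose spread grows with x, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.8, 6.5, 7.2, 11.0, 10.1, 15.9, 13.0, 20.5, 16.2]
bp_stat = breusch_pagan_stat(x, y)
CHI2_CRIT_1DF_5PCT = 3.841  # chi-squared critical value, 1 degree of freedom, 5% level
print(bp_stat, bp_stat > CHI2_CRIT_1DF_5PCT)
```

With one independent variable the statistic has 1 degree of freedom; a value above the critical value rejects the null of no heteroskedasticity.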
Robust Standard Errors
Computed to correct for the effects of Heteroskedasticity
Serial Correlation
Regression Errors are correlated across observations
Typically seen in Time-Series Regressions
Use Durbin-Watson (DW) Test or Breusch-Godfrey (BG) Test
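A minimal sketch of the DW statistic, assuming the residuals are already in time order. The formula is the standard one, not taken from this set, and the residual series are made up:

```python
def durbin_watson(residuals):
    """DW = sum of squared changes in residuals / sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Residuals that keep their sign for several periods (positive serial correlation).
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
# Residuals that flip sign every period (negative serial correlation).
alternating = [1.0, -1.0, 1.0, -1.0]

print(durbin_watson(persistent))
print(durbin_watson(alternating))
```

Values near 2 suggest no first-order serial correlation, values well below 2 suggest positive serial correlation, and values well above 2 suggest negative serial correlation.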
Breusch-Godfrey (BG) Test
Used to Test for Serial Correlation:
1. Run the initial regression
2. Regress the fitted residuals from Step 1 (as the Dependent Variable) on the initial regressors plus one or more lagged residuals
3. Test the Null Hypothesis of no serial correlation using a Chi-Square Test
Correcting for Serial Correlation
Serial-correlation consistent standard errors
Computed by Software Packages
Multicollinearity
Independent Variables are correlated with each other
Use variance inflation factor (VIF) to quantify multicollinearity issues
Variance Inflation Factor (VIF) Formula
Used to test for Multicollinearity
VIF>5 Prompts Investigation
VIF>10 Serious Multicollinearity issues
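The usual formula, assuming R_j² is the R² from regressing independent variable j on all the other independent variables (assumed notation):

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

For example, R_j² = 0.80 gives VIF = 5 and R_j² = 0.90 gives VIF = 10, which matches the two thresholds above.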
Correcting Multicollinearity
- Excluding 1 or more variables
- Using a different proxy for one of the variables
- Increasing sample size
No easy way to fix
High Leverage Point
Extreme Value of an Independent Variable
Outlier
Extreme Value of Dependent Variable
Leverage
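Measure of the distance between the ith observation's value of an independent variable and the mean of that variable across all observations
Rule of Thumb: Leverage above 3(k+1)/n is potentially influential, where k is the number of independent variables and n is the number of observations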
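A minimal sketch of the rule of thumb for a regression with one independent variable (k = 1), where leverage has a simple closed form. The function name and the data are hypothetical:

```python
def leverages(x):
    """Leverage of each observation in a one-variable regression:
    h_i = 1/n + (x_i - mean)^2 / sum of squared deviations from the mean."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Made-up data: the last observation sits far from the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
k = 1                                   # number of independent variables
h = leverages(x)
threshold = 3 * (k + 1) / len(x)        # rule of thumb: 3(k+1)/n
flagged = [i for i, hi in enumerate(h) if hi > threshold]
print(threshold, flagged)
```

Only the extreme observation exceeds the threshold here. Leverage looks at the independent variable alone, so a flagged point still needs a check such as Cook's distance to see whether it actually moves the fit.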
Studentized Residual
Way of testing for outliers
Cook’s Distance
Metric for identifying influential data points; measures how much the estimated values of the regression change after deleting an observation
Logistic Transformation
Transforms a Qualitative Dependent Variable into a Linear relationship with the independent variables
Logistic Regression
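The transformation is usually written as the log odds (logit), assuming p is the probability that the qualitative dependent variable equals 1 and there are k independent variables (assumed notation):

```latex
\ln\!\left(\frac{p}{1 - p}\right) = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k + \varepsilon
\qquad\Longleftrightarrow\qquad
p = \frac{1}{1 + \exp\!\left[-(b_0 + b_1 X_1 + \cdots + b_k X_k)\right]}
```

The left-hand form is linear in the independent variables; the right-hand form keeps the fitted probability between 0 and 1.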
Likelihood Ratio (LR) Test
Method to assess the fit of Logistic Regression models
The log-likelihood is always negative, so higher values (closer to 0) indicate a better fit
Linear Trend Model Formula
Time Series with Linear Trend
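A standard way to write it, assuming y_t is the value of the series at time t = 1, 2, ..., T (assumed notation):

```latex
y_t = b_0 + b_1 t + \varepsilon_t
```

The series changes by a constant amount b_1 each period.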
Log-Linear Model Formula
Commonly used with time series that have exponential growth
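A standard way to write it, assuming y_t is the value of the series at time t = 1, 2, ..., T (assumed notation):

```latex
\ln y_t = b_0 + b_1 t + \varepsilon_t
```

Here the series grows at a roughly constant rate each period, which is why this form suits exponential growth.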