Quantitative Methods Flashcards
When should you use Logistic regression models?
If the dependent Y variable is discrete
If the dependent Y variable is qualitative (categorical), for example a binary outcome
When should you use Multiple regression models?
When the dependent variable is continuous (not discrete) and there is more than one explanatory variable (more than one independent variable).
When multiple independent variables determine the outcome of a single dependent variable.
- Dependent Y Variable is continuous
- We have more than 1 independent X variable
Assumption of Regression models
L.I.I.N.H.
Linearity: Relationship between dependent Y variable and Independent X variable is linear.
Independence of errors: Regression residuals are uncorrelated across observations.
Independence: Independent X variables are not random, and there is no exact linear relationship between 2 or more independent variables.
Normality: Regression residuals are normally distributed.
Homoscedasticity: Constant variance of regression residuals
How to determine if a variable is significant?
|T-Stat| > critical t-value (roughly 2 at the 5% significance level for large samples)
Degrees of freedom for SSR
k (the number of independent variables)
Degrees of freedom for SST
N-1
Degrees of freedom for SSE
N-k-1
What will happen to adjusted R-Square if we add insignificant variables?
Adjusted R-Square decreases
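A minimal sketch of why this happens, assuming the standard adjusted R-Square formula 1 - (1 - R-Square)(n - 1)/(n - k - 1). The function name and the numbers are hypothetical and only illustrate the penalty for an extra variable that barely raises R-Square.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than k + 1")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical example: a 4th variable lifts R^2 only from 0.600 to 0.601.
before = adjusted_r_squared(0.600, n=50, k=3)
after = adjusted_r_squared(0.601, n=50, k=4)

# The tiny gain in R^2 does not offset the lost degree of freedom.
print(round(before, 4), round(after, 4), after < before)
```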
R-Square formula
SSR/SST = Explained variation / Total variation
1-(unexplained variation/total variation)
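The two forms of the formula can be checked against each other with a small sketch. The data and fitted values are a hypothetical example (fitted values from the least-squares line y = 2.2 + 0.6x on x = 1..5), and the function name is an assumption.

```python
def sums_of_squares(y, y_hat):
    """Return (SST, SSE, SSR) for observed y and fitted y_hat."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained variation
    return sst, sse, ssr


y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]  # OLS fit with an intercept

sst, sse, ssr = sums_of_squares(y, y_hat)
r2_explained = ssr / sst        # explained / total
r2_unexplained = 1 - sse / sst  # 1 - unexplained / total
print(round(r2_explained, 4), round(r2_unexplained, 4))
```

The two values agree here because the fitted values come from a least-squares regression with an intercept, where SST = SSR + SSE.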
What kind of test is this?
H0: bi = Bi
Ha: bi ≠ Bi
Two tail test
What kind of test is this?
H0: bi <= Bi
Ha: bi > Bi
Right tail test
<= - is heading right
What kind of test is this?
H0: bi >= Bi
Ha: bi < Bi
Left tail test
>= is heading left
Model Misspecification - Omitted variable
If we omit a significant variable from our model, the error term will capture the effect of the missing variable.
Model Misspecification - Inappropriate form of variable
Failing to account for non-linearity
Causes: Conditional heteroscedasticity
To fix it, we can transform the variable with the natural log so that the relationship becomes linear.
Model Misspecification - Inappropriate Scaling
Causes Conditional heteroscedasticity and multicollinearity
Model Misspecification - Inappropriate Pooling of Data
Causes Conditional heteroscedasticity and Serial correlation
What is Unconditional heteroscedasticity
Var(error) not correlated with independent variable.
No issue with inference.
What is Conditional heteroscedasticity
Var(error) is correlated with the independent X variables
F-test is unreliable since MSE is a biased estimator of the true population variance.
Variance at one time step has a positive relationship with variance at one or more previous time steps. This implies that periods of high variability will tend to follow periods of high variability and periods of low variability will tend to follow periods of low variability.
What does the Breusch-Pagan (BP) test do?
Tests for heteroskedasticity
The formula for the BP test statistic
n * R-Square, where R-Square comes from regressing the squared residuals on the independent variables
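A minimal sketch of the calculation, assuming the R-Square comes from an auxiliary regression of the squared residuals on the independent variables and that the statistic is compared with a one-tailed chi-square critical value with k degrees of freedom. The function names and the sample numbers are hypothetical; 5.991 is the 5% chi-square critical value for 2 degrees of freedom.

```python
def breusch_pagan_statistic(n: int, r_squared_aux: float) -> float:
    """BP statistic = n * R^2 of the auxiliary regression."""
    return n * r_squared_aux


def reject_homoskedasticity(statistic: float, critical_value: float) -> bool:
    """True when the statistic exceeds the chi-square critical value."""
    return statistic > critical_value


# Hypothetical inputs: 100 observations, auxiliary R^2 of 0.08, k = 2.
bp_stat = breusch_pagan_statistic(n=100, r_squared_aux=0.08)
chi2_critical_5pct_df2 = 5.991

# True here: the null of no heteroskedasticity is rejected.
print(round(bp_stat, 3), reject_homoskedasticity(bp_stat, chi2_critical_5pct_df2))
```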
BP test
Test statistic > Critical value
Reject the null.
There is heteroskedasticity (the variance is not constant)
- H0: No heteroskedasticity (homoskedasticity is present, constant variance)
- Ha: Heteroskedasticity
BP test
Test statistic < Critical value
Fail to reject the null
No evidence of heteroskedasticity
H0: No heteroskedasticity
Ha: Heteroskedasticity
What is serial correlation?
Errors are correlated across observations
Positive Serial Correlation
A positive residual is most likely followed by a positive residual
A negative residual is most likely followed by a negative residual
Negative Serial Correlation
A negative residual is most likely followed by a positive residual
A positive residual is most likely followed by a negative residual
Multicollinearity
2 or more independent variables are highly correlated or there is an approximate linear relationship among the IVs.
Coefficients will be consistent but imprecise and unreliable
Inflated SE and insignificant T-Statistics, but possibly significant F-Statistics
How to detect multicollinearity?
Variance inflation factor
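VIF = 1 / (1 - R-Square), where R-Square comes from regressing one independent variable on the other independent variables
We want VIF as low as possible
> 5: Concerning
> 10: Serious multicollinearity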
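A small sketch of the VIF calculation and the rule-of-thumb cutoffs from this card. It assumes the R-Square passed in comes from regressing one independent variable on the remaining ones; the function names and labels are hypothetical.

```python
def variance_inflation_factor(r_squared_j: float) -> float:
    """VIF_j = 1 / (1 - R_j^2) for independent variable j."""
    if not 0.0 <= r_squared_j < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared_j)


def vif_label(vif: float) -> str:
    """Apply the > 5 and > 10 rules of thumb."""
    if vif > 10:
        return "serious multicollinearity"
    if vif > 5:
        return "concerning"
    return "ok"


# Hypothetical auxiliary R^2 values for three independent variables.
for r2 in (0.50, 0.85, 0.95):
    vif = variance_inflation_factor(r2)
    print(r2, round(vif, 2), vif_label(vif))
```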
How to fix multicollinearity?
- Increase sample size
- Excluding one or more of the regression variables.
- Use a different proxy for one of the variables
Formula and purpose of AIC
AIC = n * ln(SSE/n) + 2(k+1)
AIC is better for forecasting purposes
Formula and purpose of BIC
BIC = n * ln(SSE/n) + ln(n)(k+1)
Better for evaluating goodness-of-fit
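Both criteria can be computed side by side with a short sketch. The function names and the sample inputs are hypothetical; lower values indicate the preferred model under each criterion.

```python
import math


def aic(n: int, k: int, sse: float) -> float:
    """AIC = n * ln(SSE / n) + 2 * (k + 1)."""
    return n * math.log(sse / n) + 2 * (k + 1)


def bic(n: int, k: int, sse: float) -> float:
    """BIC = n * ln(SSE / n) + ln(n) * (k + 1)."""
    return n * math.log(sse / n) + math.log(n) * (k + 1)


# Hypothetical model: 100 observations, 3 independent variables, SSE of 250.
aic_value = aic(n=100, k=3, sse=250.0)
bic_value = bic(n=100, k=3, sse=250.0)
print(round(aic_value, 3), round(bic_value, 3))
```

Because ln(n) exceeds 2 once n is 8 or more, BIC penalizes each extra variable more heavily than AIC in most samples.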
How do we test joint coefficients?
F-Stat
F = [(SSE restricted - SSE unrestricted) / q] / [SSE unrestricted / (n-k-1)], where q is the number of restrictions
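A minimal sketch of the joint F-statistic, assuming q is the number of coefficients restricted to zero and k is the number of independent variables in the unrestricted model. The function name and the numbers are hypothetical.

```python
def joint_f_statistic(sse_restricted: float, sse_unrestricted: float,
                      q: int, n: int, k: int) -> float:
    """F = [(SSE_r - SSE_u) / q] / [SSE_u / (n - k - 1)]."""
    numerator = (sse_restricted - sse_unrestricted) / q
    denominator = sse_unrestricted / (n - k - 1)
    return numerator / denominator


# Hypothetical example: dropping 2 of 4 variables raises SSE from 100 to 120.
f_stat = joint_f_statistic(sse_restricted=120.0, sse_unrestricted=100.0,
                           q=2, n=55, k=4)
print(f_stat)  # compare with the critical F for (q, n - k - 1) degrees of freedom
```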
What is a High leverage point?
Extreme value of independent variables
Observation that is outside the range of independent variables (x axis)
What is an outlier?
Extreme value in the dependent variable
Observation that is outside the range of the dependent variables (vertical Y range)
How do you detect and calculate a High leverage point?
Calculate the leverage measure h_ii
Simple regression: h_ii = 1/n + (squared deviation of X_i from its mean / sum of all squared deviations)
Flag as high leverage if h_ii > 3 * (k+1)/n
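A sketch of the leverage measure for a regression with a single independent variable, where leverage is 1/n plus the squared deviation of X_i from its mean divided by the sum of all squared deviations, flagged against the 3(k+1)/n cutoff. The function names and the data are hypothetical.

```python
def leverages(x):
    """h_ii = 1/n + (x_i - mean)^2 / sum of squared deviations."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def high_leverage_flags(x, k=1):
    """Flag observations whose leverage exceeds 3 * (k + 1) / n."""
    cutoff = 3 * (k + 1) / len(x)
    return [h > cutoff for h in leverages(x)]


# Hypothetical X values: the last observation sits far from the others.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
h = leverages(x)
flags = high_leverage_flags(x)
print([round(v, 3) for v in h])
print(flags)
```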
How do you detect and calculate an outlier?
Externally studentized residuals
- Delete each case i
- Calculate new regression
- Add the deleted observation back in, calculate the residual
- Calculate the studentized residual
T* = e* / se*
Potentially influential if:
|T*| > Critical t (for small samples)
|T*| > 3 (for large samples)
How can we determine and find influential outliers
By calculating Cook's distance (aka Cook's D)
If cooks D is …
Di > 0.5
could be influential
If cooks D is
Di > 1
Likely to be influential
If cooks D is
Di > 2 x sqrt(k/n)
Influential
What does an intercept dummy variable look like?
No interaction term
yi = b0 + b1x1 + b2x2 + d0D1
What does a slope dummy variable look like?
Interaction term
yi = b0 + b1x1 + d1x1D + epsilon
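The difference between the two dummy types can be sketched with fitted-value functions. All coefficient values are hypothetical, and the error term is left out because these are predictions rather than observations.

```python
def predict_intercept_dummy(x1, x2, d, b0, b1, b2, d0):
    """y = b0 + b1*x1 + b2*x2 + d0*D: the dummy shifts the intercept."""
    return b0 + b1 * x1 + b2 * x2 + d0 * d


def predict_slope_dummy(x1, d, b0, b1, d1):
    """y = b0 + b1*x1 + d1*x1*D: the dummy changes the slope on x1."""
    return b0 + b1 * x1 + d1 * x1 * d


# Intercept dummy: the gap between D = 1 and D = 0 is d0 at every x1.
gap_low = (predict_intercept_dummy(1, 2, 1, b0=10, b1=3, b2=4, d0=5)
           - predict_intercept_dummy(1, 2, 0, b0=10, b1=3, b2=4, d0=5))
gap_high = (predict_intercept_dummy(9, 2, 1, b0=10, b1=3, b2=4, d0=5)
            - predict_intercept_dummy(9, 2, 0, b0=10, b1=3, b2=4, d0=5))

# Slope dummy: the gap between D = 1 and D = 0 grows with x1 (d1 * x1).
slope_gap_low = (predict_slope_dummy(1, 1, b0=10, b1=3, d1=2)
                 - predict_slope_dummy(1, 0, b0=10, b1=3, d1=2))
slope_gap_high = (predict_slope_dummy(9, 1, b0=10, b1=3, d1=2)
                  - predict_slope_dummy(9, 0, b0=10, b1=3, d1=2))

print(gap_low, gap_high, slope_gap_low, slope_gap_high)
```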
How do you interpret an independent variable’s slope coefficient in a logistic regression model
The change in the log odds that the event happens per unit change in the independent variable, holding all other independent variables constant.
The intercept in these logistic regressions is interpreted as the:
log odds of the ETF being a winning fund if all independent variables are zero.
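Log odds are easier to read once converted to odds or a probability. The sketch below assumes a single independent variable; the intercept and slope values are hypothetical and are not taken from any fitted model.

```python
import math


def log_odds_to_probability(log_odds: float) -> float:
    """p = 1 / (1 + exp(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_multiplier(slope: float) -> float:
    """A one-unit rise in X multiplies the odds by exp(slope)."""
    return math.exp(slope)


intercept, slope = -1.0, 0.5  # hypothetical coefficients

# Probability of the event when the independent variable is zero.
p_at_zero = log_odds_to_probability(intercept)
# Probability after a one-unit increase in the independent variable.
p_at_one = log_odds_to_probability(intercept + slope * 1.0)

print(round(p_at_zero, 4), round(p_at_one, 4), round(odds_multiplier(slope), 4))
```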
When to use a Log-Linear trend model?
When the dependent Y variable changes at a constant growth rate
When to use a Linear trend model?
When the dependent Y variable changes by a constant amount each period.
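One way to see which trend model fits is to compare first differences of the series with first differences of its natural log. The two series below are hypothetical: one adds 5 each period, the other grows 5% each period.

```python
import math


def first_differences(series):
    """Change from one period to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


linear_series = [100 + 5 * t for t in range(6)]     # constant amount per period
growth_series = [100 * 1.05 ** t for t in range(6)]  # constant growth rate

# Linear trend: the level changes by the same amount each period.
level_changes = first_differences(linear_series)

# Log-linear trend: ln(y) changes by the same amount each period.
log_changes = first_differences([math.log(v) for v in growth_series])

print(level_changes)
print([round(c, 5) for c in log_changes])
```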
DW test for Serial correlation in linear/log-linear model hypothesis
H0: DW = 2 (no serial correlation). If we fail to reject the null, there is no evidence of serial correlation.
Ha: DW ≠ 2. If we reject the null, we have serial correlation.
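A minimal sketch of the Durbin-Watson statistic itself, computed as the sum of squared changes in the residuals divided by the sum of squared residuals. The residual series are hypothetical and chosen to show values below and above 2.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Residuals that keep their sign: positive serial correlation, DW below 2.
dw_positive = durbin_watson([1, 1, 1, -1, -1, -1])

# Residuals that flip sign every period: negative serial correlation, DW above 2.
dw_negative = durbin_watson([1, -1, 1, -1, 1, -1])

print(round(dw_positive, 3), round(dw_negative, 3))
```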
Autoregressive AR model
A time series regressed on its own past values.
A statistical model is autoregressive if it predicts future values based on past values. For example, an autoregressive model might seek to predict a stock’s future prices based on its past performance.
What are the 3 properties we must satisfy to have “Covariance Stationary Series”
Mean, Variance, and Cov(yt, yt-s) must be constant and finite in all periods.
- The expected value of the time series must be constant and finite in all periods.
- The variance of the time series must be constant and finite in all periods.
- The covariance of the time series with itself for a fixed number of periods in the past or future must be constant and finite in all periods.
What is “mean Reversion”
The value of the time series falls when it’s above its mean, and rises when it’s below its mean.
Mean reversion in finance suggests that various relevant phenomena such as asset prices and volatility of returns eventually revert to their long-term average levels.
The mean reversion theory has led to many investment strategies, from stock trading techniques to options pricing models.
Mean reversion trading tries to capitalize on extreme changes in the price of a particular security, assuming that it will revert to its previous state.
Define the Mean reverting level …
Xt > b0/(1-b1)
The time series will decrease
Define the Mean reverting level …
Xt = b0/(1-b1)
The time series will remain the same
Define the Mean reverting level …
Xt < b0/(1-b1)
The time series will increase
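The three cases can be checked with a short sketch, assuming an AR(1) model of the form x_t = b0 + b1 * x_(t-1) with |b1| < 1. The function names and the coefficients are hypothetical.

```python
def mean_reverting_level(b0: float, b1: float) -> float:
    """Mean-reverting level of an AR(1) model: b0 / (1 - b1)."""
    if b1 == 1:
        raise ValueError("b1 = 1 has no finite mean-reverting level")
    return b0 / (1 - b1)


def next_value(x_t: float, b0: float, b1: float) -> float:
    """One-step-ahead AR(1) prediction: b0 + b1 * x_t."""
    return b0 + b1 * x_t


b0, b1 = 2.0, 0.5  # hypothetical coefficients
level = mean_reverting_level(b0, b1)

for x_t in (6.0, 4.0, 2.0):
    predicted = next_value(x_t, b0, b1)
    if predicted < x_t:
        direction = "decrease"
    elif predicted > x_t:
        direction = "increase"
    else:
        direction = "remain the same"
    print(x_t, level, predicted, direction)
```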
What is an “in-sample forecast”
Prediction
Predicted vs. observed values within the sample used to estimate the model
Models with a smaller variance of errors are more accurate
What is an “out-of-sample forecast”
Forecast
Forecast vs. observed values outside the sample used to estimate the model
Use Root Mean Squared Errors (RMSE) - used to compute out-of-sample forecasting performance. The smaller the RMSE, the better.
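A minimal sketch of an RMSE comparison between two sets of out-of-sample forecasts. The observed values and both forecast series are hypothetical.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error: sqrt(mean((actual - forecast)^2))."""
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


actual = [10, 12, 14, 16]        # observed values outside the sample
forecast_a = [11, 12, 13, 17]    # forecasts from hypothetical model A
forecast_b = [12, 10, 16, 14]    # forecasts from hypothetical model B

rmse_a = rmse(actual, forecast_a)
rmse_b = rmse(actual, forecast_b)

# The model with the smaller RMSE forecasts better out of sample.
print(round(rmse_a, 4), round(rmse_b, 4))
```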
What 2 elements does Random Walk not have?
Finite mean reverting level, and finite variance