Quant Flashcards
Machine Learning
Gives a computer the ability to improve its performance of a task over time
t-test for r (correlation)
(n-2 df)
t = r*sqrt(n-2)/sqrt(1-r^2)
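A minimal sketch of the statistic above, assuming a sample correlation r computed from n paired observations. The function name and the sample values are illustrative only.

```python
import math

def correlation_t_stat(r: float, n: int) -> float:
    """t-statistic for H0: population correlation = 0 (n - 2 degrees of freedom)."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical sample: r = 0.6 from 27 observations, so df = 25
t_stat = correlation_t_stat(0.6, 27)
print(round(t_stat, 4))
```

The result is compared with the critical t-value for n - 2 df, which has to be looked up separately.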
Estimated slope coefficient - least squares
cov (xy) / var (x)
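A short sketch of the slope formula for a single-X regression. The intercept line (mean of Y minus slope times mean of X) is not on the card and is included as a standard companion result; the function name and data are made up for illustration.

```python
from statistics import mean

def ols_slope_intercept(x, y):
    """Least-squares slope = cov(x, y) / var(x); intercept = y_bar - slope * x_bar."""
    x_bar, y_bar = mean(x), mean(y)
    n = len(x)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    slope = cov_xy / var_x
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = ols_slope_intercept(x, y)
print(slope, intercept)
```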
Confidence interval for predicted Y-value
Y-hat +- tc * SE of forecast (Y-hat = predicted value, tc = two-tailed critical t-value)
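A small sketch of the interval around a predicted Y-value. It assumes the critical t-value and the standard error of the forecast are already known; the function name and the numbers are hypothetical.

```python
def forecast_interval(y_hat: float, t_crit: float, se_forecast: float):
    """Return (lower, upper) bounds: predicted value +/- t_crit * SE of forecast."""
    half_width = t_crit * se_forecast
    return y_hat - half_width, y_hat + half_width

# Hypothetical inputs: predicted Y = 10, critical t = 2.0, SE of forecast = 1.5
low, high = forecast_interval(10.0, 2.0, 1.5)
print(low, high)
```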
SST, MSR, MSE formulas
SST = RSS + SSE (RSS = regression sum of squares, SSE = sum of squared errors)
MSR = RSS/k
MSE = SSE/(n-k-1)
Test statistical significance of regression
F = MSR / MSE with k and n-k-1 df (1-tail)
SEE (standard error of estimate)
SEE = sqrt(MSE)
Smaller SEE = better fit
Coefficient of Determination
R^2 = RSS / SST
% variability of Y explained by Xs; higher R^2 = better fit
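The regression cards above (SST, MSR, MSE, the F-statistic, sqrt(MSE), and R^2) can be tied together in one pass. This is a sketch under stated assumptions: the fitted values come from an OLS regression with an intercept (so that SST = RSS + SSE holds), RSS means the regression sum of squares, and the function name and data are illustrative.

```python
import math

def regression_summary(y, y_hat, k):
    """ANOVA quantities for actual values y, fitted values y_hat, and k slope coefficients."""
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    rss = sst - sse                                        # explained variation (OLS with intercept)
    msr = rss / k
    mse = sse / (n - k - 1)
    return {
        "SST": sst, "RSS": rss, "SSE": sse,
        "MSR": msr, "MSE": mse,
        "F": msr / mse,             # k and n-k-1 df, one-tailed
        "sqrt_MSE": math.sqrt(mse), # standard error of estimate
        "R2": rss / sst,
    }

# Hypothetical data: fitted values from y_hat = 2.2 + 0.6x for x = 1..5
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
summary = regression_summary(y, y_hat, k=1)
print(summary)
```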
Heteroskedasticity
Non constant error variance
Detect with Breusch-Pagan test
Correct with White-corrected standard errors
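A rough sketch of the detection step, assuming the common textbook form of the test statistic: n times the R^2 from regressing the squared residuals on the independent variable(s), compared with a chi-square critical value with k df. Only the single-X case is shown; the function name and data are made up.

```python
def breusch_pagan_stat(x, residuals):
    """n * R^2 from regressing squared residuals on a single regressor x."""
    n = len(x)
    e2 = [e ** 2 for e in residuals]
    x_bar = sum(x) / n
    e2_bar = sum(e2) / n
    sxy = sum((xi - x_bar) * (ei - e2_bar) for xi, ei in zip(x, e2))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((ei - e2_bar) ** 2 for ei in e2)
    r2 = sxy ** 2 / (sxx * syy)  # R^2 of a one-variable regression = squared correlation
    return n * r2

# Hypothetical residuals whose size grows with x
x = [1, 2, 3, 4, 5, 6]
residuals = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6]
bp = breusch_pagan_stat(x, residuals)
print(bp)  # compare with the chi-square critical value for 1 df (3.84 at 5%)
```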
Autocorrelation
Correlation among error terms
Detect with Durbin-Watson test -> positive autocorrelation if DW < dl
Correct by adjusting standard errors using Hansen method
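A minimal sketch of the Durbin-Watson statistic, assuming its standard definition (sum of squared changes in the residuals divided by the sum of squared residuals). Values near 2 suggest no autocorrelation; values well below 2 point to positive autocorrelation. The function name and residuals are illustrative.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals that stay on one side of zero for several periods
residuals = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
dw = durbin_watson(residuals)
print(dw)  # compare with the lower critical value dl from a DW table
```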
Multicollinearity
High correlation between X’s
Detect: F-test significant but individual t-tests insignificant
Correct by dropping one or more of the correlated X variables
Model Misspecification
Omitting a variable
Variable should be transformed
Incorrectly pooling data
Using lagged dependent variable as independent variable
Forecasting the past
Measuring independent variables with error
Effects of Misspecification
1. Regression coefficients are biased and inconsistent
2. Lack of confidence in hypothesis tests of the coefficients or the model predictions
Supervised machine learning
inputs and outputs are identified
relationships modeled from labeled data
Unsupervised machine learning
Algorithm itself seeks to describe the structure of unlabeled data
Covariance stationary
mean, variance, and covariances with lagged values don't change over time
To determine if a time series is covariance stationary:
1. plot data
2. run an AR model and test the autocorrelations
3. Perform Dickey-Fuller test
Unit root
coefficient on lagged dependent variable = 1. Series with unit root is not covariance stationary. First differencing will often eliminate the unit root
mean reverting level for AR(1)
b0/(1-b1)
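A quick sketch of the formula, assuming an AR(1) model x(t) = b0 + b1*x(t-1) + error with |b1| < 1. The loop iterates the model with the error set to zero to show the series drifting toward the level. The coefficients and starting value are hypothetical.

```python
def mean_reverting_level(b0: float, b1: float) -> float:
    """Mean-reverting level of an AR(1) series: b0 / (1 - b1)."""
    if b1 == 1:
        raise ValueError("b1 = 1 is a unit root; no mean-reverting level exists")
    return b0 / (1 - b1)

# Hypothetical coefficients
b0, b1 = 1.2, 0.4
level = mean_reverting_level(b0, b1)

# Iterate the AR(1) equation without the error term, starting far from the level
x = 10.0
for _ in range(50):
    x = b0 + b1 * x

print(level, x)
```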
RMSE
square root of average squared error
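A minimal sketch of the calculation, taking a list of actual values and a list of forecasts. The function name and numbers are illustrative.

```python
import math

def rmse(actual, predicted):
    """Square root of the average squared forecast error."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical actual values and forecasts: errors are 1, 0, -2
value = rmse([3, 5, 7], [2, 5, 9])
print(value)
```

When comparing models, the one with the lower out-of-sample value is generally preferred.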
Random walk time series
xt = x(t-1) + error(t)
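A small simulation sketch of the equation above, assuming normally distributed errors with mean 0 and standard deviation 1 (an assumption made only for illustration). It also shows that first differencing a random walk recovers the error terms, which is why differencing removes the unit root. Function names and the seed are arbitrary.

```python
import random

def simulate_random_walk(n: int, seed: int = 0, x0: float = 0.0):
    """Build x(t) = x(t-1) + error(t) for n steps; returns (path, errors)."""
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, 1.0) for _ in range(n)]
    path = [x0]
    for e in errors:
        path.append(path[-1] + e)
    return path, errors

def first_difference(series):
    """y(t) = x(t) - x(t-1)."""
    return [b - a for a, b in zip(series, series[1:])]

path, errors = simulate_random_walk(100)
diffs = first_difference(path)
# The differenced series is just the error series (up to floating-point rounding)
print(max(abs(d - e) for d, e in zip(diffs, errors)))
```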
Seasonality
indicated by a statistically significant residual autocorrelation at the seasonal lag (e.g., lag 4 for quarterly data). Correct by adding the seasonal lagged term to the model
ARCH
detected by regressing squared residuals on their own lag:
e^2(t) = a0 + a1*e^2(t-1) + u(t)
ARCH(1) is present if a1 is statistically significant
Variance of ARCH series
sigma^2(t+1) = a0-hat + a1-hat * e^2(t)
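A one-step sketch of the variance forecast, assuming the two ARCH(1) coefficients have already been estimated. The function name and inputs are hypothetical.

```python
def arch1_variance_forecast(a0_hat: float, a1_hat: float, last_residual: float) -> float:
    """Next-period variance forecast: a0_hat + a1_hat * e(t)^2."""
    return a0_hat + a1_hat * last_residual ** 2

# Hypothetical estimates a0 = 0.5, a1 = 0.3 and a latest residual of 2.0
forecast = arch1_variance_forecast(0.5, 0.3, 2.0)
print(forecast)
```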
Risk types
simulations
distribution of risk: continuous
sequential: does not matter
accommodates correlated variables? Yes
scenario analysis
distribution of risk: discrete
sequential: no
accommodates correlated variables? Yes
Decision Trees
distribution of risk: discrete
sequential: yes
accommodates correlated variables? no