Quant Flashcards
Machine Learning
Gives a computer the ability to improve its performance of a task over time
t-test for r (correlation)
(n-2 df)
t = r*sqrt(n-2)/sqrt(1-r^2)
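A quick numeric check of the formula, as a minimal sketch. The function name `correlation_t_stat` and the sample values (r = 0.6, n = 27) are illustrative assumptions, not part of the card.

```python
import math

def correlation_t_stat(r: float, n: int) -> float:
    """t-statistic for H0: population correlation = 0, with n - 2 df."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical sample: r = 0.6 from 27 observations (25 df)
t_stat = correlation_t_stat(0.6, 27)
print(round(t_stat, 4))  # compare against the critical t-value for 25 df
```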
Estimated slope coefficient - least squares
b1 = cov(X,Y) / var(X)
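The slope formula can be checked by hand on a small data set. This is a minimal sketch in plain Python; the data and the function name `ols_slope_intercept` are made up for illustration, and the intercept line uses the standard result b0 = mean(Y) - b1 * mean(X), which is not on the card.

```python
def ols_slope_intercept(x, y):
    """Return (slope, intercept) for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and sample variance (both divide by n - 1)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = ols_slope_intercept(x, y)
print(slope, intercept)
```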
Confidence interval for predicted Y-value
Predicted Y +- tc * SE of forecast (tc = two-tailed critical t-value)
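A sketch of the interval calculation, assuming a simple (one-X) regression. The helper names and all inputs are hypothetical. The standard error of forecast uses the usual simple-regression formula, s_f^2 = SEE^2 * [1 + 1/n + (X - mean X)^2 / ((n - 1) * var X)], which is an addition to the card. The critical t-value is looked up from a table rather than computed.

```python
import math

def se_forecast_simple(see, n, x, mean_x, var_x):
    """Standard error of forecast for a simple regression at a given x.

    see is the standard error of estimate; var_x is the sample variance of X.
    """
    variance = see ** 2 * (1 + 1 / n + (x - mean_x) ** 2 / ((n - 1) * var_x))
    return math.sqrt(variance)

def forecast_interval(y_hat, t_crit, se_forecast):
    """Return (lower, upper) bounds: predicted Y +- tc * SE of forecast."""
    half_width = t_crit * se_forecast
    return y_hat - half_width, y_hat + half_width

# Hypothetical inputs: n = 5, forecast made at x = 3 (the mean of X),
# SEE^2 = 0.8, predicted Y = 4.0, tc = 3.182 (95% two-tailed, 3 df)
s_f = se_forecast_simple(math.sqrt(0.8), n=5, x=3.0, mean_x=3.0, var_x=2.5)
low, high = forecast_interval(4.0, 3.182, s_f)
print(low, high)
```

The interval is narrowest when the forecast is made at the mean of X and widens as X moves away from it.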
SST, MSR, MSE formulas
SST = RSS + SSE
MSR = RSS/k
MSE = SSE/(n-k-1)
Test statistical significance of regression
F = MSR / MSE with k and n-k-1 df (1-tail)
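The ANOVA pieces and the F-statistic fit together as in the sketch below. The function name `anova_table` and the data are assumptions for illustration. RSS is backed out as SST - SSE, which holds for a least-squares fit that includes an intercept.

```python
def anova_table(y, y_hat, k):
    """Sums of squares, mean squares and F-statistic for a fitted regression.

    y: observed values, y_hat: fitted values, k: number of independent variables.
    """
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)              # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    rss = sst - sse                                        # explained variation
    msr = rss / k
    mse = sse / (n - k - 1)
    return {"SST": sst, "RSS": rss, "SSE": sse, "MSR": msr, "MSE": mse, "F": msr / mse}

# Hypothetical data; fitted values come from the line y_hat = 2.2 + 0.6 * x
x_obs = [1, 2, 3, 4, 5]
y_obs = [2, 4, 5, 4, 5]
y_fit = [2.2 + 0.6 * xi for xi in x_obs]
table = anova_table(y_obs, y_fit, k=1)
print(table)  # compare F against the one-tailed critical value with k and n-k-1 df
```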
SEE (standard error of estimate)
SEE = sqrt(MSE)
Smaller SEE = better fit
Coefficient of Determination
R^2 = RSS / SST
% variability of Y explained by Xs; higher R^2 = better fit
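Both fit measures follow directly from the sums of squares. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
import math

def fit_measures(rss, sse, n, k):
    """Return (R^2, standard error of estimate) from the sums of squares.

    rss: regression (explained) sum of squares
    sse: sum of squared errors
    n: number of observations, k: number of independent variables
    """
    sst = rss + sse
    r_squared = rss / sst
    std_error_estimate = math.sqrt(sse / (n - k - 1))  # sqrt(MSE)
    return r_squared, std_error_estimate

# Hypothetical values: RSS = 3.6, SSE = 2.4, n = 5, k = 1
r2, see_value = fit_measures(3.6, 2.4, n=5, k=1)
print(r2, see_value)
```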
Heteroskedasticity
Non constant error variance
Detect with Breusch-Pagan test
Correct with White-corrected standard errors
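One common form of the Breusch-Pagan statistic is n * R^2, where R^2 comes from regressing the squared residuals on the independent variables. It is compared with a chi-square critical value with k df. The sketch below assumes a single X, in which case that R^2 is simply the squared correlation between X and the squared residuals. The function name and data are hypothetical.

```python
def breusch_pagan_stat(x, residuals):
    """n * R^2 from regressing squared residuals on a single X."""
    n = len(x)
    sq = [e * e for e in residuals]
    mean_x = sum(x) / n
    mean_sq = sum(sq) / n
    sxy = sum((xi - mean_x) * (si - mean_sq) for xi, si in zip(x, sq))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((si - mean_sq) ** 2 for si in sq)
    r_squared = sxy ** 2 / (sxx * syy)  # with one regressor, R^2 = correlation^2
    return n * r_squared

# Hypothetical residuals whose size grows with x
bp = breusch_pagan_stat([1, 2, 3, 4], [1, -2, 3, -4])
# 3.841 is the 5% chi-square critical value for 1 df
reject_constant_variance = bp > 3.841
print(bp, reject_constant_variance)
```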
Autocorrelation
Correlation among error terms
Detect with Durbin-Watson test -> positive autocorrelation if DW < d_l (lower critical value)
Correct by adjusting standard errors using Hansen method
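The Durbin-Watson statistic itself is the sum of squared changes in successive residuals divided by the sum of squared residuals. A minimal sketch with made-up residual series; the function name is an assumption:

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); roughly 2 * (1 - r)."""
    numerator = sum(
        (curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:])
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals with no obvious pattern: DW lands near 2
dw_mixed = durbin_watson([-0.8, 0.6, 1.0, -0.6, -0.2])
# Hypothetical residuals that persist in sign: DW falls well below 2
dw_persistent = durbin_watson([1, 1, 1, -1, -1, -1])
print(dw_mixed, dw_persistent)
```

A value near 2 suggests no autocorrelation; the computed DW still has to be compared with the tabulated lower and upper critical values.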
Multicollinearity
High correlation between X’s
Detect when the F-test is significant but the individual t-tests are insignificant
Correct by dropping one or more of the correlated X variables
Model Misspecification
Omitting a variable
Variable should be transformed
Incorrectly pooling data
Using lagged dependent variable as independent variable
Forecasting the past
Measuring independent variables with error
Effects of Misspecification
1. Regression coefficients are biased and inconsistent
2. Lack of confidence in hypothesis tests of the coefficients or the model predictions
Supervised machine learning
Inputs and outputs are identified
Relationships are modeled from labeled data
Unsupervised machine learning
Algorithm itself seeks to describe the structure of unlabeled data