Quantitative Methods Flashcards
T-Stat
F-Stat
T-Stat = Slope/Std error with n-k-1 degrees of freedom
F-Stat = MSR/MSE with k and n-k-1 degrees of freedom, one-tailed test, reject H0 if F-stat > F-crit; measures how much of the output the regression explains versus how much the error explains
90% confidence (10% significance, two-tailed) = 1.645
95% confidence (5% significance, two-tailed) = 1.96
99% confidence (1% significance, two-tailed) = 2.58
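A minimal sketch of the t-test on this card, using made-up numbers (the slope, standard error, and sample size below are illustrative, not from the cards):

```python
# Hedged sketch: t-test on a regression slope with hypothetical inputs.
# t = estimated slope / standard error of the slope, with n - k - 1 df.

def slope_t_stat(slope: float, std_error: float) -> float:
    """t-statistic for H0: slope = 0."""
    return slope / std_error

# Assumed example: slope 0.48, standard error 0.20, n = 32 observations, k = 1.
t = slope_t_stat(0.48, 0.20)     # 2.4
df = 32 - 1 - 1                  # n - k - 1 = 30

# With ~30 df the exact two-tailed 5% t critical value is about 2.04; the
# card's 1.96 is the large-sample (z) value.
reject_at_95 = abs(t) > 1.96     # True here
print(t, df, reject_at_95)
```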
ANOVA Table (RSS, SSE, SST, MSR, MSE, R^2, SEE)
Regression (RSS), k df, MSR = RSS/k
Error (SSE), n-k-1 df, MSE = SSE/(n-k-1)
RSS + SSE = SST
R^2 = RSS/SST, the fraction of total variation in Y that the regression explains
Standard error of estimate (SEE) = sqrt(MSE), low if relationship is strong between X and Y
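The ANOVA identities on this card can be checked numerically. The data below are made up; y_hat is the OLS fit of y on x = 1..5, so RSS + SSE = SST holds:

```python
# Hedged sketch of the ANOVA table quantities with hypothetical data.
y     = [2.0, 4.0, 5.0, 4.0, 5.0]   # observed values (made up)
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]   # OLS fitted values for x = 1..5
n, k  = len(y), 1                   # one independent variable

y_bar = sum(y) / n
rss = sum((f - y_bar) ** 2 for f in y_hat)         # regression sum of squares
sse = sum((a - f) ** 2 for a, f in zip(y, y_hat))  # error sum of squares
sst = sum((a - y_bar) ** 2 for a in y)             # total: RSS + SSE = SST

msr = rss / k            # mean square regression, k df
mse = sse / (n - k - 1)  # mean square error, n - k - 1 df
r2  = rss / sst          # R^2 = RSS / SST
see = mse ** 0.5         # standard error of estimate = sqrt(MSE)
f_stat = msr / mse       # F = MSR / MSE
```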
Linear
Log Linear
Auto Regressive (AR)
ARCH
Linear: y=mx+b
Log Linear: ln(y) = mx + b, equivalently y = e^(mx+b)
AR: X_t = m·X_(t-1) + b
ARCH: e_t^2 = m·e_(t-1)^2 + b, where e is the regression residual (error variance depends on the prior period's error variance)
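A quick sketch of the AR(1) card in its own notation, X_t = m·X_(t-1) + b. The coefficients m = 0.8 and b = 1.0 are illustrative assumptions, not from the cards:

```python
# Hedged sketch: one-step-ahead AR(1) forecast with made-up coefficients.

def ar1_forecast(x_prev: float, m: float, b: float) -> float:
    """Forecast X_t = m * X_(t-1) + b."""
    return m * x_prev + b

m, b = 0.8, 1.0
forecast = ar1_forecast(10.0, m, b)   # 0.8 * 10 + 1 = 9.0

# A covariance-stationary AR(1) mean-reverts to b / (1 - m), about 5.0 here.
mean_reverting_level = b / (1 - m)
print(forecast, mean_reverting_level)
```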
Steps for Time Series
1) Determine Linear/Log/AR
2) Check for autocorrelation via t-stat for AR, Durbin Watson for others
3) First difference if a unit root or trend is present: replace x with y = x_t - x_(t-1), changing the model to the change in the variable rather than its level; this removes a unit root and removes a trend in the data
4) Correct for seasonality via adding lag variable when there is seasonality
5) Test the residual variance for ARCH; if the variance depends on prior-period variance, then use an ARCH model and correct for heteroskedasticity
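Step 3 above can be sketched directly; the level series below is hypothetical:

```python
# Hedged sketch of first-differencing: y_t = x_t - x_(t-1), so the model
# is fit on changes rather than levels (removes a trend / unit root).

def first_difference(x):
    return [x[t] - x[t - 1] for t in range(1, len(x))]

levels = [100, 103, 105, 110, 114]   # trending series (made up)
print(first_difference(levels))      # [3, 2, 5, 4]
```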
Supervised Machine Learning (6)
1) Penalized Regression: Penalty for overfit, remove bad variables, reduces overfitting
2) Support Vector Machine: Separates data into one of two groups with the widest possible margin
3) K-Nearest Neighbor: Data classified by nearest neighbor
4) Classification Tree: Categorical Tree
5) Ensemble Learning: Combines predictions from multiple models to improve accuracy
6) Random Forest: Ensemble of many classification trees, each trained on a random subset of the same data
Unsupervised Machine Learning (3)
1) Principal components analysis: Large correlated data -> Small uncorrelated data
2) K-Means clustering: Data divided into non-overlapping K clusters
3) Hierarchical clustering: Data put in hierarchy, no predefined clusters
Neural Networks
Neural Networks: Input/Layers/Output, layers have neurons which are either summation (average) or activation (non-linear)
Deep Learning: Neural networks with many hidden layers, used for complex tasks such as image recognition
Reinforcement Learning: Learn from error to maximize defined reward
Precision, Recall, Accuracy, F-Score
Precision (P) = True Positives/(False Positives + True Positives), of the items you labeled positive, how many actually were
Recall (R) = True Positives/(True Positives + False Negatives), how many of the positives did you actually find
Accuracy = (True Positives + True Negatives)/Total Data Points, the share of all data points you identified correctly
F1 Score = (2 * Precision * Recall) / (Precision + Recall)
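The four metrics above, computed from a made-up confusion matrix (the counts are illustrative only):

```python
# Hedged sketch: precision, recall, accuracy, and F1 from hypothetical counts.
tp, fp, fn, tn = 40, 10, 20, 30   # assumed confusion-matrix counts

precision = tp / (tp + fp)                   # 40/50 = 0.8
recall    = tp / (tp + fn)                   # 40/60 ≈ 0.667
accuracy  = (tp + tn) / (tp + fp + fn + tn)  # 70/100 = 0.7
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean ≈ 0.727
print(precision, recall, accuracy, f1)
```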
Heteroskedasticity
Variance of the errors is not constant; only a problem when it is conditional heteroskedasticity (error variance related to the independent variables).
Causes Type I errors (rejected too often)
Use the Breusch-Pagan test to detect: BP = n × R^2 from a regression of squared residuals on the independent variables, distributed chi-square with k df
Use White-corrected standard errors to correct
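The Breusch-Pagan mechanics can be sketched with assumed numbers (the sample size and auxiliary R^2 below are hypothetical):

```python
# Hedged sketch of the Breusch-Pagan test statistic: regress squared residuals
# on the independent variable(s); BP = n * R^2 of that auxiliary regression,
# chi-square distributed with k df.

n = 40           # observations (assumed)
k = 1            # independent variables in the auxiliary regression
r2_resid = 0.15  # hypothetical R^2 from regressing squared residuals on X

bp_stat = n * r2_resid     # 6.0
chi2_crit_5pct_1df = 3.84  # 5% chi-square critical value, 1 df
reject_homoskedasticity = bp_stat > chi2_crit_5pct_1df  # True here
print(bp_stat, reject_homoskedasticity)
```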
Serial Correlation/Autocorrelation
Terms are not independent, they trend in some direction
Causes Type I errors (rejected too often)
Use Durbin-Watson to detect; DW ranges from 0 (perfect positive correlation) to 4 (perfect negative correlation), with DW ≈ 2 meaning none; Ho: No serial correlation
Use Hansen Method to adjust for serial correlation
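The Durbin-Watson statistic is easy to compute directly; the residual series below is made up to show negative serial correlation:

```python
# Hedged sketch: DW = sum((e_t - e_(t-1))^2) / sum(e_t^2).
# Roughly DW ≈ 2(1 - r): near 2 means no serial correlation,
# near 0 positive, near 4 negative.

def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v ** 2 for v in e)
    return num / den

residuals = [0.5, -0.6, 0.4, -0.5, 0.6, -0.4]  # alternating signs (made up)
dw = durbin_watson(residuals)  # well above 2: negative serial correlation
print(dw)
```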
Multicollinearity
Two variables are highly correlated so they are not independent
Solve this by removing one variable from the equation
Causes Type II errors (do not reject enough)
Unit Root
Detected by the Dickey-Fuller test; the Engle-Granger test checks whether two series are cointegrated
Unit roots are bad if only one variable has one, or if both variables have one and they are not cointegrated
Ok if no unit root or the variables are cointegrated