3. Quantitative Methods Flashcards
Sum of Squared Errors (Definition)
Sum of the squared differences between Yi and Ŷi (observation and estimate)
Regression Sum of Squares (Definition)
Sum of the squared differences between Ŷ and the mean of Y (regression estimate vs. best descriptive estimator)
SSR (#Degrees of Freedom)
k (# parameters of X estimated in the regression)
SSE (#Degrees of Freedom)
n-k-1 (n − # slope estimates − intercept)
SST (#Degrees of Freedom)
(n-1)
Mean Squared of Regression (Formula)
MSR = SSR/k
Mean Squared of Error (Formula)
MSE = SSE/(n-k-1)
Standard Error of Estimate (SEE Formula)
SEE = √MSE
The lower the SEE, the more accurate the model
F Test
F = MSR / MSE (tests the variance explained by the regression against the error variance)
DF @ K numerator (horizontal)
DF @ N-K-1 denominator (vertical)
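A minimal sketch of the ANOVA pieces above for a single-IV OLS fit. All data and variable names (x, y, b0, b1) are illustrative, not from the cards:

```python
# Toy data; k = number of slope parameters (here 1).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n, k = len(x), 1

mean_x = sum(x) / n
mean_y = sum(y) / n
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
     / sum((xi - mean_x) ** 2 for xi in x)
b0 = mean_y - b1 * mean_x
y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # error: Yi vs Y-hat
ssr = sum((yh - mean_y) ** 2 for yh in y_hat)           # regression: Y-hat vs mean of Y
sst = sum((yi - mean_y) ** 2 for yi in y)               # total: SST = SSR + SSE

msr = ssr / k             # Mean Squared Regression
mse = sse / (n - k - 1)   # Mean Squared Error
see = mse ** 0.5          # Standard Error of Estimate
f_stat = msr / mse        # F-test, df = (k, n-k-1), one-tailed
r2 = ssr / sst            # measure of fit
```

The identity SST = SSR + SSE is what makes the F and R² cards consistent with the sum-of-squares cards.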
Regression Assumptions
- Linearity
- Homoscedasticity (var ε is the same across observations, whether volatility is high or low)
- Pairs X and Y are independent (if not, there is serial correlation)
- a. Residuals are independently distributed
- b. Residuals’ distribution is Normal
B1 (Slope Coefficient Test)
T-Test = (B1 estimated − B1 hypothesized) / Sb1
One-tail or two-tails @ df = (n-k-1), since the error enters the denominator
Sb1 = SEE / √[Sum of squares of (Obs X − Mean X)]
Dummy Variable
Y = b0 + b1*Dummy
Dummy = 0 or 1
If Dummy = 0, then Y = b0 = mean
If Dummy = 1, then Y = b0 + b1
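A tiny numeric sketch (illustrative group values) of how the dummy coefficients recover the two group means:

```python
# Observations split by the dummy; numbers are made up for illustration.
group0 = [10.0, 12.0, 11.0]   # rows where Dummy = 0
group1 = [15.0, 17.0, 16.0]   # rows where Dummy = 1

b0 = sum(group0) / len(group0)        # intercept = mean of the Dummy=0 group
b1 = sum(group1) / len(group1) - b0   # slope = difference between group means
```

So Y = b0 when Dummy = 0 and Y = b0 + b1 when Dummy = 1, exactly as the card states.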
Confidence Interval (Formula)
Interval = Ŷ ± T-Critical * Sf
Ŷ = calculated using the regression
Sf = Std Error of Forecast: Sf² = SEE² × [1 + 1/n + (X − X̄)² / ((n−1) × Sx²)]
R² (Formula)
R² = SSR/SST = Measure of Fit
Regression Types
- Log-Lin = lnY = b0+b1X1
- Log-Log = lnY = b0 + b1*(lnX1)
- Lin-Log = Y = b0 + b1*(lnX1)
Multiple Regression Assumptions
- X and Y have a linear relationship
- IVs (X) are not random
- E(ε | X1, X2, Xk) = 0
- E(ε²) = variance, equal across all observations
- E(ε1, ε2) = 0; errors are uncorrelated with each other
- Errors are distributed ~N
F-statistic for Multiple (Hypothesis)
H0: B1 = B2 = B3 = 0
H1: At least one ≠ 0
One-Tailed Test @
DF Numerator = K = Horizontal
DF Denominator = (N-K-1) = Vertical
R² Adjusted (Formula)
Adj. R² = 1 - [(n-1)/(n-k-1)] * [1-R²]
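The adjusted R² formula above can be sketched directly; the r2, n, and k values below are illustrative:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adj. R2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R2)."""
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r2)

# Adding IVs always raises R2, but adjusted R2 penalizes the extra k.
adj = adjusted_r2(r2=0.80, n=30, k=4)
```

Note that adjusted R² is always below R² once k > 0, which is the point of the adjustment.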
Multicollinearity (Definition)
B1 and B2 t-tests are not significant, but the F-test is
Reason: Two IVs are highly correlated
Detection: ↑ R² and significant F-test, but insignificant individual t-tests
Correction: Omit one variable
Consequence: ↑ SE of coefficients = ↓ t-tests
Heteroskedasticity (Definition)
Var of ε changes across observations
Unconditional: Var (ε) NOT correlated w/ IVs
Conditional: Var (ε) IS correlated w/ IVs
Correction:
- Robust Std Errors
- Generalized Least Squares
Heteroskedasticity (Test)
Breusch Pagan Test (OH NO)
H0: NO conditional
H1: Conditional
Test = n × R² (of the residual regression) @ Chi-Squared table, df = k, one-tailed
Regress the squared errors on the IVs
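A rough sketch of those Breusch-Pagan steps with a single IV. The residuals are toy values, and the auxiliary regression is done by hand:

```python
# Step 1: regress the SQUARED residuals on the IV (auxiliary regression).
# Step 2: BP statistic = n * R2 of that regression, chi-squared with k df.
x   = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
eps = [0.1, -0.2, 0.4, -0.5, 0.8, -0.9]   # residuals from the main regression
eps2 = [e ** 2 for e in eps]
n = len(x)

mx = sum(x) / n
me = sum(eps2) / n
b1 = sum((xi - ei * 0 + (xi - mx) * 0) for xi, ei in zip(x, eps2))  # placeholder removed below
b1 = sum((xi - mx) * (ei - me) for xi, ei in zip(x, eps2)) \
     / sum((xi - mx) ** 2 for xi in x)
b0 = me - b1 * mx
fitted = [b0 + b1 * xi for xi in x]

ssr = sum((f - me) ** 2 for f in fitted)
sst = sum((ei - me) ** 2 for ei in eps2)
r2_aux = ssr / sst
bp_stat = n * r2_aux   # compare to chi-squared critical value, k = 1 df
```

A large bp_stat rejects H0 (no conditional heteroskedasticity).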
Hansen Method (Definition)
Preferred if (i) serial correlation or (ii) serial correlation + heteroskedasticity
Serial Correlation (Definition)
- Errors are explained by similar reasons
- ↓ SEE = ↑ F-test
- Violates Independence of Pairs (X and Y)
- If the previous error is positive, the chance that the next error is also positive is in fact higher
- If an IV is lagged Y, the coefficient estimates will not be valid
Test for Serial Correlation
Durbin Watson (Deutsche Welle)
H0: DW = 2 (No Correl)
H1: DW ≠ 2 (Correl)
Test: DW ≈ 2×(1 − r); critical values depend on k and n
Correction: (i) Modified SEs,
(ii) Modify Regression Equation
(iii) Include seasonal term
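The DW statistic itself can be sketched on a toy residual series (illustrative numbers; positive residuals cluster, so DW comes out well below 2):

```python
# Durbin-Watson: sum of squared changes in residuals over sum of squared residuals.
resid = [0.5, 0.4, 0.6, 0.3, 0.5, 0.2, 0.4]   # made-up residuals

num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
den = sum(e ** 2 for e in resid)
dw = num / den   # near 2 -> no serial correlation; < 2 -> positive correlation
```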
Hansen or White Method (Criteria)
If only Hetero: White SEs
If only SC: Hansen
If both: Hansen is preferred
Standard Error of Residual Autocorrelations (Formula)
SE = 1/√T, where T = # of observations
Misspecifications of Model (List)
- Data Mining
- Functional Form (Linear, Log, Diff Samples)
- Parsimonious IVs
- Examine violations before accepting
- Tested out of sample
Logit Regressions
Ln (Odds) = B0 + B1X1 + BnXn + ε
Estimated by maximum likelihood: finds the coefficients that make the observed sample most likely
Slope = Chg Log Odds of event happening
Odds (Formula)
Odds = p / (1 - p)
ln (p/(1-p)) = b0+b1X1 + ε
p = (e^A) / (1 + e^A), where A = the regression equation
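A sketch of the log-odds-to-probability conversion above; b0, b1, and x1 are illustrative values:

```python
import math

# Logit: ln(p / (1 - p)) = b0 + b1*x1 (toy coefficients).
b0, b1, x1 = -1.0, 0.5, 3.0
log_odds = b0 + b1 * x1                            # A = the regression equation
p = math.exp(log_odds) / (1 + math.exp(log_odds))  # p = e^A / (1 + e^A)
odds = p / (1 - p)                                 # recovers e^A
```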
Time Series Analysis
Yt = B0 + B1t + εt, where IV is Time
Often has serial correlation, so test with Durbin-Watson
AR Model (Concept)
Xt = B0 + B1Xt-1 + ε (AR1)
- Regress X in its past values
- If |B1| < 1, the model is mean reverting
- Cannot use DW
- Must use t = autocorrelation / (1/√T)
AR (2) Model Structure
Y(t) = b0 + b1*Y(t-1) + b2*Y(t-2)
Covariance Stationary Model
(a) Mean = constant
(b) Variance = constant
(c) Cov(Yt, Yt-s) = constant
All constant and finite
Mean Reverting Level (Formula)
Xt = b0 / (1 - b1)
Compare 2 models (in terms of forecasting power)
- In-Sample forecasts: Predicted v. Observed
- Out of Sample:
Use RMSE = √[Σ(Actual − Forecast)² / n]
RMSE: lower is better
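The RMSE comparison can be sketched with two toy forecast series (all values illustrative):

```python
actual     = [10.0, 12.0, 11.0, 13.0]
forecast_a = [10.5, 11.5, 11.2, 13.4]
forecast_b = [ 9.0, 13.5, 10.0, 14.5]

def rmse(actual, forecast):
    """Root mean squared error: sqrt(sum of squared errors / n)."""
    n = len(actual)
    return (sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n) ** 0.5

# Lower RMSE = better out-of-sample forecasting power.
better = "A" if rmse(actual, forecast_a) < rmse(actual, forecast_b) else "B"
```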
Random Walk (Definition)
- X is explained by a sum of errors
- AR model where B0 = 0 and B1 = 1, so the mean-reverting level b0/(1-b1) is undefined
- Not mean reverting
- No constant variance
Random Walk w/ Drift (Definition)
- Intercept is not ZERO (B0 ≠ 0)
- B1 is still 1
Seasonality (Definition)
- One of the autocorrelation t-tests on the lags of the AR model will be very significant
- Correction: include a seasonal lag
ARCH Model (Definition)
- Testing whether an AR model's errors have conditional heteroskedasticity
- Is the variance of ε explained by the previous period's squared error? That is the question
ARCH Model (Steps)
- Regress ε² (variance proxy) on its own lag: ε²(t) = a0 + a1*ε²(t-1)
- Test for a1 = 0 / a1 ≠ 0
- H0 is good. H1 means it has Cond. Hetero
- If it has CH, use Generalized Least Squares
Regression using > 1 Time Series (concept)
Y (Time Series #1) = f (X = other Time Series #2)
Many Time Series: when can I use?
- If BOTH ARE cov stationary, or
- If BOTH are NOT, but ARE cointegrated (share a common trend)
Big Data Learning Types
- Supervised: Labeled Data
- Unsupervised: Data is NOT labeled
Big Data Variables (Types)
- Feature (Input)
- Target (Output)
Big Data Problem Categories
- Regression (Continuous target)
- Classification (Categorical / Ordinal target)
Overfit Problem (Definition)
- Treat noise as parameter
Samples Used to Test Model (Types)
- Training Sample
- Validation Sample
- Test Sample
Big Data Error (Types)
Bias Error: Underfitting (poor fit even in-sample)
Variance Error: Overfitting (fits in-sample too well, generalizes poorly)
Base Error: Noise
Complexity Problem Solving
- Reduce Complexity
- Cross Validation (invert training and validation samples)
- K-Fold Cross Validation: train on (k-1) folds and test on the remaining one, rotating, to avoid sampling error
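The K-fold split can be sketched as an index partition (no model is actually trained; fold assignment is a simple round-robin for illustration):

```python
def k_fold_indices(n: int, k: int):
    """Split n observation indices into k train/test splits.

    Each observation lands in the test fold exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]   # round-robin assignment
    splits = []
    for test in folds:
        train = [i for i in range(n) if i not in test]
        splits.append((train, test))
    return splits

splits = k_fold_indices(n=10, k=5)   # 5 splits, each with 8 train / 2 test indices
```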
Supervised Learning Methods
- CART
- K-nearest neighbors
- LASSO (eliminates IVs + hyperparameter)
- Penalized Regression (hyperparameter)
- SVM
- Calvin Klein Luan Panisson
Unsupervised Learning Methods
- PCA (reduce dimensionality)
- Clustering
- Neural Networks (branch into Deep Learning Nets and Reinforcement Learning)
Regress & Classification Methods (which work for both)
- Neural Networks
- Deep Learning Nets
- Reinforcement Learning
Regression Methods
Not Linear:
- CART
- Random Forest
- Neural Nets
Linear: Regression
Classification Methods
Labeled:
- Complex: CART, Random Forest
- Normal: KNN, SVM
Unlabeled:
- Complex: Neural Nets
- Normal: K-means (# categories known) or Hierarchical Clustering
Structured Data Cleansing (Processes)
- Incomplete
- Inconsistent
- Inaccurate
- Invalid
- Non-Uniform
- Duplicate
Unstructured Data Cleansing (Processes)
- Remove HTML tags
- Lowercase
- Remove stop words
- Stem
- Lemmatize
Big Data Projects Steps
- Conceptualize
- Data Collection
- Data Preparation (Clean, Wrangle)
- Data Exploration
- Model Training
Stem (Definition)
Data Cleansing:
- From all derived to root word
- Connection / Connecting -> Connect (root)
Lemmatize (Definition)
Data Cleansing:
- Remove endings if the base is in a dictionary
- More costly and advanced
- Takes context / speech to change data
Data Processing / Wrangle (Types)
- Structured: Extract, Filter, Aggregate, Convert (Trim, Scale, Normalize)
- Unstructured: Tokenize, Bag of Words
Tokenization (Definition)
Data Preprocessing:
- Text -> Key words
Document Term Matrix (Type)
Rows: Documents (texts)
Columns: Tokens to be analyzed (defined previously)
Bag of Words (Definition)
- Created after data is cleansed and structured
- Collection of the distinct words in the text (order is discarded)
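A tiny sketch (made-up tokens) of a bag of words and the document term matrix built from it:

```python
# Two already-cleansed, tokenized "documents" (illustrative tokens).
docs = [["rates", "rise", "rates"],
        ["rates", "fall"]]

vocab = sorted({tok for doc in docs for tok in doc})        # bag of words
dtm = [[doc.count(tok) for tok in vocab] for doc in docs]   # rows = docs, cols = tokens
```

Each DTM cell counts how often a token appears in a document, which is the structured input the exploration steps work on.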
Data Exploration (Types)
- Structured:
- Data Visualize
- Feature Selection
- Feature Engineering: OHE (convert categorical values into dummies)
- Unstructured:
- Feature Selection: word counts, frequency, cloud
- Engineering: number length, N-gram (multi-word pattern), name entity recognition, part of speech
Big Data Properties
- Variety (↑): levels of structure
- Velocity (↑): latency
- Volume (↑): terabytes
- Veracity (↓): e.g., fake news
Order of Model Training Table
P (Vertical): 1 / 0
A (Horizontal): 1/0
Remember: Paulo Amora
H0: class = 0; Ha: class ≠ 0 (i.e., 1)
Precision (Formula)
Precision (→): TP / (TP + FP)
Use when a Type I error (FP) is the costly one
Recall / Sensitivity (Formula)
Recall (↓): TP / (TP+FN)
Use when a Type II error (FN) is the costly one
e.g., HIV test: failing to reject H0 when H0 is false (a false negative)
Accuracy (Formula)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
All True / All Possibilities
Receiver Operating Characteristic (ROC)
Chart about all POSITIVES (+)
X-axis: FPR = FP / (FP + TN)
Y-axis: TPR = TP / (TP + FN)
Higher area under the curve (AUC) = better
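The confusion-matrix metrics above can be sketched from toy counts (tp, fp, fn, tn are illustrative):

```python
tp, fp, fn, tn = 40, 10, 5, 45   # made-up confusion matrix counts

precision = tp / (tp + fp)                    # penalizes FP (Type I error)
recall    = tp / (tp + fn)                    # penalizes FN (Type II error)
accuracy  = (tp + tn) / (tp + tn + fp + fn)   # all correct / all possibilities
fpr       = fp / (fp + tn)                    # ROC x-axis
tpr       = recall                            # ROC y-axis
```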