3. Quantitative Methods Flashcards
Sum of Squared Errors (Definition)
Sum of the squared differences between Yi and Ŷi (observation and estimate)
Regression Sum of Squares (Definition)
Sum of the squared differences between Ŷ and the mean of Y (regression estimate vs. best descriptive estimator)
SSR (#Degrees of Freedom)
k (# parameters of X estimated in the regression)
SSE (#Degrees of Freedom)
n-k-1 (n − # slope estimates − intercept)
SST (#Degrees of Freedom)
(n-1)
Mean Squared of Regression (Formula)
MSR = SSR/k
Mean Squared of Error (Formula)
MSE = SSE/(n-k-1)
Standard Error of Estimate (SEE Formula)
SEE = √MSE
The lower the SEE, the more accurate the model
F Test
F = MSR / MSE (tests the variance explained by the regression against the error variance)
DF @ K numerator (horizontal)
DF @ N-K-1 denominator (vertical)
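A minimal sketch of the ANOVA pieces above for a single-IV OLS fit. All data and variable names (x, y, b0, b1) are illustrative, not from the cards:

```python
# Toy data; k = number of slope parameters (here 1).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n, k = len(x), 1

mean_x = sum(x) / n
mean_y = sum(y) / n
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
     / sum((xi - mean_x) ** 2 for xi in x)
b0 = mean_y - b1 * mean_x
y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # error: Yi vs Y-hat
ssr = sum((yh - mean_y) ** 2 for yh in y_hat)           # regression: Y-hat vs mean of Y
sst = sum((yi - mean_y) ** 2 for yi in y)               # total: SST = SSR + SSE

msr = ssr / k             # Mean Squared Regression
mse = sse / (n - k - 1)   # Mean Squared Error
see = mse ** 0.5          # Standard Error of Estimate
f_stat = msr / mse        # F-test, df = (k, n-k-1), one-tailed
r2 = ssr / sst            # measure of fit
```

The identity SST = SSR + SSE is what makes the F and R² cards consistent with the sum-of-squares cards.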
Regression Assumptions
- Linearity
- Homoscedasticity (var ε is the same across observations, whether volatility is high or low)
- Pairs X and Y are independent (if not, there is serial correlation)
- a. Residuals are independently distributed
- b. Residuals’ distribution is Normal
B1 (Slope Coefficient Test)
T-Test = (B1 estimated − B1 hypothesized) / Sb1
One-tail or two-tails @ df = (n-k-1), since the error enters the denominator
Sb1 = SEE / √[Sum of squares of (Obs X − Mean X)]
Dummy Variable
Y = b0 + b1*Dummy
Dummy = 0 or 1
If Dummy = 0, then Y = b0 = mean
If Dummy = 1, then Y = b0 + b1
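A tiny numeric sketch (illustrative group values) of how the dummy coefficients recover the two group means:

```python
# Observations split by the dummy; numbers are made up for illustration.
group0 = [10.0, 12.0, 11.0]   # rows where Dummy = 0
group1 = [15.0, 17.0, 16.0]   # rows where Dummy = 1

b0 = sum(group0) / len(group0)        # intercept = mean of the Dummy=0 group
b1 = sum(group1) / len(group1) - b0   # slope = difference between group means
```

So Y = b0 when Dummy = 0 and Y = b0 + b1 when Dummy = 1, exactly as the card states.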
Confidence Interval (Formula)
Interval = Ŷ ± T-Critical * Sf
Ŷ = calculated using the regression
Sf = Std Error of Forecast: Sf² = SEE² × [1 + 1/n + (X − X̄)² / ((n−1) × Sx²)]
R² (Formula)
R² = SSR/SST = Measure of Fit
Regression Types
- Log-Lin = lnY = b0+b1X1
- Log-Log = lnY = b0 + b1*(lnX1)
- Lin-Log = Y = b0 + b1*(lnX1)
Multiple Regression Assumptions
- X and Y have a linear relationship
- IVs (X) are not random
- E(ε | X1, X2, Xk) = 0
- E(ε²) = variance, equal across all observations
- E(ε1, ε2) = 0; errors are uncorrelated with each other
- Errors are distributed ~N
F-statistic for Multiple (Hypothesis)
H0: B1 = B2 = B3 = 0
H1: At least one ≠ 0
One-Tailed Test @
DF Numerator = K = Horizontal
DF Denominator = (N-K-1) = Vertical
R² Adjusted (Formula)
Adj. R² = 1 - [(n-1)/(n-k-1)] * [1-R²]
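The adjusted R² formula above can be sketched directly; the r2, n, and k values below are illustrative:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adj. R2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R2)."""
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r2)

# Adding IVs always raises R2, but adjusted R2 penalizes the extra k.
adj = adjusted_r2(r2=0.80, n=30, k=4)
```

Note that adjusted R² is always below R² once k > 0, which is the point of the adjustment.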
Multicollinearity (Definition)
B1 and B2 t-tests are not significant, but the F-test is
Reason: Two IVs are highly correlated
Detection: ↑ R² and significant F-test, but insignificant individual t-tests
Correction: Omit one variable
Consequence: ↑ SE of coefficients = ↓ t-tests
Heteroskedasticity (Definition)
Var of ε changes across observations
Unconditional: Var (ε) NOT correlated w/ IVs
Conditional: Var (ε) IS correlated w/ IVs
Correction:
- Robust Std Errors
- Generalized Least Squares
Heteroskedasticity (Test)
Breusch Pagan Test (OH NO)
H0: NO conditional
H1: Conditional
Test = n × R² (of the residual regression) @ Chi-Squared table, df = k, one-tailed
Regress the squared errors on the IVs
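A rough sketch of those Breusch-Pagan steps with a single IV. The residuals are toy values, and the auxiliary regression is done by hand:

```python
# Step 1: regress the SQUARED residuals on the IV (auxiliary regression).
# Step 2: BP statistic = n * R2 of that regression, chi-squared with k df.
x   = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
eps = [0.1, -0.2, 0.4, -0.5, 0.8, -0.9]   # residuals from the main regression
eps2 = [e ** 2 for e in eps]
n = len(x)

mx = sum(x) / n
me = sum(eps2) / n
b1 = sum((xi - ei * 0 + (xi - mx) * 0) for xi, ei in zip(x, eps2))  # placeholder removed below
b1 = sum((xi - mx) * (ei - me) for xi, ei in zip(x, eps2)) \
     / sum((xi - mx) ** 2 for xi in x)
b0 = me - b1 * mx
fitted = [b0 + b1 * xi for xi in x]

ssr = sum((f - me) ** 2 for f in fitted)
sst = sum((ei - me) ** 2 for ei in eps2)
r2_aux = ssr / sst
bp_stat = n * r2_aux   # compare to chi-squared critical value, k = 1 df
```

A large bp_stat rejects H0 (no conditional heteroskedasticity).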
Hansen Method (Definition)
Preferred if (i) serial correlation or (ii) serial correlation + heteroskedasticity
Serial Correlation (Definition)
- Errors are explained by similar reasons
- ↓ SEE = ↑ F-test
- Violates Independence of Pairs (X and Y)
- If the previous error is positive, the chance that the next error is also positive is in fact higher
- If an IV is lagged Y, the coefficient estimates will not be valid
Test for Serial Correlation
Durbin Watson (Deutsche Welle)
H0: DW = 2 (No Correl)
H1: DW ≠ 2 (Correl)
Test: DW ≈ 2×(1 − r); critical values depend on k and n
Correction: (i) Modified SEs,
(ii) Modify Regression Equation
(iii) Include seasonal term
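The DW statistic itself can be sketched on a toy residual series (illustrative numbers; positive residuals cluster, so DW comes out well below 2):

```python
# Durbin-Watson: sum of squared changes in residuals over sum of squared residuals.
resid = [0.5, 0.4, 0.6, 0.3, 0.5, 0.2, 0.4]   # made-up residuals

num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
den = sum(e ** 2 for e in resid)
dw = num / den   # near 2 -> no serial correlation; < 2 -> positive correlation
```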
Hansen or White Method (Criteria)
If only Hetero: White SEs
If only SC: Hansen
If both: Hansen is preferred
Standard Error of Residual Autocorrelations (Formula)
SE = 1/√T, where T = # of observations
Misspecifications of Model (List)
- Data Mining
- Functional Form (Linear, Log, Diff Samples)
- Parsimonious IVs
- Examine violations before accepting
- Tested out of sample
Logit Regressions
Ln (Odds) = B0 + B1X1 + BnXn + ε
Estimated by maximum likelihood: finds the coefficients that make the observed sample most likely
Slope = Chg Log Odds of event happening
Odds (Formula)
Odds = p / (1 - p)
ln (p/(1-p)) = b0+b1X1 + ε
p = (e^A) / (1 + e^A), where A = the regression equation
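A sketch of the log-odds-to-probability conversion above; b0, b1, and x1 are illustrative values:

```python
import math

# Logit: ln(p / (1 - p)) = b0 + b1*x1 (toy coefficients).
b0, b1, x1 = -1.0, 0.5, 3.0
log_odds = b0 + b1 * x1                            # A = the regression equation
p = math.exp(log_odds) / (1 + math.exp(log_odds))  # p = e^A / (1 + e^A)
odds = p / (1 - p)                                 # recovers e^A
```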
Time Series Analysis
Yt = B0 + B1t + εt, where IV is Time
Often has serial correlation, so test with Durbin-Watson
AR Model (Concept)
Xt = B0 + B1Xt-1 + ε (AR1)
- Regress X in its past values
- If |B1| < 1, the model is mean reverting
- Cannot use DW
- Must use t = autocorrelation / (1/√T)
AR (2) Model Structure
Y(t) = b0 + b1*Y(t-1) + b2*Y(t-2)
Covariance Stationary Model
(a) Mean = constant
(b) Variance = constant
(c) Cov(Yt, Yt-s) = constant
All constant and finite
Mean Reverting Level (Formula)
Xt = b0 / (1 - b1)
Compare 2 models (in terms of forecasting power)
- In-Sample forecasts: Predicted v. Observed
- Out of Sample:
Use RMSE = √[Σ(Actual − Forecast)² / n]
RMSE: lower is better
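The RMSE comparison can be sketched with two toy forecast series (all values illustrative):

```python
actual     = [10.0, 12.0, 11.0, 13.0]
forecast_a = [10.5, 11.5, 11.2, 13.4]
forecast_b = [ 9.0, 13.5, 10.0, 14.5]

def rmse(actual, forecast):
    """Root mean squared error: sqrt(sum of squared errors / n)."""
    n = len(actual)
    return (sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n) ** 0.5

# Lower RMSE = better out-of-sample forecasting power.
better = "A" if rmse(actual, forecast_a) < rmse(actual, forecast_b) else "B"
```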
Random Walk (Definition)
- X is explained by a sum of errors
- AR model where B0 = 0 and B1 = 1, so the mean-reverting level b0/(1-b1) is undefined
- Not mean reverting
- No constant variance
Random Walk w/ Drift (Definition)
- Intercept is not ZERO (B0 ≠ 0)
- B1 is still 1
Seasonality (Definition)
- One of the autocorrelation t-tests on the lags of the AR model will be very significant
- Correction: include a seasonal lag
ARCH Model (Definition)
- Testing whether an AR model's errors have conditional heteroskedasticity
- Is the variance of ε explained by the previous period's squared error? That is the question
ARCH Model (Steps)
- Regress ε² (variance proxy) on its own lag: ε²(t) = a0 + a1*ε²(t-1)
- Test for a1 = 0 / a1 ≠ 0
- H0 is good. H1 means it has Cond. Hetero
- If it has CH, use Generalized Least Squares
Regression using > 1 Time Series (concept)
Y (Time Series #1) = f (X = other Time Series #2)
Many Time Series: when can I use?
- If BOTH ARE cov stationary, or
- If BOTH are NOT, but ARE cointegrated (share a common trend)
Big Data Learning Types
- Supervised: Labeled Data
- Unsupervised: Data is NOT labeled
Big Data Variables (Types)
- Feature (Input)
- Target (Output)
Big Data Problem Categories
- Regression (Continuous target)
- Classification (Categorical / Ordinal target)
Overfit Problem (Definition)
- Treat noise as parameter
Samples Used to Test Model (Types)
- Training Sample
- Validation Sample
- Test Sample
Big Data Error (Types)
Bias Error: Underfitting (poor fit even in-sample)
Variance Error: Overfitting (fits in-sample too well, generalizes poorly)
Base Error: Noise
Complexity Problem Solving
- Reduce Complexity
- Cross Validation (invert training and validation samples)
- K-Fold Cross Validation: train on (k-1) folds and test on the remaining one, rotating, to avoid sampling error
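The K-fold split can be sketched as an index partition (no model is actually trained; fold assignment is a simple round-robin for illustration):

```python
def k_fold_indices(n: int, k: int):
    """Split n observation indices into k train/test splits.

    Each observation lands in the test fold exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]   # round-robin assignment
    splits = []
    for test in folds:
        train = [i for i in range(n) if i not in test]
        splits.append((train, test))
    return splits

splits = k_fold_indices(n=10, k=5)   # 5 splits, each with 8 train / 2 test indices
```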
Supervised Learning Methods
- CART
- K-nearest neighbors
- LASSO (eliminates IVs + hyperparameter)
- Penalized Regression (hyperparameter)
- SVM
- Calvin Klein Luan Panisson
Unsupervised Learning Methods
- PCA (reduce dimensionality)
- Clustering
- Neural Networks (branch into Deep Learning Nets and Reinforcement Learning)
Regress & Classification Methods (which work for both)
- Neural Networks
- Deep Learning Nets
- Reinforcement Learning
Regression Methods
Not Linear:
- CART
- Random Forest
- Neural Nets
Linear: Regression
Classification Methods
Labeled:
- Complex: CART, Random Forest
- Normal: KNN, SVM
Unlabeled:
- Complex: Neural Nets
- Normal: K-means (# categories known) or Hierarchical Clustering
Structured Data Cleansing (Processes)
- Incomplete
- Inconsistent
- Inaccurate
- Invalid
- Non-Uniform
- Duplicate
Unstructured Data Cleansing (Processes)
- Remove HTML tags
- Lowercase
- Remove stop words
- Stem
- Lemmatize
Big Data Projects Steps
- Conceptualize
- Data Collection
- Data Preparation (Clean, Wrangle)
- Data Exploration
- Model Training
Stem (Definition)
Data Cleansing:
- From all derived to root word
- Connection / Connecting -> Connect (root)
Lemmatize (Definition)
Data Cleansing:
- Remove endings if the base is in a dictionary
- More costly and advanced
- Takes context / speech to change data
Data Processing / Wrangle (Types)
- Structured: Extract, Filter, Aggregate, Convert (Trim, Scale, Normalize)
- Unstructured: Tokenize, Bag of Words
Tokenization (Definition)
Data Preprocessing:
- Text -> Key words
Document Term Matrix (Type)
Rows: Documents (texts)
Columns: Tokens to be analyzed (defined previously)
Bag of Words (Definition)
- Created after data is cleansed and structured
- Collection of the distinct words in the text (order is discarded)
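A tiny sketch (made-up tokens) of a bag of words and the document term matrix built from it:

```python
# Two already-cleansed, tokenized "documents" (illustrative tokens).
docs = [["rates", "rise", "rates"],
        ["rates", "fall"]]

vocab = sorted({tok for doc in docs for tok in doc})        # bag of words
dtm = [[doc.count(tok) for tok in vocab] for doc in docs]   # rows = docs, cols = tokens
```

Each DTM cell counts how often a token appears in a document, which is the structured input the exploration steps work on.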
Data Exploration (Types)
- Structured:
- Data Visualize
- Feature Selection
- Feature Engineering: OHE (convert categorical values into dummies)
- Unstructured:
- Feature Selection: word counts, frequency, cloud
- Engineering: number length, N-gram (multi-word pattern), name entity recognition, part of speech
Big Data Properties
- Variety (↑): levels of structure
- Velocity (↑): latency
- Volume (↑): terabytes
- Veracity (↓): e.g., fake news
Order of Model Training Table
P (Vertical): 1 / 0
A (Horizontal): 1/0
Remember: Paulo Amora
H0: class = 0; Ha: class ≠ 0 (i.e., 1)
Precision (Formula)
Precision (→): TP / (TP + FP)
Use when a Type I error (FP) is the costly one
Recall / Sensitivity (Formula)
Recall (↓): TP / (TP+FN)
Use when a Type II error (FN) is the costly one
e.g., HIV test: failing to reject H0 when H0 is false (a false negative)
Accuracy (Formula)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
All True / All Possibilities
Receiver Operating Characteristic (ROC)
Chart about all POSITIVES (+)
X-axis: FPR = FP / (FP + TN)
Y-axis: TPR = TP / (TP + FN)
Higher area under the curve (AUC) = better
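The confusion-matrix metrics above can be sketched from toy counts (tp, fp, fn, tn are illustrative):

```python
tp, fp, fn, tn = 40, 10, 5, 45   # made-up confusion matrix counts

precision = tp / (tp + fp)                    # penalizes FP (Type I error)
recall    = tp / (tp + fn)                    # penalizes FN (Type II error)
accuracy  = (tp + tn) / (tp + tn + fp + fn)   # all correct / all possibilities
fpr       = fp / (fp + tn)                    # ROC x-axis
tpr       = recall                            # ROC y-axis
```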