Quant Flashcards
5 Assumptions to use a multiple regression model
1) Linearity
2) Homoskedasticity
3) Independence of Errors
4) Normality
5) Independence of Independent Variables
Linearity Assumption
The relationship between the independent variable(s) and dependent variable needs to be linear
Homoskedasticity Assumption
The variance of the regression residuals should be the same for all observations
Independence of Errors Assumption
The observations are independent of one another; equivalently, the regression errors are uncorrelated across observations
Normality Assumption
The regression residuals are normally distributed
Independence of Independent Variables Assumption
Independent variables are not random, and there is no exact linear relationship between two or more of the independent variables
Adjusted R-Squared
A version of R-squared that adjusts for the number of independent variables; it increases only when a newly added variable improves the model by more than chance alone
AIC v. BIC
AIC is preferred when the model is used for prediction
BIC is preferred when evaluating goodness of fit
Lower values are better for both
F Statistic
[(SSE of restricted - SSE of unrestricted) / q] / [SSE of unrestricted / (n - k - 1)]
where q is the number of restrictions and SSE is the sum of squared errors
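A minimal sketch of this joint test in Python; the function name and the example numbers are illustrative, and the two SSE values are assumed to come from already-fitted restricted and unrestricted models:

```python
from scipy.stats import f

def joint_f_test(sse_restricted, sse_unrestricted, q, n, k, alpha=0.05):
    """F = [(SSE_R - SSE_U) / q] / [SSE_U / (n - k - 1)]."""
    f_stat = ((sse_restricted - sse_unrestricted) / q) / (sse_unrestricted / (n - k - 1))
    f_crit = f.ppf(1 - alpha, q, n - k - 1)  # one-tailed critical value
    return f_stat, f_stat > f_crit           # True -> reject the null

# Hypothetical inputs: 2 restrictions, n = 50, k = 4 regressors in the unrestricted model
print(joint_f_test(sse_restricted=120.0, sse_unrestricted=100.0, q=2, n=50, k=4))
```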
T Stat when only given coefficient and standard error, and what is null hypothesis
coefficient / standard error; the null hypothesis is that the coefficient does not differ significantly from 0
Breusch Pagan Test (BP)
- What does it test for
- What is the formula
1) Conditional Heteroskedasticity - variance in residuals differs across observations
2) BP = n * R-squared, where the R-squared comes from regressing the squared residuals on the independent variables (chi-squared with k degrees of freedom, one-tailed)
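A minimal sketch of the BP statistic computed by hand, assuming `X` is an n-by-k NumPy array of the regressors and `resid` holds the residuals from the original fit (statsmodels also ships a ready-made `het_breuschpagan`):

```python
import statsmodels.api as sm
from scipy.stats import chi2

def breusch_pagan(resid, X, alpha=0.05):
    aux = sm.OLS(resid**2, sm.add_constant(X)).fit()  # squared residuals on the regressors
    n, k = X.shape
    bp = n * aux.rsquared              # BP = n * R-squared of the auxiliary regression
    crit = chi2.ppf(1 - alpha, k)      # chi-squared with k df, one-tailed
    return bp, bp > crit               # True -> conditional heteroskedasticity
```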
2 Types of Heteroskedasticity
1) Conditional - error variance is correlated with independent variables (much bigger problem) - high probability of Type 1 errors
2) Unconditional - less problematic, error variance is not correlated with the independent variables
Durbin-Watson Test (DW)
A test for first-order serial correlation in the residuals of a time series model
Breusch-Godfrey Test (BG)
A test used to detect autocorrelation up to a predesignated order of the lagged residuals in a time series model
Multicollinearity
When two or more independent variables are correlated with each other
Test for multicollinearity
Variance inflation factor (VIF)
VIF(j) = 1 / (1 - R-squared(j)), where R-squared(j) comes from regressing the jth independent variable on the remaining independent variables
Any value over 5 warrants investigation
Any value over 10 means multicollinearity is likely
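A minimal sketch computing VIFs by hand, assuming `X` is an n-by-k NumPy array of the independent variables:

```python
import numpy as np
import statsmodels.api as sm

def vifs(X):
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)  # all regressors except the jth
        r2 = sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared
        out.append(1.0 / (1.0 - r2))      # VIF(j) = 1 / (1 - R-squared(j))
    return out  # > 5 warrants investigation, > 10 suggests multicollinearity
```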
Two types of observations that may influence regression results
1) High Leverage Point
2) Outlier
Difference between high leverage point and outlier
A high leverage point has an extreme x value; an outlier has an extreme y value. A point can be both high leverage and an outlier
How to calculate if a point is high leverage
Leverage
If an observation's leverage exceeds 3 * (k + 1) / n, it is a high leverage point
k - independent variables
n - observations
When looking at regression, determine if independent variable is significantly different from 0
If |t stat| > critical t value (equivalently, if the p value < the significance level), it is significantly different from 0
T stat if not given is coefficient / standard error
Method to identify if an observation is an outlier, and what is the formula
Studentized deleted residuals
t(i) = e(i) / s(e), where e(i) is the residual with the ith observation deleted and s(e) is the standard deviation of all the residuals (i.e., the standard error)
If |t(i)| > 3, or greater than the critical t stat with n - k - 2 degrees of freedom, the observation is an outlier
When is an observation considered influential
If its exclusion from the sample causes substantial changes in the regression function
Cook’s D
Metric for identifying influential observations
Interpreting Cook’s D
If value is greater than 0.5, possibly influential
If value is greater than 1, likely influential
If value greater than SqRt(k/n), likely influential
Dummy Variable
Independent variable that takes on a value of either 0 or 1
also called indicator variable
Types of dummy Variables
1) Intercept Dummy
2) Slope Dummy
3) Interaction Term
Go from log odds to probability
1) Exponentiate it (odds = e^(log odds)); this is the odds
2) Take odds / (1 + odds); this is the probability
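A minimal sketch of the two-step conversion (the function name is illustrative):

```python
import math

def log_odds_to_prob(log_odds):
    odds = math.exp(log_odds)   # step 1: exponentiate to get the odds
    return odds / (1 + odds)    # step 2: odds / (1 + odds) = probability

print(log_odds_to_prob(0.0))  # 0.5 -- log odds of 0 means even odds
```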
Likelihood Ratio (LR) Test
A method to assess the fit of logistic regression models that is based on the log-likelihood metric that describes the model’s fit to the data
LR = -2 * (Log-likelihood of restricted model - log-likelihood of unrestricted model)
Calculate Standard Error of autocorrelations in time series
1 / sqrt(T), where T is number of observations, uniform for every observation
Covariance Stationary
A key assumption needed to make valid statistical inferences in time series models
1) Expected value must be constant and finite in all periods
2) Variance must be constant and finite in all periods
3) Covariance of the series with its own lagged values must be constant and finite in all periods
Autocorrelation
Correlations of a time series with its own past values
Mean reverting level of a time series
b(0) / (1-b(1))
Root Mean Squared Error (RMSE)
The square root of the average squared forecast error, used to compare the out-of-sample forecast performance of forecasting models
Smallest RMSE is most accurate
How to handle simple random walk without drift
First difference the time series because it makes it covariance stationary
Expected Value of simple random walk without drift
0
How to test for unit root
Dickey-Fuller Test
The null hypothesis is that a unit root is present, so rejecting the null is to say the time series has no unit root and is covariance stationary
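A minimal sketch using statsmodels' augmented Dickey-Fuller test, assuming `series` is a 1-D array or pandas Series of the time series values:

```python
from statsmodels.tsa.stattools import adfuller

adf_stat, p_value, *_ = adfuller(series)
if p_value < 0.05:
    print("Reject the null: no unit root (covariance stationary)")
else:
    print("Fail to reject: a unit root is present")
```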
Unit Root
A time series with a unit root is a random walk and is not covariance stationary
A unit root is present when the lag coefficient (b1) equals 1; if |b1| >= 1, the series is not covariance stationary
Co-integration
If two series each have a unit root but some linear combination of them is covariance stationary, they are co-integrated, meaning they move together over the long term and a relationship can be established between the two
Mean Reverting Level
b(0) / (1-b(1)), where b0 and b1 are the coefficients in the model you’re referencing
How to interpret Durbin Watson
A value of 2 means there is no serial correlation
2-4 is negative correlation
0-2 is positive correlation
1.5-2.5 is safe zone where you can use the results
When can you not use the Durbin Watson Test in a time series
When one of the independent variables you are using is a lagged dependent variable
RMSE Calculation
1) Take the difference between the actual values and the forecasts
2) Square the differences
3) Sum the squares
4) Divide by the number of observations to get the mean
5) Take square root of the mean
The lower the RMSE the more accurate the model
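A minimal sketch of the five steps above, assuming `actual` and `forecast` are equal-length NumPy arrays of out-of-sample values:

```python
import numpy as np

def rmse(actual, forecast):
    errors = actual - forecast             # 1) difference between actual values and forecasts
    squared = errors ** 2                  # 2) square the differences
    mean_sq = squared.sum() / len(actual)  # 3) + 4) sum and divide by the number of observations
    return np.sqrt(mean_sq)                # 5) square root of the mean
```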
How to tell if model is covariance stationary based off regression results
Compute coefficient / standard error for each b term (or use the reported t stat) and compare to the critical t stat
In the Dickey-Fuller transformed regression (g = b1 - 1), if the t stat is not greater than the critical value, g is not significantly different from 0, so the series has a unit root and is not covariance stationary
Null hypothesis in Dickey Fuller Test
Null is that there is a unit root, so if the t stat is below the critical value, you fail to reject the null and conclude a unit root is present
In AR1 Model, how do you know if there is a unit root (random walk)
If B0 is 0 and B1 is 1
A bag of words
Representation of text that describes the occurrence of words within a document
Winsorization
The process of replacing extreme values and outliers with the maximum and minimum points
Recall
TP / (TP + FN) -> uses the first column of the confusion matrix only
Precision
TP / (TP + FP) -> uses the first row of the confusion matrix only
When would CART and random forests be used
classification of labeled data and regression
not used for unlabeled data
Low bias error but high variance are indicative of what
Overfitting
Tokenization
Splitting a given text into words (tokens) or characters
Which supervised learning technique requires no hyperparameter
SVM
Hyperparameter in LASSO
lambda
Hyperparameter in KNN
k
K means clustering
Unsupervised technique that partitions observations into a fixed number, k, of non-overlapping clusters. Each cluster is characterized by its center (centroid), and each observation is assigned to the cluster whose centroid it is closest to
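A minimal sketch with scikit-learn, assuming `X` is an n-by-features NumPy array; the number of clusters k is the hyperparameter fixed in advance:

```python
from sklearn.cluster import KMeans

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # one centroid per cluster
print(km.labels_)           # the cluster each observation was assigned to
```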
What does the r stand for in DW equation 2(1-r)
The sample correlation between the regression residuals
What types of variables are logistic regression most suited for
Discrete (qualitative) dependent variables, where traditional regression is suited for continuous variables
Target vs. Features
In supervised learning, target is the y (dependent variable) and features are the x (independent variable)
Complexity
The number of features in a model
Bias Error
The degree to which a model fits the data
Base Error
Due to randomness in the data
Variance Error
How much the model's results change in response to new data from the validation and test samples
Learning Curve
A curve that plots the accuracy rate in the validation or test samples against the amount of training data
Soft Margin Misclassification
Adds a penalty to the objective function for observations that are misclassified in a SVM model
K Nearest Neighbor
A supervised learning technique that classifies a new observation by finding similarities between this observation and the existing data
Classification and Regression Tree (CART)
a supervised learning technique that can be used to predict either a categorical or continuous target variable, typically used for binary classification or regression
Pruning
a regularization technique used in CART models that removes low-value branches (those with little classifying power) to reduce overfitting
Ensemble Learning
Combining the predictions from a collection of models
Bagging
- bootstrap aggregating
- new training datasets are generated from the original training data by random sampling with replacement
Random forest classifier
A collection of a large number of decision trees via bagging
F1 Score
Harmonic mean of recall and precision
(2PR) / (P+R)
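A minimal sketch computing recall, precision, and F1 from raw confusion matrix counts (the counts here are hypothetical):

```python
def f1_score(tp, fp, fn):
    recall = tp / (tp + fn)     # first column of the confusion matrix
    precision = tp / (tp + fp)  # first row of the confusion matrix
    return 2 * precision * recall / (precision + recall)

print(f1_score(tp=80, fp=10, fn=20))  # ~0.84
```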
Principal Components Analysis (PCA)
An unsupervised technique used to reduce dimensions
Composite variable
a variable that combines two or more variables that are statistically strongly related to each other
Eigenvector
in the context of PCA, a vector that defines new mutually uncorrelated composite variables that are linear combinations of the original features
Eigenvalue
A measure that gives the proportion of total variance in the initial dataset that is explained by each eigenvector
Scree plot
A plot that shows the proportion of total variance in the data explained by each principal component
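A minimal sketch of a scree plot with scikit-learn, assuming `X` is a standardized n-by-features array; `explained_variance_ratio_` holds the per-component variance proportions the plot displays:

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

pca = PCA().fit(X)
components = range(1, len(pca.explained_variance_ratio_) + 1)
plt.plot(components, pca.explained_variance_ratio_, marker="o")
plt.xlabel("Principal component")
plt.ylabel("Proportion of total variance explained")
plt.show()
```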
Hierarchical Clustering
Iterative procedure used to build a hierarchy of clusters
Agglomerative clustering
a bottom-up hierarchical clustering method that begins with each observation being treated as its own cluster
Divisive clustering
A top-down hierarchical clustering method that starts with all observations belonging to a single large cluster
Dendrogram
a type of tree diagram used for visualizing a hierarchical cluster analysis
Summation operator
A functional part of a neural network’s node that multiplies each input value received by a weight and sums the weighted values to form the total net input, which is then passed to the activation function
Activation Function
A functional part of a neural network’s node that transforms the total net input received into the final output of the node
Backward propagation
The process of adjusting weights in a neural network, to reduce total error of the network, by moving backward through the network’s layers
Learning Rate
A parameter that affects the magnitude of adjustments in the weights in a neural network
Forward Propagation
The process of passing input values forward through the network's layers, applying weights and activation functions, to produce the network's output
Deep Neural Networks
Neural networks with many hidden layers, at least 2, but often more than 20
Reinforcement Learning
Machine learning in which a computer learns from interacting with itself or data generated by the same algorithm
3 Characteristics of Big Data
1) Volume
2) Variety
3) Velocity
Stemming
Process of converting inflected forms of a word into its base word (analyzing -> analyz)
Lemmatization
Process of converting inflected forms of a word into its morphological root (analyzing -> analyze)
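A minimal sketch of both with NLTK, assuming the WordNet data has been downloaded via nltk.download("wordnet"):

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

print(PorterStemmer().stem("analyzing"))                    # "analyz"  (stem / base word)
print(WordNetLemmatizer().lemmatize("analyzing", pos="v"))  # "analyze" (morphological root)
```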
Bag-of-words
A collection of distinct set of tokens from all the texts in a sample dataset, but does not capture the position or sequence of those words
the next step after cleansing data
Document Term Matrix (DTM)
last step of text processing
uses the BOW
Matrix where each row belongs to a document and each column represents a token
N-grams
a representation of word sequences: unigram, bigram, trigram, etc.
False positive rate
FP / (TN+FP)
True positive rate
TP / (TP + FN)
When is precision useful
Where the cost of FP/Type 1 Error is high
When is Recall useful
When cost of FN/Type 2 error is high
What type of data is best used with SVM models
linearly separable data
Veracity
The accuracy of data
Inconsistency Error
The data conflicts with what it should be (e.g., "male" in a name column); a "doesn't make sense" data point
Non-Uniformity Error
Data not presented in same format
Extraction
A new variable is created using existing data
Difference in purpose between feature selection and feature engineering
Feature selection minimizes overfitting and feature engineering minimizes underfitting
Normalization Formula
(value - min) / (max - min)
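A minimal sketch, assuming `x` is a NumPy array:

```python
import numpy as np

def normalize(x):
    return (x - x.min()) / (x.max() - x.min())  # rescales every value to [0, 1]

print(normalize(np.array([2.0, 5.0, 8.0])))  # [0.  0.5 1. ]
```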
How much should be allocated to the training set when there is an absence of ground truth
0%; this is an unsupervised dataset (there are no labels, so there is nothing to train on)
Invalidity Error
When the result is outside the meaningful range
SEE formula
If the relationship between the dependent and independent variables is strong, the SEE will be low
SEE = sqrt(MSE)
MSE = SSE / (n - k - 1)
Formula for T-statistic for correlation coefficient
t = (r * sqrt(n - 2)) / sqrt(1 - r^2)
MSE Formula
SSE / (n - k - 1)
Degrees of freedom for error term
n - k - 1
MSR formula
RSS / k
F stat formula
MSR / MSE
MSR formula = RSS / k
MSE = SSE / (n - k - 1)
How many tails does the F test have
One; the F test is always one-tailed
What does rejection of the null hypothesis of F test mean
At least one of the coefficients is significantly different from 0, which supports the model's explanatory power
What is the effect of serial correlation
Type 1 errors (standard errors are underestimated, inflating t stats)
what is the effect of multicollinearity
Type 2 errors (inflated standard errors make coefficients look insignificant)
Two categories of supervised learning
1) Regression
2) Classification
What type of learning is regression and when would it be used
If the target variable is continuous (supervised learning)
What type of learning is classification and when would it be used
If the target variable is categorical or ordinal, such as company rating
(Supervised learning)
Two categories of unsupervised learning
1) Dimension reduction
2) Clustering
What type of learning technique is CART
supervised learning
What type of variables is CART used to predict
EITHER continuous or categorical
What type of learning technique is K-means and is it top down or bottom up
unsupervised / clustering / bottom up
What type of learning technique is principal component analysis and what is it good for
unsupervised / provides insight into the volatility contained in a data set
What type of learning technique is KNN
supervised
What type of learning technique is LASSO
supervised / regression
What is k-fold-cross-validation
A technique for mitigating the excess reduction of the training set size: the data are shuffled and divided into k sub-samples, and the model is trained on k - 1 of them and validated on the remaining one, rotating until each sub-sample has served once as the validation set
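A minimal sketch with scikit-learn, assuming `X` and `y` hold the training features and target (k = 5 folds here, and the estimator is illustrative):

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

scores = cross_val_score(LinearRegression(), X, y, cv=5)  # one validation score per fold
print(scores.mean())  # average performance across the 5 folds
```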
Advantage of using CART over KNN
1) CART provides visual
2) CART does not require initial hyperparameters to be set
3) CART does not require a similarity measure to be specified
when is model generalization maximized
when prediction error on test data is minimized
What is high bias error and high variance error indicative of
underfitting
which error are linear functions more prone to
bias error
are linear functions more prone to underfitting or overfitting
underfitting
which ML technique makes use of root nodes, decision nodes, and terminal nodes
CART
Durbin Watson for AR(1) models
Indeterminable; the DW test cannot be used because the regressors include a lagged dependent variable
What modeling technique can you use on random walk patterns
first-differenced regression
what is the most common problem with trend models
serial correlation
when can you not calculate the mean reverting level
When b1 equals 1 (the denominator of b0 / (1 - b1) is zero); a series with b1 >= 1 has no mean reverting level
Stop word
A word that is so common in a text that it carries no meaning
Standardization in text processing
lowercasing, removing stop words, stemming and lemmatization
What problem do stemming and lemmatization address
data sparseness and low frequency tokens