Quant Flashcards

1
Q

5 Assumptions to use a multiple regression model

A

1) Linearity
2) Homoskedasticity
3) Independence of Errors
4) Normality
5) Independence of Independent Variables

2
Q

Linearity Assumption

A

The relationship between the independent variable(s) and dependent variable needs to be linear

3
Q

Homoskedasticity Assumption

A

the variance of the regression residuals should be the same for all observations

4
Q

Independence of Errors Assumption

A

The observations are independent of one another, so the regression residuals are uncorrelated across observations

5
Q

Normality Assumption

A

The regression residuals are normally distributed

6
Q

Independence of Independent Variables Assumption

A

Independent variables are not random, and there is no exact linear relationship between two or more of them

7
Q

Adjusted R-Squared

A

An adjusted version of R-squared that penalizes additional independent variables; it increases only when a newly introduced variable improves the model's fit enough to offset that penalty

8
Q

AIC v. BIC

A

AIC is for prediction
BIC is for goodness of fit
Lower values are better for both

9
Q

F Statistic

A

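[(SSE of restricted - SSE of unrestricted) / q] / [SSE of unrestricted / (n-k-1)]

SSE is the sum of squared errors; q is the number of restrictions; k is the number of independent variables in the unrestricted model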
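
A minimal sketch of the nested-model F-statistic, assuming the standard form: restricted-minus-unrestricted SSE over q in the numerator, and unrestricted SSE over (n-k-1) in the denominator. The function name and sample numbers are hypothetical.

```python
def nested_f_stat(sse_restricted, sse_unrestricted, q, n, k):
    """F-statistic for jointly testing q excluded coefficients.

    k is the number of independent variables in the unrestricted model.
    """
    numerator = (sse_restricted - sse_unrestricted) / q
    denominator = sse_unrestricted / (n - k - 1)
    return numerator / denominator


# Hypothetical inputs: dropping 2 variables raises SSE from 100 to 120
f_stat = nested_f_stat(sse_restricted=120.0, sse_unrestricted=100.0, q=2, n=53, k=4)
print(round(f_stat, 2))
```

The result is compared with the critical F-value for q and n-k-1 degrees of freedom.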

10
Q

T Stat when only given coefficient and standard error, and what is null hypothesis

A

coefficient/error, null hypothesis is coefficient does not differ significantly from 0

11
Q

Breusch Pagan Test (BP)
- What does it test for
- What is the formula

A

1) Conditional Heteroskedasticity - variance in residuals differs across observations

2) n*R-Squared, where R-squared comes from regressing the squared residuals on the independent variables
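
A small sketch of the BP statistic, assuming the R-squared comes from an auxiliary regression of the squared residuals on the independent variables and that the statistic is compared with a chi-square critical value with k degrees of freedom. The inputs are hypothetical; 5.991 is the 5% chi-square critical value for 2 degrees of freedom.

```python
def breusch_pagan_stat(n, r_squared_aux):
    """BP test statistic: n times the R-squared of the auxiliary regression."""
    return n * r_squared_aux


# Hypothetical case: 60 observations, k = 2 independent variables
bp = breusch_pagan_stat(n=60, r_squared_aux=0.12)
CHI2_CRITICAL_5PCT_2DF = 5.991
reject_homoskedasticity = bp > CHI2_CRITICAL_5PCT_2DF
print(round(bp, 2), reject_homoskedasticity)
```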

12
Q

2 Types of Heteroskedasticity

A

1) Conditional - error variance is correlated with independent variables (much bigger problem) - high probability of Type 1 errors

2) Unconditional - less problematic, no correlations

13
Q

Durbin-Watson Test (DW)

A

A test for first-order serial correlation in time series model

14
Q

Breusch-Godfrey Test (BG)

A

A test used to detect autocorrelation up to a predesignated order of the lagged residuals in a time series model

15
Q

Multicollinearity

A

When two or more independent variables are correlated to each other

16
Q

Test for multicollinearity

A

Variance inflation factor (VIF)

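VIF = 1 / (1 - R-squared), where R-squared comes from regressing that independent variable on the other independent variables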

Any value over 5 warrants investigation
Any value over 10 means multicollinearity is likely
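
A quick sketch of the VIF calculation and the two rule-of-thumb cutoffs from this card. The function names and sample R-squared values are hypothetical.

```python
def vif(r_squared_j):
    """Variance inflation factor for one independent variable."""
    return 1.0 / (1.0 - r_squared_j)


def vif_flag(value):
    """Apply the rule-of-thumb cutoffs of 5 and 10."""
    if value > 10:
        return "multicollinearity likely"
    if value > 5:
        return "warrants investigation"
    return "not flagged"


for r2 in (0.50, 0.85, 0.95):
    print(r2, round(vif(r2), 2), vif_flag(vif(r2)))
```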

17
Q

Two types of observations that may influence regression results

A

1) High Leverage Point
2) Outlier

18
Q

Difference between high leverage point and outlier

A

High leverage point is when x value is extreme and outlier is when the y value is extreme, however a point can be both high leverage and an outlier

18
Q

How to calculate if a point is high leverage

A

Leverage

If leverage exceeds 3*(k+1)/n

k - independent variables
n - observations
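
The cutoff above can be sketched as follows; the leverage value itself would come from regression software, and the numbers here are hypothetical.

```python
def high_leverage_cutoff(k, n):
    """Rule-of-thumb cutoff: 3 * (k + 1) / n."""
    return 3 * (k + 1) / n


def is_high_leverage(leverage, k, n):
    return leverage > high_leverage_cutoff(k, n)


# Hypothetical case: 3 independent variables, 60 observations
print(high_leverage_cutoff(k=3, n=60))
print(is_high_leverage(0.25, k=3, n=60))
```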

18
Q

When looking at regression, determine if independent variable is significantly different from 0

A

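If the absolute value of the T stat is greater than the critical t value (equivalently, the p value is less than the significance level), it is significantly different from 0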

T stat if not given is coefficient / standard error
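
A minimal sketch of the significance check, assuming a two-tailed test against a hypothesized value of 0. The critical value of 2.0 is a hypothetical placeholder; in practice it comes from a t-table for n-k-1 degrees of freedom.

```python
def t_stat(coefficient, standard_error, hypothesized=0.0):
    """t-statistic: (estimate - hypothesized value) / standard error."""
    return (coefficient - hypothesized) / standard_error


def is_significant(coefficient, standard_error, critical_t):
    """Two-tailed check: |t| must exceed the critical t-value."""
    return abs(t_stat(coefficient, standard_error)) > critical_t


print(t_stat(0.5, 0.2))
print(is_significant(0.5, 0.2, critical_t=2.0))
```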

19
Q

Method to identify if an observation is an outlier and what is the formula

A

Studentized deleted residuals

t(i) = e(i) / s(e), where e(i) is the residual with the ith observation deleted and s(e) is the standard deviation of all residuals (the standard error)

If the absolute value is greater than 3 or greater than the critical t stat with n-k-2 degrees of freedom, the observation is an outlier

20
Q

When is an observation considered influential

A

If its exclusion from the sample causes substantial changes in the regression function

21
Q

Cook’s D

A

Metric for identifying influential observations

22
Q

Interpreting Cook’s D

A

If value is greater than 0.5, possibly influential

If value is greater than 1, likely influential

If value greater than SqRt(k/n), likely influential
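
The three guidelines above can be sketched as one helper. The labels and sample inputs are hypothetical, and the Cook's D value itself would come from regression software.

```python
import math


def interpret_cooks_d(d, k, n):
    """Apply the card's cutoffs: 0.5, 1, and sqrt(k / n)."""
    if d > 1.0 or d > math.sqrt(k / n):
        return "likely influential"
    if d > 0.5:
        return "possibly influential"
    return "not flagged"


# With k = 3 and n = 60, sqrt(k / n) is about 0.224
print(interpret_cooks_d(0.3, k=3, n=60))
print(interpret_cooks_d(0.1, k=3, n=60))
```

In larger samples sqrt(k/n) is usually well below 0.5, so it tends to be the binding cutoff.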

23
Q

Dummy Variable

A

Independent variable that takes on a value of either 0 or 1

also called indicator variable

24
Q

Types of dummy Variables

A

1) Intercept Dummy
2) Slope Dummy
3) Interaction Term

25
Q

Go from log odds to probability

A

1) Raise e to the power of the log odds, this is odds
2) Take odds/(1+odds), this is probability
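
The two steps can be sketched as below; the function name is hypothetical.

```python
import math


def log_odds_to_probability(log_odds):
    odds = math.exp(log_odds)   # step 1: e raised to the log odds gives the odds
    return odds / (1 + odds)    # step 2: odds / (1 + odds) gives the probability


print(log_odds_to_probability(0.0))  # log odds of 0 means even odds
```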

26
Q

Likelihood Ratio (LR) Test

A

A method to assess the fit of logistic regression models that is based on the log-likelihood metric that describes the model’s fit to the data

LR = -2 * (Log-likelihood of restricted model - log-likelihood of unrestricted model)
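
A short sketch of the LR formula with hypothetical log-likelihood values. The unrestricted model fits at least as well as the restricted one, so the statistic is non-negative; it is compared with a chi-square critical value whose degrees of freedom equal the number of restrictions.

```python
def likelihood_ratio_stat(ll_restricted, ll_unrestricted):
    """LR = -2 * (log-likelihood restricted - log-likelihood unrestricted)."""
    return -2.0 * (ll_restricted - ll_unrestricted)


# Hypothetical log-likelihoods
print(likelihood_ratio_stat(ll_restricted=-120.5, ll_unrestricted=-115.0))
```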

27
Q

Calculate Standard Error of autocorrelations in time series

A

1 / sqrt(T), where T is number of observations, uniform for every observation
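
A small sketch showing how this standard error feeds a t-statistic for a residual autocorrelation (autocorrelation divided by its standard error). The sample values are hypothetical.

```python
import math


def autocorr_standard_error(num_observations):
    """Standard error of each autocorrelation: 1 / sqrt(T)."""
    return 1.0 / math.sqrt(num_observations)


def autocorr_t_stat(autocorrelation, num_observations):
    return autocorrelation / autocorr_standard_error(num_observations)


# Hypothetical case: T = 100 observations, lag-1 autocorrelation of 0.25
print(autocorr_standard_error(100))
print(autocorr_t_stat(0.25, 100))
```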

28
Q

Covariance Stationary

A

A key assumption for making valid statistical inferences in time series models

1) Expected value must be constant and finite in all periods

2) Variance must be constant and finite in all periods

3) Covariance of the series with itself at any fixed number of periods in the past or future must be constant and finite in all periods

29
Q

Autocorrelation

A

Correlations of a time series with its own past values

30
Q

Mean reverting level of a time series

A

b(0) / (1-b(1))
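
A minimal sketch for an AR(1) model x(t) = b0 + b1 * x(t-1) + error, with hypothetical coefficients. The guard reflects that the formula divides by zero when b1 equals 1.

```python
def mean_reverting_level(b0, b1):
    """Mean-reverting level of an AR(1) model: b0 / (1 - b1)."""
    if b1 == 1:
        raise ValueError("b1 = 1: no finite mean-reverting level")
    return b0 / (1 - b1)


# Hypothetical coefficients
print(mean_reverting_level(b0=1.2, b1=0.4))
```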

31
Q

Root Mean Squared Error (RMSE)

A

The square root of the average squared forecast error, used to compare the out-of-sample performance of forecasting models

Smallest RMSE is most accurate

32
Q

How to handle simple random walk without drift

A

First difference the time series because it makes it covariance stationary
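
First differencing can be sketched as below; the sample series is hypothetical.

```python
def first_difference(series):
    """Return y(t) = x(t) - x(t-1) for each t after the first observation."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


print(first_difference([100, 102, 101, 105]))
```

The differenced series has one fewer observation than the original.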

33
Q

Expected Value of simple random walk without drift

A

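0 for the change in the series: the error term has an expected value of 0, so the best forecast of the next value is the current value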

34
Q

How to test for unit root

A

Dickey-Fuller Test

The null hypothesis is that a unit root is present, so rejecting the null is to say the time series is covariance stationary

34
Q

Unit Root

A

A time series with a unit root is not covariance stationary; a random walk is the standard example

In an AR(1) model, a unit root is present when the lag coefficient (b1) equals 1; if the absolute value of b1 is 1 or greater, the series is not covariance stationary

35
Q

Co-integration

A

Two time series that each have a unit root are co-integrated if a linear combination of them is covariance stationary, meaning they share a long-term relationship and move together, so a regression between the two can be relied on

36
Q

Mean Reverting Level

A

b(0) / (1-b(1)), where b0 and b1 are the coefficients in the model you’re referencing

37
Q

How to interpret Durbin Watson

A

A value of 2 means there is no serial autocorrelation
2-4 is negative correlation
0-2 is positive correlation

1.5-2.5 is safe zone where you can use the results
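
For reference, a sketch of the DW statistic itself: the sum of squared changes in successive residuals divided by the sum of squared residuals. The residual series are hypothetical and chosen to show both sides of 2.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


print(durbin_watson([1, -1, 1, -1]))  # alternating signs: above 2
print(durbin_watson([1, 1, -1, -1]))  # runs of the same sign: below 2
```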

38
Q

When can you not use the Durbin Watson Test in a time series

A

When one of the independent variables you are using is a lagged dependent variable

39
Q

RMSE Calculation

A

1) Take difference between actual values and forecasts
2) Square the differences
3) Sum the squares
4) Divide by the number of observations to get the mean
5) Take square root of the mean

The lower the RMSE the more accurate the model
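
The steps above can be sketched as follows, taking each forecast error as the actual value minus the forecast. The sample data are hypothetical.

```python
import math


def rmse(actuals, forecasts):
    errors = [a - f for a, f in zip(actuals, forecasts)]  # step 1: forecast errors
    squared = [e * e for e in errors]                     # step 2: square them
    total = sum(squared)                                  # step 3: sum the squares
    mean_squared = total / len(squared)                   # step 4: average
    return math.sqrt(mean_squared)                        # step 5: square root


# Hypothetical out-of-sample actuals and forecasts
print(round(rmse([3, 5, 7, 9], [2, 5, 9, 9]), 4))
```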

40
Q

How to tell if model is covariance stationary based off regression results

A

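Calculate coefficient / standard error for each b term (or use the respective t stat given) and compare to the critical t stat

In a Dickey-Fuller style regression, if the t stat on the lagged term is not greater than the critical value in absolute terms, the coefficient is not significantly different from 0, so the series has a unit root and is not covariance stationary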

41
Q

Null hypothesis in Dickey Fuller Test

A

Null is that there is a unit root; if the test statistic is not more extreme than the critical value, the null cannot be rejected and a unit root is assumed

42
Q

In AR1 Model, how do you know if there is a unit root (random walk)

A

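If B1 is 1 (with B0 equal to 0 it is a simple random walk without drift; with B0 not equal to 0 it is a random walk with drift)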

43
Q

A bag of words

A

Representation of text that describes the occurrence of words within a document

44
Q

Winsorization

A

The process of replacing extreme values and outliers with the maximum and minimum values of the data points that are not outliers
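
A minimal sketch, assuming the analyst has already chosen the lower and upper bounds (for example, from percentiles of the data). The bounds and values below are hypothetical.

```python
def winsorize(values, lower, upper):
    """Cap each value at the chosen lower and upper bounds."""
    return [min(max(v, lower), upper) for v in values]


# Hypothetical data with one low and one high outlier
print(winsorize([1, 50, 52, 55, 300], lower=40, upper=60))
```

Unlike trimming, no observations are dropped; only the extreme values are replaced.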

45
Q

Recall

A

TP / (TP + FN) -> uses first column only

46
Q

Precision

A

TP / (TP + FP) -> uses first row only

47
Q

When would CART and random forests be used

A

classification of labeled data and regression

not used for unlabeled data

48
Q

Low bias error but high variance are indicative of what

A

Overfitting

49
Q

Tokenization

A

Splitting a given text into separate tokens (words or characters)

50
Q

Which supervised learning technique requires no hyperparameter

A

SVM

51
Q

Hyperparameter in LASSO

A

lambda

52
Q

Hyperparameter in KNN

A

k

53
Q

K means clustering

A

Unsupervised technique that partitions observations into a fixed number, k, of non-overlapping clusters. Each cluster is characterized by its center (centroid), and each observation is assigned to the cluster whose centroid it is closest to

54
Q

What does the r stand for in DW equation 2(1-r)

A

The sample correlation between the regression residuals from one period and those from the previous period

55
Q

What types of variables are logistic regression most suited for

A

Discrete (categorical) dependent variables, such as binary outcomes, whereas traditional regression is suited for continuous dependent variables

56
Q

Target vs. Features

A

In supervised learning, target is the y (dependent variable) and features are the x (independent variable)

57
Q

Complexity

A

The number of features in a model

58
Q

Bias Error

A

The degree to which a model fits the training data; high bias error means a poor fit and points to underfitting

59
Q

Base Error

A

Due to randomness in the data

59
Q

Variance Error

A

How much the model's results change in response to new data from validation and test samples

60
Q

Learning Curve

A

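Curve that plots the accuracy rate (1 - error rate) in the validation or test samples against the amount of data in the training sample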

61
Q

Soft Margin Misclassification

A

Adds a penalty to the objective function for observations that are misclassified in a SVM model

62
Q

K Nearest Neighbor

A

A supervised learning technique that classifies a new observation by finding similarities between this observation and the existing data
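
A minimal sketch of KNN classification, assuming Euclidean distance as the similarity measure and a simple majority vote among the k nearest labeled observations. The training data and labels are hypothetical.

```python
import math
from collections import Counter


def knn_classify(training_data, new_point, k):
    """training_data is a list of (features, label) pairs."""
    # Rank labeled observations by distance to the new observation
    nearest = sorted(
        training_data, key=lambda item: math.dist(item[0], new_point)
    )[:k]
    # Majority vote among the k nearest neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


training_data = [
    ((1, 1), "A"), ((1, 2), "A"),
    ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B"),
]
print(knn_classify(training_data, (1.5, 1.5), k=3))
```

Both k and the distance measure are choices made by the analyst, and different choices can change the classification.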

63
Q

Classification and Regression Tree (CART)

A

A supervised learning technique that can be used to predict either a categorical or continuous target variable, typically used for binary classification or regression

64
Q

Pruning

A

A regularization technique used in CART models to reduce the size of the tree by removing sections that provide little explanatory power

65
Q

Ensemble Learning

A

Combining the predictions from a collection of models

66
Q

Bagging

A
  • bootstrap aggregating
  • the original training data is used to generate new training data
67
Q

Random forest classifier

A

A collection of a large number of decision trees via bagging

68
Q

F1 Score

A

Harmonic mean of recall and precision

(2PR) / (P+R)
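
A short sketch computing precision, recall, and F1 from confusion-matrix counts. The counts are hypothetical.

```python
def precision(tp, fp):
    return tp / (tp + fp)


def recall(tp, fn):
    return tp / (tp + fn)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return (2 * p * r) / (p + r)


# Hypothetical counts: 30 true positives, 10 false positives, 20 false negatives
p = precision(tp=30, fp=10)
r = recall(tp=30, fn=20)
print(p, r, round(f1_score(p, r), 4))
```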

69
Q

Principal Components Analysis (PCA)

A

An unsupervised technique to reduce dimensions

70
Q

Composite variable

A

a variable that combines two or more variables that are statistically strongly related to each other

71
Q

Eigenvector

A

in the context of PCA, a vector that defines new mutually uncorrelated composite variables that are linear combinations of the original features

72
Q

Eigenvalue

A

A measure that gives the proportion of total variance in the initial dataset that is explained by each eigenvector

73
Q

Scree plot

A

A plot that shows the proportion of total variance in the data explained by each principal component

74
Q

Hierarchical Clustering

A

Iterative procedure used to build a hierarchy of clusters

75
Q

Agglomerative clustering

A

a bottom-up hierarchical clustering method that begins with each observation being treated as its own cluster

76
Q

Divisive clustering

A

A top-down hierarchical clustering method that starts with all observations belonging to a single large cluster

77
Q

Dendrogram

A

a type of tree diagram used for visualizing a hierarchical cluster analysis

78
Q

Summation operator

A

A functional part of a neural network’s node that multiplies each input value received by a weight and sums the weighted values to form the total net input, which is then passed to the activation function

79
Q

Activation Function

A

A functional part of a neural network’s node that transforms the total net input received into the final output of the node
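
A minimal sketch of a single node combining the summation operator and an activation function. The sigmoid is used here only as an assumed example of an activation function; the inputs and weights are hypothetical.

```python
import math


def summation_operator(inputs, weights):
    """Multiply each input by its weight and sum to get the total net input."""
    return sum(x * w for x, w in zip(inputs, weights))


def sigmoid(net_input):
    """One common activation function (an assumption for this example)."""
    return 1.0 / (1.0 + math.exp(-net_input))


def node_output(inputs, weights):
    return sigmoid(summation_operator(inputs, weights))


print(node_output([1.0, 2.0], [0.5, -0.25]))
```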

80
Q

Backward propagation

A

The process of adjusting weights in a neural network, to reduce total error of the network, by moving backward through the network’s layers

81
Q

Learning Rate

A

A parameter that affects the magnitude of adjustments in the weights in a neural network

82
Q

Forward Propagation

A

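The process of passing input values forward through the network's layers, from the input layer through the hidden layers to the output layer, to produce the predicted output; the error is then calculated by comparing this output with the actual value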

83
Q

Deep Neural Networks

A

Neural networks with many hidden layers, at least 2, but often more than 20

84
Q

Reinforcement Learning

A

Machine learning in which a computer learns from interacting with itself or data generated by the same algorithm

85
Q

3 Characteristics of Big Data

A

1) Volume
2) Variety
3) Velocity

86
Q

Stemming

A

Process of converting inflected forms of a word into its base word (analyzing -> analyz)

87
Q

Lemmatization

A

Process of converting inflected forms of a word into its morphological root (analyzing -> analyze)

88
Q

Bag-of-words

A

A collection of the distinct tokens from all the texts in a sample dataset; it does not capture the position or sequence of those words

the next step after cleansing data

89
Q

Document Term Matrix (DTM)

A

last step of text processing

uses the BOW

Matrix where each row belongs to a document and each column represents a token
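
A small sketch of building a BOW and a DTM from it, assuming a simplified tokenizer that lowercases and splits on whitespace. The documents are hypothetical.

```python
def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace only
    return text.lower().split()


def bag_of_words(documents):
    """Sorted list of the distinct tokens across all documents."""
    tokens = set()
    for doc in documents:
        tokens.update(tokenize(doc))
    return sorted(tokens)


def document_term_matrix(documents):
    """One row per document, one column per token, cells hold token counts."""
    vocab = bag_of_words(documents)
    return [[tokenize(doc).count(term) for term in vocab] for doc in documents]


docs = ["Profit rose", "Profit fell sharply", "rose rose"]
print(bag_of_words(docs))
print(document_term_matrix(docs))
```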

90
Q

N-grams

A

A representation of word sequences: unigram, bigram, trigram, etc.
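
A brief sketch of generating n-grams from a list of tokens; the tokens are hypothetical.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["stock", "market", "rally"]
print(ngrams(tokens, 1))  # unigrams
print(ngrams(tokens, 2))  # bigrams
```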

91
Q

False positive rate

A

FP / (TN+FP)

92
Q

True positive rate

A

TP / (TP + FN)

93
Q

When is precision useful

A

Where the cost of FP/Type 1 Error is high

94
Q

When is Recall useful

A

When cost of FN/Type 2 error is high

95
Q

What type of data is best used with SVM models

A

linearly separable data

96
Q

Veracity

A

The accuracy of data

97
Q

Inconsistency Error

A

The data conflicts with what it should be (male in name column), “it doesn’t make sense” data point

98
Q

Non-Uniformity Error

A

Data not presented in same format

99
Q

Extraction

A

New Variable is created using existing data

100
Q

Difference in purpose between feature selection and feature engineering

A

Feature selection minimizes overfitting and feature engineering minimizes underfitting

101
Q

Normalization Formula

A

(value - min) / (max - min)
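
The formula can be sketched as below, rescaling a feature to the range 0 to 1. The guard for identical values is an added assumption, and the sample data are hypothetical.

```python
def normalize(values):
    """Min-max normalization: (value - min) / (max - min)."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are identical; the range is zero")
    return [(v - low) / (high - low) for v in values]


print(normalize([10, 20, 40]))
```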

102
Q

How much should be allocated to the training set when there is an absence of ground truth

A

0%, this is unsupervised data set

103
Q

Invalidity Error

A

When the result is outside the meaningful range

104
Q

SEE formula

A

if the relationship between the dependent and independent variables is strong, the SEE will be low

SEE = sqrt(MSE)
MSE = SSE / (n-k-1)

105
Q

Formula for T-statistic for correlation coefficient

A

t = (r * sqrt(n-2)) / sqrt(1-r^2)
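
A quick sketch of this t-statistic with hypothetical inputs; the result is compared with the critical t-value for n-2 degrees of freedom.

```python
import math


def correlation_t_stat(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return (r * math.sqrt(n - 2)) / math.sqrt(1 - r ** 2)


# Hypothetical case: sample correlation of 0.6 from 27 observations
print(round(correlation_t_stat(0.6, 27), 2))
```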

106
Q

MSE Formula

A

SSE / (n-k-1)

107
Q

Degrees of freedom for error term

A

n - k - 1

108
Q

MSR formula

A

RSS / k

109
Q

F stat formula

A

MSR / MSE

MSR formula = RSS / k
MSE = SSE / (n-k-1)
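
A compact sketch tying the ANOVA pieces together: MSR, MSE, the F-stat, and SEE as the square root of MSE. The sums of squares and sample size are hypothetical.

```python
import math


def anova_stats(rss, sse, n, k):
    """rss: regression sum of squares, sse: sum of squared errors."""
    msr = rss / k
    mse = sse / (n - k - 1)
    return {"MSR": msr, "MSE": mse, "F": msr / mse, "SEE": math.sqrt(mse)}


# Hypothetical case: 53 observations, 2 independent variables
stats = anova_stats(rss=80.0, sse=100.0, n=53, k=2)
print(stats)
```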

110
Q

how many tails is f test

A

1

111
Q

What does rejection of the null hypothesis of F test mean

A

At least one of the coefficients is significantly different from 0, which is good for explanatory purposes

112
Q

What is the effect of serial correlation

A

Type 1 errors (positive serial correlation understates standard errors, which inflates t-stats)

113
Q

what is the effect of multicollinearity

A

Type 2 errors (standard errors are inflated, which lowers t-stats)

114
Q

Two categories of supervised learning

A

1) Regression
2) Classification

115
Q

What type of learning is regression and when would it be used

A

If the target variable is continuous (supervised learning)

116
Q

What type of learning is classification and when would it be used

A

If the target variable is categorical or ordinal, such as company rating
(Supervised learning)

117
Q

Two categories of unsupervised learning

A

1) Dimension reduction
2) Clustering

118
Q

What type of learning technique is CART

A

supervised learning

119
Q

What type of variables is CART used to predict

A

EITHER continuous or categorical

120
Q

What type of learning technique is K-means and is it top down or bottom up

A

unsupervised / clustering / bottom up

121
Q

What type of learning technique is principal component analysis and what is it good for

A

unsupervised / provides insight into the volatility contained in a data set

122
Q

What type of learning technique is KNN

A

supervised

123
Q

What type of learning technique is LASSO

A

supervised / regression

124
Q

What is k-fold-cross-validation

A

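Technique for mitigating excessive reduction of the training set size: the data are shuffled randomly and divided into k equal sub-samples, with k-1 used for training and one for validation, and the process is repeated k times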

125
Q

Advantage of using CART over KNN

A

1) CART provides visual
2) CART does not require initial hyper parameters set
3) CART does not require to specify a similarity measure

126
Q

when is model generalization maximized

A

when prediction error on test data is minimized

127
Q

What is high bias error and high variance error indicative of

A

underfitting

128
Q

which error are linear functions more prone to

A

bias error

129
Q

are linear functions more prone to underfitting or overfitting

A

underfitting

130
Q

which ML technique makes use of root nodes, decision nodes, and terminal nodes

A

CART

131
Q

Durbin Watson for AR(1) models

A

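Not valid: the Durbin-Watson statistic cannot be relied on when a lagged dependent variable is used as an independent variable, as in AR models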

132
Q

What modeling technique can you use on random walk patterns

A

first-differenced regression

133
Q

what is the most common problem with trend models

A

serial correlation

134
Q

when can you not calculate the mean reverting level

A

When b1 is 1 or greater: at b1 = 1 the denominator (1 - b1) is zero, so there is no finite mean-reverting level

135
Q

Stop word

A

A word that is so common in a text that it carries no meaning

136
Q

Standardization in text processing

A

lowercasing, removing stop words, stemming and lemmatization

137
Q

What problem do stemming and lemmatization address

A

data sparseness and low frequency tokens

138
Q
A