Quant Flashcards

1
Q

Confidence Interval for a Predicted Y-Value

A

predicted value (Ŷ) ± (critical t-value)(standard error of forecast)
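
A minimal Python sketch of this interval with hypothetical numbers (scipy assumed available):

    from scipy import stats

    y_hat = 12.5   # predicted Y-value (hypothetical)
    s_f = 1.8      # standard error of the forecast (hypothetical)
    n, k = 32, 2   # observations, independent variables (hypothetical)
    t_c = stats.t.ppf(0.975, df=n - k - 1)         # two-tailed 5% critical t
    print((y_hat - t_c * s_f, y_hat + t_c * s_f))  # 95% confidence interval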

2
Q

t test for each variable

A

estimated regression coefficient / standard error of the coefficient

df = n-k-1

3
Q

R square

coefficient of determination

A

= RSS / SST
= regression sum of squares / total sum of squares
= (SST - SSE) / SST, where SSE is the sum of squared errors

= explained variation / total variation
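
A quick numeric check in Python (hypothetical sums of squares):

    sst = 100.0        # total variation
    sse = 25.0         # unexplained variation (sum of squared errors)
    rss = sst - sse    # explained variation (regression sum of squares)
    print(rss / sst)   # R-squared = 0.75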

4
Q

SEE standard error of estimate

A

= square root of mean squared error (MSE)

MSE = SSE / (n-k-1)

5
Q

MSR (mean regression sum of squares)

A

MSR = RSS / k

6
Q

F-test (all coefficients collectively)

- one-tailed

A

= MSR / MSE (regression mean square / mean squared error)

Reject H0 if F > critical value: at least one of the coefficients is significantly different from zero, i.e., at least one of the independent variables makes a significant contribution to explaining the dependent variable.
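
A minimal Python sketch with hypothetical ANOVA quantities (scipy assumed available):

    from scipy import stats

    rss, sse = 75.0, 25.0       # hypothetical sums of squares
    n, k = 32, 2
    msr = rss / k               # mean regression sum of squares
    mse = sse / (n - k - 1)     # mean squared error
    f_stat = msr / mse
    f_crit = stats.f.ppf(0.95, k, n - k - 1)   # one-tailed 5% critical F
    print(f_stat > f_crit)      # True: reject H0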

7
Q

conditional heteroskedasticity

A

residual variance is related to the level of the independent variables.

Standard errors are unreliable, but the slope coefficients are consistent and unbiased.

Detect with the Breusch-Pagan chi-square test;
correct with White-corrected standard errors.

8
Q

Serial correlation

A

residuals are correlated.

Causes Type I errors, but the slope coefficients are consistent and unbiased.

Detect with the Durbin-Watson test;
correct with the Hansen method to adjust standard errors.

9
Q

Multicollinearity

A

Two or more independent variables are correlated.

Causes too many Type II errors, and the slope coefficients are unreliable.

Correct by dropping one of the correlated variables.

Both multicollinearity and serial correlation bias the standard errors of the slope coefficients.

10
Q

log-linear trend model

ln(yt) = b0 + b1(t)

A

best for a data series that exhibits exponential growth, or one for which a linear trend model's residuals are correlated or predictable or the mean is non-constant.

11
Q

AR model

A

dependent variable is regressed against previous values of itself.

The model is correctly specified if the autocorrelations of the residuals are not statistically significant at any lag (use a t-test).

If an autocorrelation is significant, the model is incorrectly specified and a lagged variable at the indicated lag should be added.

12
Q

covariance stationary

A

meet the following 3 conditions:

  1. constant and finite mean
  2. constant and finite variance
  3. constant and finite covariance with leading or lagged values

use the Dickey-Fuller test

if the AR model is not stationary, correct with first differencing.
if it is stationary, the mean-reverting level must be defined: b1 must be < 1
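
A minimal sketch of the Dickey-Fuller check in Python (statsmodels assumed installed; simulated data):

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(42)
    x = rng.normal(size=200).cumsum()   # random walk: not stationary
    adf_stat, p_value = adfuller(x)[:2]
    print(p_value)   # large p-value: cannot reject the null of a unit root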

13
Q

mean reversion

A

mean-reverting level = b0 / (1 - b1)

value of the variable tends to fall when above its mean and rise when below its mean.
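
A quick numeric example with hypothetical AR(1) coefficients:

    b0, b1 = 1.2, 0.6        # hypothetical AR(1) parameters
    print(b0 / (1 - b1))     # mean-reverting level = 3.0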

14
Q

unit root

A

if the value of the lag coefficient b1 = 1,
the time series has a unit root and follows a random walk process.

A series with a unit root is not covariance stationary.
If there is a unit root, the value at t = the value at t-1 + a random error,
and the mean-reverting level is undefined.

15
Q

random walk

A

one for which the value in one period = the value in the previous period + a random error

with drift: xt = b0 + xt-1 + error
without drift: xt = xt-1 + error
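
A short simulation in Python (numpy assumed available; hypothetical drift):

    import numpy as np

    rng = np.random.default_rng(0)
    b0 = 0.1                     # hypothetical drift
    eps = rng.normal(size=250)   # random errors
    x = np.cumsum(b0 + eps)      # x_t = b0 + x_(t-1) + error, x_0 = 0
    print(x[-1])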

16
Q

1st differencing

A

used to correct an autoregressive model with a unit root

subtract the value of the time series in the immediately preceding period from the current value of the time series to define a new variable:

yt = xt - xt-1 (for a random walk, b0 = b1 = 0)
the differenced series is covariance stationary
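
A one-line illustration with a hypothetical series:

    import numpy as np

    x = np.array([100.0, 102.0, 101.5, 104.0])   # hypothetical series
    y = np.diff(x)                               # y_t = x_t - x_(t-1)
    print(y)                                     # [ 2.  -0.5  2.5]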

17
Q

seasonality

A

tested by calculating autocorrelation of error terms.

to adjust for seasonality, an additional seasonal lag of the variable (e.g., lag 4 for quarterly data) is added to the original model.

18
Q

RMSE root mean squared error

A

used to assess the out-of-sample predictive accuracy of autoregressive models

the lower the RMSE, the better
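
A quick computation with hypothetical forecasts:

    import numpy as np

    actual = np.array([1.0, 2.0, 3.0])     # hypothetical out-of-sample values
    forecast = np.array([1.1, 1.9, 3.4])
    print(np.sqrt(np.mean((actual - forecast) ** 2)))   # RMSE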

19
Q

cointegration

A

two time series are economically linked or follow the same trend, and that relationship is not expected to change.

If cointegrated, the error term is covariance stationary and the t-test is reliable.

Test for a unit root using the DF test:
if we reject the null hypothesis of a unit root, the error terms generated by the two time series are covariance stationary and the two series are cointegrated.

If both time series are covariance stationary, the model is reliable.
If only the dependent variable time series or only the independent time series is covariance stationary, the model is not reliable.
If neither time series is covariance stationary, check for cointegration.

20
Q

ARCH autoregressive conditional heteroskedasticity

A

describes the condition where the variance of the residuals in one time period within a time series is dependent on the variance of the residuals in another period.

If present, the standard errors of the regression coefficients in the AR model and the hypothesis tests of those coefficients are invalid.

Correct with generalized least squares.

21
Q

supervised learning

A

a machine learning technique in which a machine is given labelled input and output data and models the output data based on the input data.

22
Q

unsupervised learning

A

labeled data are not provided; the algorithm uses unlabeled data to determine the structure of the data.

A machine is given input data in which to identify patterns and relationships, but no output data to model.

23
Q

deep learning algorithms

A

algorithms such as neural networks and reinforcement learning that learn from their prediction errors; used for complex tasks such as image recognition and natural language processing.

A technique to identify patterns of increasing complexity; may use supervised or unsupervised learning.

Deep learning nets have many (often more than 20) hidden layers; reinforcement learning algorithms have an agent seeking to maximize a defined reward given defined constraints.

24
Q

overfitting

A

results from having a large number of independent variables, producing an overly complex model that may have fit random noise; in-sample forecasting accuracy improves, but out-of-sample accuracy does not.

Address with complexity reduction
- a penalty is imposed to exclude features that are not meaningfully contributing to out-of-sample prediction accuracy

and cross-validation.

25
Q

supervised learning algorithms include:

A
  1. penalized regression
  2. support vector machine
  3. k-nearest neighbor
  4. classification and regression tree CART
  5. ensemble learning (random forest)
26
Q

unsupervised machine learning algorithms include:

A
  1. principal components analysis PCA
  2. k-means clustering
  3. hierarchical clustering
27
Q

neural networks

A

comprises an input layer, hidden layers, and an output layer.

consist of nodes connected by links; learning takes place in the hidden layer nodes, each of which consists of a summation operator and an activation function.

Neural networks with many hidden layers (often more than 20) are known as deep learning nets (DLNs) and used in artificial intelligence.

28
Q

deep learning nets

A

neural networks with many hidden layers, useful for pattern, speech, and image recognition.

29
Q

reinforcement learning

A

algorithms that seek to learn from their own errors by maximizing a defined reward

30
Q

data wrangling

A

data transformation and scaling

31
Q

scaling (normalization & standardization)

A

conversion of data to a common unit of measurement

normalization scales variables to values between 0 and 1
standardization centers the variables at a mean of 0 and a standard deviation of 1; it assumes a normal distribution
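
A minimal sketch of both rescalings in Python (hypothetical feature values):

    import numpy as np

    x = np.array([2.0, 4.0, 6.0, 8.0])                 # hypothetical feature
    normalized = (x - x.min()) / (x.max() - x.min())   # values in [0, 1]
    standardized = (x - x.mean()) / x.std()            # mean 0, std 1
    print(normalized, standardized)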

32
Q

n-grams

A

technique that defines a token as a sequence of words; applied when word order is important
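
A tiny illustration of 2-grams in plain Python (hypothetical sentence):

    text = "the market rallied sharply"
    words = text.split()
    bigrams = list(zip(words, words[1:]))   # tokens as 2-word sequences
    print(bigrams)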

33
Q

bag-of-words (BOW)

A

procedure that collects all the tokens in a document.

A BOW is the collection of the distinct set of tokens from all the texts in a sample dataset.

34
Q

evaluate the fit of machine learning algorithm

A

precision (P) = true positives / (false positives + true positives)
=tp/(fp+tp)

recall (R) = true positives / (true positives + false negatives)
=tp/(tp+fn)

accuracy = (true positives + true negatives) / (all positives and negatives)
=(tp+tn)/(all)

F1 score = (2 × P × R) / (P + R)
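
A quick check of all four measures with hypothetical confusion-matrix counts:

    tp, fp, tn, fn = 40, 10, 45, 5   # hypothetical counts
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    print(precision, recall, accuracy, f1)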

35
Q

standard error of estimate

A

=square root of (unexplained variation/(n-k-1))

36
Q

sample variance of dependent variable

A

total variation / (n-1)

sample standard deviation = square root of (total variation / (n-1))

37
Q

test statistic t=

A

t = (b̂i - bi) / s(b̂i)
(estimated coefficient - hypothesized value) / standard error of the coefficient

df = n-k-1

38
Q

simulation is best for

A

continuous risk; accommodates correlated variables (correlation across risks can be modeled explicitly using simulation).

Two advantages of using simulation in decision making:

1) better input estimation
2) simulation yields a distribution for the expected value rather than a point estimate

Caveat: simulations will yield great-looking output even when the inputs are random.

39
Q

Scenario analysis

A

discrete; accommodates correlated variables

40
Q

decision trees

A

discrete, sequential; does not accommodate correlated variables

41
Q

structured data analysis steps:

A
  1. conceptualization of the modeling task
  2. data collection
  3. data preparation and wrangling (cleaning data)
  4. Data exploration
  5. Model training
42
Q

unstructured data analysis steps:

A
  1. text problem formulation
  2. data collection
  3. text preparation and wrangling
  4. text exploration
  5. modeling
43
Q

big data is characterized by

A
  1. volume (quantity, e.g., terabytes)
  2. variety (data sources)
  3. velocity (speed, latency)
  4. veracity (reliability of data source)
44
Q

feature engineering

A

involves optimizing and improving the selected features; prevents underfitting in the training of the model.

45
Q

feature selection

A

involves selecting a subset of tokens in the BOW; reduces feature-induced noise.

appropriate feature selection is a key factor in minimizing model overfitting.

46
Q

tokenization

A

process of splitting a given text into separate tokens.

47
Q

K-nearest neighbor (KNN).

A

More commonly used in classification (but sometimes in regression), this technique is used to classify an observation based on nearness to the observations in the training sample.

need to specify the hyperparameter k (the number of neighbors).
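
A minimal sketch with scikit-learn (assumed installed) on toy data:

    from sklearn.neighbors import KNeighborsClassifier

    X = [[0, 0], [0, 1], [1, 0], [1, 1]]        # hypothetical features
    y = [0, 0, 1, 1]                            # hypothetical labels
    knn = KNeighborsClassifier(n_neighbors=3)   # k is the hyperparameter
    knn.fit(X, y)
    print(knn.predict([[0.9, 0.2]]))            # class of the nearest neighbors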

48
Q

Classification and regression trees (CART).

A

Classification trees are appropriate when the target variable is categorical, and are typically used when the target is binary.

CART provides a visual explanation of the prediction process, compared to other algorithms that are often described as black boxes due to their opacity.

49
Q

Principal component analysis (PCA).

A

Problems associated with too much noise often arise when the number of features in a data set (i.e., its dimension) is excessive.

PCA is an unsupervised machine learning algorithm that reduces highly correlated features into fewer uncorrelated composite variables by transforming the feature covariance matrix.

50
Q

Clustering.

A

Given a data set, clustering is the process of grouping observations into categories based on similarities in their attributes (called cohesion).

51
Q

K-means

A

partitions observations into a fixed number (k) of non-overlapping clusters.
unsupervised
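
A minimal sketch with scikit-learn (assumed installed) on toy points:

    from sklearn.cluster import KMeans

    X = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9]]  # hypothetical points
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
    print(labels)   # two non-overlapping clusters, e.g. [0 0 1 1]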

52
Q

Hierarchical clustering

A

Hierarchical clustering is an unsupervised iterative algorithm used to build a hierarchy of clusters.

In an agglomerative (or bottom-up) clustering, we start with one observation as its own cluster and add other similar observations to that group, or form another nonoverlapping cluster. A divisive (or top-down) clustering algorithm starts with one giant cluster, and then it partitions that cluster into smaller and smaller clusters.

53
Q

neural networks (NNs),

A

(also called artificial neural networks, or ANNs) are constructed as nodes connected by links. The input layer consists of nodes with values for the features (independent variables).
values are scaled so that the information from multiple nodes is comparable and can be used to calculate a weighted average.

The nodes that follow the input variables are called neurons because they process the input information.

These neurons comprise a summation operator that collates the information (as a weighted average) and passes it on to a (typically nonlinear) activation function, to generate a value from the input values. This value is then passed forward to other neurons in subsequent hidden layers (a process called forward propagation). A related process, backward propagation, is employed to revise the weights used in the summation operator as the network learns from its errors.
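
A tiny numeric sketch of one neuron's forward pass (hypothetical weights; sigmoid as the activation function):

    import numpy as np

    x = np.array([0.5, -0.2, 0.1])   # scaled input node values (hypothetical)
    w = np.array([0.4, 0.3, -0.6])   # hypothetical link weights
    z = np.dot(w, x)                 # summation operator (weighted sum)
    output = 1 / (1 + np.exp(-z))    # nonlinear activation (sigmoid)
    print(output)                    # value passed forward to the next layer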

54
Q

Deep Learning Networks (DLNs)

A

Deep learning networks (DLNs) are neural networks with many hidden layers (often more than 20). DLNs are often used for image, pattern, and character recognition. The last layer in a DLN calculates the expected probability of an observation belonging to a category, and the observation is assigned to the category with the highest probability. Additional applications of DLNs include credit card fraud detection, autonomous cars, natural language processing, and investment decision-making.

55
Q
Reinforcement Learning (RL)
algorithms
A

have an agent that seeks to maximize a defined reward given defined constraints. The RL agent does not rely on labeled training data, but rather learns based on immediate feedback from (millions of) trials. When applied to the ancient game of Go, DeepMind’s AlphaGo algorithm was able to beat the reigning world champion. The efficacy of RL in investment decision-making is not yet conclusive.

56
Q

constraints that are introduced into simulations used in risk analysis are:

A

1) book value constraints,
2) earnings and cash flow constraints,
3) market value constraints.

57
Q

Underfitting

A

describes a machine learning model that is not complex enough to describe the data it is meant to analyze.

An underfit model treats true parameters as noise and fails to identify the actual patterns and relationships.

58
Q

overfit (too complex) model

A

will tend to identify spurious relationships in the data. Labelling of input data is related to the use of supervised or unsupervised machine learning techniques.

59
Q

LASSO (least absolute shrinkage and selection operator)

A

is a popular type of penalized regression in which the penalty term comprises summing the absolute values of the regression coefficients.

The more included features, the larger the penalty will be. The result is that a feature needs to make a sufficient contribution to model fit to offset the penalty from including it.
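
A minimal sketch with scikit-learn (assumed installed) and simulated data:

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 5))            # 5 candidate features
    y = 3 * X[:, 0] + rng.normal(size=100)   # only the first feature matters
    model = Lasso(alpha=0.1).fit(X, y)       # penalty sums |coefficients|
    print(model.coef_)                       # weak features shrink toward zero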

60
Q

Curation

A

is ensuring the quality of data,

by adjusting for bad or missing data.

61
Q

Word clouds

A

are a visualization technique.

62
Q

Support vector machine (SVM)

A

is a linear classifier that aims to seek the optimal hyperplane, i.e. the one that separates the two sets of data points by the maximum margin. SVM is typically used for classification.

supervised ML

63
Q

issues that might prevent a simulation from generating meaningful output include:

A

Ad-hoc specification (rather than specification based on sound analysis) of parameter estimates (i.e. the garbage-in, garbage-out problem),

changing correlations across inputs,
non-stationary distributions,

and real data that does not fit (pre-defined) distributions.

64
Q

Data exploration encompasses

A

exploratory data analysis, feature selection, and feature engineering.

65
Q

Stemming

A

is the process of converting inflected word forms into a base word.
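
A one-line illustration with NLTK (assumed installed):

    from nltk.stem import PorterStemmer

    print(PorterStemmer().stem("running"))   # 'run': inflected form -> base word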

66
Q

Reinforcement learning algorithms

A

involve an agent that will perform actions that will maximize its rewards over time, taking into consideration the constraints of its environment.

67
Q

unsupervised learning algorithms:

A

Dimension reduction

clustering

68
Q

Generalization

A

describes the degree to which, when predicting out-of-sample, a machine learning model retains its explanatory power.

69
Q

Big data is defined as data with

A

high volume, velocity, and variety. Big data often suffers from low veracity, because it can contain a high percentage of meaningless data.

70
Q

precision (P) =

A

true positives / (false positives + true positives)
=tp/(fp+tp)

ratio of correctly predicted positive classes to all predicted positive classes.

71
Q

recall (R)

A

= true positives / (true positives + false negatives)
=tp/(tp+fn)

ratio of correctly predicted positive classes to all actual positive classes.

72
Q

accuracy

A

= (true positives + true negatives) / (all positives and negatives)
=(tp+tn)/(all)

percentage of correctly predicted classes out of total predictions.

73
Q

F1 score

A

= (2 × P × R) / (P + R)

74
Q

Out-of-sample error

A

equals bias error + variance error + base error.

75
Q

Bias error

A

is the extent to which a model fits the training data.

76
Q

Variance error

A

describes the degree to which a model’s results change in response to new data from validation and test samples.

77
Q

Base error

A

comes from randomness in the data.

78
Q

Random forest

A

is a collection of randomly generated classification trees from the same data set.

Random forests can mitigate the problem of overfitting and increase the signal-to-noise ratio.

79
Q

sample covariance

A

covXY = Σ(x - x̄)(y - ȳ) / (n - 1)

(for the sample variance of a single variable: total variation / (n - 1))

80
Q

correlation coefficient r

A

r = covXY / (sX × sY)
= Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² × Σ(y - ȳ)²)
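
A direct computation in Python on hypothetical paired data:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical paired data
    y = np.array([2.1, 3.9, 6.2, 7.8])
    dx, dy = x - x.mean(), y - y.mean()
    r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    print(r)   # matches np.corrcoef(x, y)[0, 1]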

81
Q

A probit model

A

is a model with a qualitative dependent variable, based on the normal distribution.

82
Q

A logit model is

A

a model with a qualitative dependent variable, based on the logistic distribution.

83
Q

A discriminant model

A

returns a qualitative dependent variable based on a linear relationship that can be used for ranking or classification into discrete states.