Quant Flashcards
Confidence Interval for a Predicted Y-Value
predicted Y-value ± (critical t-value)(standard error of forecast)
t test for each variable
= estimated regression coefficient / standard error of the coefficient
df = n-k-1
R square
coefficient of determination
= RSS / SST
regression sum of squares / total sum of squares
= (SST - SSE) / SST, where SSE is the sum of squared errors
=explained variation/ total variation
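A minimal sketch in Python (the SST and SSE values are hypothetical, chosen only to illustrate the relationship):

    # Hypothetical sums of squares for illustration
    SST = 100.0          # total variation
    SSE = 20.0           # unexplained variation (sum of squared errors)
    RSS = SST - SSE      # explained variation (regression sum of squares)
    r_squared = RSS / SST
    print(r_squared)     # 0.8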
SEE standard error of estimate
= square root of mean squared error (MSE)
MSE = SSE / (n - k - 1)
RSS regression sum of squares
MSR (mean regression sum of squares) = RSS / k
F test all coefficients collectively
- one-tailed test
F = MSR / MSE
(mean regression sum of squares / mean squared error)
Reject H0 if F > critical value: at least one of the coefficients is significantly different from zero, i.e., at least one of the independent variables in the regression model makes a significant contribution to the explanation of the dependent variable.
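A short Python sketch tying MSR, MSE, and the F-statistic together (the regression output values n, k, RSS, and SSE are made up; scipy is assumed available):

    from scipy.stats import f as f_dist

    n, k = 60, 3                              # observations, independent variables
    RSS, SSE = 80.0, 40.0                     # explained and unexplained variation (hypothetical)
    MSR = RSS / k
    MSE = SSE / (n - k - 1)
    F = MSR / MSE
    F_crit = f_dist.ppf(0.95, k, n - k - 1)   # one-tailed, 5% significance
    print(F, F_crit, F > F_crit)              # reject H0 if F > critical value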
conditional heteroskedasticity
residual variance related to level of independent variables
Standard errors are unreliable, but the slope coefficients are consistent and unbiased.
Detect with the Breusch-Pagan (BP) chi-square test.
Correct with White-corrected standard errors.
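A hedged sketch of the Breusch-Pagan test using statsmodels; x and y are simulated placeholder data, not a prescribed workflow:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan

    rng = np.random.default_rng(0)
    x = sm.add_constant(rng.normal(size=(100, 2)))        # constant + 2 regressors
    y = x @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=100)
    resid = sm.OLS(y, x).fit().resid
    lm_stat, lm_pvalue, _, _ = het_breuschpagan(resid, x)
    print(lm_stat, lm_pvalue)   # a small p-value suggests conditional heteroskedasticity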
Serial correlation
residuals are correlated.
Too many Type I errors, but the slope coefficients are consistent and unbiased.
Detect with the Durbin-Watson test.
Correct by adjusting the standard errors with the Hansen method.
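A small sketch of the Durbin-Watson statistic computed directly from residuals (resid here is just simulated noise for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    resid = rng.normal(size=100)                         # stand-in for regression residuals
    dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
    print(dw)                                            # near 2 when there is no serial correlation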
Multicollinearity
Two or more independent variables are correlated.
Too many Type II errors and the slope coefficients are unreliable.
Drop one of the correlated variables.
Both multicollinearity and serial correlation bias the standard errors of the slope coefficients.
log-linear trend model
ln(yt) = b0 + b1(t)
best for a data series that grows at a constant rate (exponential growth). If the residuals are correlated or predictable, or the mean is non-constant, a trend model is misspecified and an AR model should be used instead.
AR model
dependent variable is regressed against previous values of itself.
is correctly specified if the autocorrelations of the residuals from the model are not statistically significant at any lag.
use a t-test
if significant, the model is incorrectly specified and a lagged variable at the indicated lag should be added.
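A sketch, assuming statsmodels is available, of fitting an AR(1) model and checking residual autocorrelations; y is simulated purely for illustration:

    import numpy as np
    from statsmodels.tsa.ar_model import AutoReg
    from statsmodels.tsa.stattools import acf

    rng = np.random.default_rng(0)
    y = np.zeros(200)
    for t in range(1, 200):                    # simulate an AR(1) series with b0 = 1.0, b1 = 0.6
        y[t] = 1.0 + 0.6 * y[t - 1] + rng.normal()

    fit = AutoReg(y, lags=1).fit()
    print(fit.params)                          # estimates of b0 and b1
    print(acf(fit.resid, nlags=5))             # residual autocorrelations should be near zero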
covariance stationary
meet the following 3 conditions:
- constant and finite mean
- constant and finite variance
- constant and finite covariance with leading or lagged values
use the Dickey-Fuller test
if the AR series is not covariance stationary, correct with first differencing.
if it is, the mean-reverting level is defined and b1 must be < 1 in absolute value.
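A hedged sketch of the (augmented) Dickey-Fuller test in statsmodels; the series is a simulated random walk used only for illustration:

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(0)
    y = np.cumsum(rng.normal(size=200))        # a random walk, which has a unit root
    adf_stat, p_value, *_ = adfuller(y)
    print(adf_stat, p_value)                   # a large p-value: fail to reject a unit root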
mean reversion
b0/(1-b1)
value of the variable tends to fall when above its mean and rise when below its mean.
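For example (hypothetical values): if b0 = 1.0 and b1 = 0.6, the mean-reverting level is 1.0 / (1 - 0.6) = 2.5.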
unit root
if the value of the lag coefficient = 1
the time series has a unit root and will follow a random walk process.
a series with a unit root is not covariance stationary.
if there is a unit root, value at t = value at t-1 + a random error
the mean-reverting level b0/(1-b1) is undefined (because b1 = 1)
random walk
one for which the value in one period = the value in the previous period + a random error
with a drift: xt = b0 + xt-1 + error
without drift: xt = xt-1 + error
1st differencing
to correct an autoregressive model that is not covariance stationary
subtract the value of the time series in the immediately preceding period from the current value of the time series to define a new variable
yt = xt - xt-1 (so that b0 = b1 = 0)
the differenced series is covariance stationary
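A minimal numpy sketch of first differencing (x is a simulated random walk used only for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.cumsum(rng.normal(size=200))   # random walk: not covariance stationary
    y = np.diff(x)                        # y_t = x_t - x_(t-1); differenced series is stationary
    print(y[:5])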
seasonality
tested by calculating the autocorrelations of the error terms.
to adjust for seasonality, an additional lag of the variable (at the seasonal lag) is added to the original model.
RMSE root mean squared error
used to assess the predictive accuracy of autoregressive models
the lower the better
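A one-step numpy sketch of RMSE for out-of-sample forecasts (the actual and forecast values are made up):

    import numpy as np

    actual = np.array([1.2, 0.8, 1.5, 1.1])
    forecast = np.array([1.0, 1.0, 1.4, 1.2])
    rmse = np.sqrt(np.mean((actual - forecast) ** 2))
    print(rmse)   # lower RMSE means better predictive accuracy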
cointegration
two time series are economically linked or follow the same trend, and that relationship is not expected to change.
if cointegrated, the error term is covariance stationary and the t-test is reliable.
test the residuals for a unit root using the DF test (with Engle-Granger critical values)
if we reject the null hypothesis of a unit root, the error terms generated by the two time series are covariance stationary and the two series are cointegrated.
If both time series are covariance stationary, model is reliable.
If only the dependent variable time series or only the independent time series is covariance stationary, the model is not reliable.
If neither time series is covariance stationary, you need to check for cointegration.
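A hedged sketch of an Engle-Granger-style cointegration check with statsmodels; the two series are simulated to share a common trend purely for illustration:

    import numpy as np
    from statsmodels.tsa.stattools import coint

    rng = np.random.default_rng(0)
    trend = np.cumsum(rng.normal(size=300))          # shared stochastic trend
    series_a = trend + rng.normal(size=300)
    series_b = 0.5 * trend + rng.normal(size=300)
    t_stat, p_value, _ = coint(series_a, series_b)
    print(t_stat, p_value)   # a small p-value: reject no-cointegration, so the series are cointegrated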
ARCH autoregressive conditional heteroskedasticity
describes the condition where the variance of the residuals in one time period within a time series is dependent on the variance of the residuals in another period.
if present, the standard errors of the regression coefficients in the AR model and the hypothesis tests of these coefficients are invalid.
use generalized least squares
supervised learning
a machine learning technique in which a machine is given labelled input and output data and models the output data based on the input data.
unsupervised learning
labelled data are not provided; the algorithm uses unlabelled data to determine the structure of the data
a machine is given input data in which to identify patterns and relationships, but no output data to model.
deep learning algorithms
algorithms such as neural networks and reinforcement learning learn from their prediction errors and are used for complex tasks such as image recognition and natural language processing.
technique to identify patterns of increasing complexity and may use supervised or unsupervised learning.
deep learning nets typically have many (often > 20) hidden layers
reinforcement learning algorithms have an agent seeking to maximize a defined reward given defined constraints.
overfitting
results from having a large number of independent variables (features), resulting in an overly complex model that may have fit random noise; this improves in-sample forecasting accuracy, but not out-of-sample accuracy.
mitigate with complexity reduction
- a penalty is imposed to exclude features that are not meaningfully contributing to out-of-sample prediction accuracy
and with cross-validation
supervised learning algorithms include:
- penalized regression
- support vector machine
- k-nearest neighbor
- classification and regression tree CART
- ensemble learning (e.g., random forest)
unsupervised machine learning algorithms include:
- principal components analysis PCA
- k-means clustering
- hierarchical clustering
neural networks
comprises an input layer, hidden layers, and an output layer.
consist of nodes connected by links; learning takes place in the hidden layer nodes, each of which consists of a summation operator and an activation function.
Neural networks with many hidden layers (often more than 20) are known as deep learning nets (DLNs) and used in artificial intelligence.
deep learning nets
neural networks with many hidden layers, useful for pattern, speech, and image recognition.
reinforcement learning
seeks to learn from its own errors while maximizing a defined reward
data wrangling
data transformation and scaling
scaling (normalization & standardization)
conversion of data to a common unit of measurement
normalization scales variables between the values of 0 and 1
standardization centers the variables at a mean of 0 and a standard deviation of 1; assumes a normal distribution
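A short numpy sketch contrasting normalization (min-max) and standardization (z-score); x is arbitrary sample data chosen for illustration:

    import numpy as np

    x = np.array([2.0, 4.0, 6.0, 10.0])
    normalized = (x - x.min()) / (x.max() - x.min())      # scaled to the range [0, 1]
    standardized = (x - x.mean()) / x.std(ddof=1)         # mean 0, sample standard deviation 1
    print(normalized, standardized)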
n-grams
technique that defines a token as a sequence of words and is applied when the sequence is important
bag-of-words (BOW)
a procedure that collects all the tokens in a document
a collection of the distinct set of tokens from all the texts in a sample dataset
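A sketch, assuming a recent scikit-learn, of building a bag-of-words and adding 2-grams; the sentences are made up for illustration:

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["rates rise and stocks fall", "stocks fall when rates rise"]
    vectorizer = CountVectorizer(ngram_range=(1, 2))     # unigrams plus 2-grams
    bow = vectorizer.fit_transform(docs)                 # document-term matrix of token counts
    print(vectorizer.get_feature_names_out())
    print(bow.toarray())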
evaluate the fit of a machine learning algorithm
precision (P) = true positives / (false positives + true positives)
=tp/(fp+tp)
recall (R) = true positives / (true positives + false negatives)
=tp/(tp+fn)
accuracy = (true positives + true negatives) / (all positives and negatives)
=(tp+tn)/(all)
F1 score = (2 × P × R) / (P + R)
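A minimal sketch computing these metrics from hypothetical confusion-matrix counts:

    tp, fp, tn, fn = 40, 10, 45, 5          # hypothetical counts for illustration
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    print(precision, recall, accuracy, f1)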
standard error of estimate
=square root of (unexplained variation/(n-k-1))
sample variance of dependent variable
total variation / (n-1)
sample standard deviation = square root of (total variation / (n-1))
test statistic t=
= (estimated bi - hypothesized bi) / standard error of bi
df = n-k-1
simulation is best for
continuous risk, accommodates correlated variables
Correlation across risks can be modeled explicitly using simulation.
2 advantages of using simulation in decision making are
1) Better input estimation
2) Simulation yields a distribution for expected value rather than a point estimate.
Simulations will yield great-looking output, even when the inputs are random.
Scenario analysis
discrete, accommodates correlated variables
decision trees
discrete, sequential; does not accommodate correlated variables
structured data analysis steps:
- conceptualization of the modeling task
- data collection
- data preparation and wrangling (cleaning data)
- Data exploration
- Model training
unstructured data analysis steps:
- text problem formulation
- data collection
- text preparation and wrangling
- text exploration
- modeling
big data is characterized by
- volume (quantity, Terabyte)
- variety (data sources)
- velocity (speed, latency)
- Veracity (reliability of data source)
feature engineering
involves optimizing and improving the selected features; prevents underfitting in the training of the model.
feature selection
involves selecting a subset of tokens in the BOW, reducing feature-induced noise.
appropriate feature selection is a key factor in minimizing model overfitting.
token / tokenization
a token is a word (or character) unit; tokenization is the process of splitting a given text into separate tokens.
K-nearest neighbor (KNN).
More commonly used in classification (but sometimes in regression), this technique is used to classify an observation based on nearness to the observations in the training sample.
need to specify the hyperparameter k (the number of nearest neighbors).
Classification and regression trees (CART).
Classification trees are appropriate when the target variable is categorical, and are typically used when the target is binary
provides a visual explanation of the prediction process, compared to other algorithms that are often described as black boxes due to their opacity.
Principal component analysis (PCA).
Problems associated with too much noise often arise when the number of features in a data set (i.e., its dimension) is excessive
unsupervised machine learning algorithm that reduces highly correlated features into fewer uncorrelated composite variables by transforming the feature covariance matrix.
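A hedged scikit-learn sketch of PCA reducing the feature set to two composite variables; X is random data generated only for illustration:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))                  # 100 observations, 5 (possibly correlated) features
    pca = PCA(n_components=2)
    components = pca.fit_transform(X)              # uncorrelated composite variables
    print(pca.explained_variance_ratio_)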
Clustering.
Given a data set, clustering is the process of grouping observations into categories based on similarities in their attributes (called cohesion).
K-means
partitions observations into a fixed number (k) of non-overlapping clusters.
unsupervised
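A brief scikit-learn sketch of k-means with k = 3; X is simulated data, and k is the hyperparameter you must specify up front:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.normal(size=(150, 2))
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    print(labels[:10])      # cluster assignment (0, 1, or 2) for each observation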
Hierarchical clustering
Hierarchical clustering is an unsupervised iterative algorithm used to build a hierarchy of clusters.
In an agglomerative (or bottom-up) clustering, we start with one observation as its own cluster and add other similar observations to that group, or form another nonoverlapping cluster. A divisive (or top-down) clustering algorithm starts with one giant cluster, and then it partitions that cluster into smaller and smaller clusters.
neural networks (NNs),
(also called artificial neural networks, or ANNs) are constructed as nodes connected by links. The input layer consists of nodes with values for the features (independent variables).
values are scaled so that the information from multiple nodes is comparable and can be used to calculate a weighted average.
The nodes that follow the input variables are called neurons because they process the input information.
These neurons comprise a summation operator that collates the information (as a weighted average) and passes it on to a (typically nonlinear) activation function, to generate a value from the input values. This value is then passed forward to other neurons in subsequent hidden layers (a process called forward propagation). A related process, backward propagation, is employed to revise the weights used in the summation operator as the network learns from its errors.
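A tiny numpy sketch of what one hidden-layer neuron does: a summation operator (weighted sum) followed by a nonlinear activation function. The inputs and weights are made up:

    import numpy as np

    x = np.array([0.5, 1.5, 2.0])        # scaled input features
    w = np.array([0.2, -0.1, 0.4])       # link weights into one neuron
    bias = 0.1
    z = np.dot(w, x) + bias              # summation operator (weighted sum)
    a = 1.0 / (1.0 + np.exp(-z))         # sigmoid activation function
    print(a)                             # value passed forward to the next layer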
Deep Learning Networks (DLNs)
Deep learning networks (DLNs) are neural networks with many hidden layers (often more than 20). DLNs are often used for image, pattern, and character recognition. The last layer in a DLN calculates the expected probability of an observation belonging to a category, and the observation is assigned to the category with the highest probability. Additional applications of DLNs include credit card fraud detection, autonomous cars, natural language processing, and investment decision-making.
Reinforcement Learning (RL) algorithms
have an agent that seeks to maximize a defined reward given defined constraints. The RL agent does not rely on labeled training data, but rather learns based on immediate feedback from (millions of) trials. When applied to the ancient game of Go, DeepMind’s AlphaGo algorithm was able to beat the reigning world champion. The efficacy of RL in investment decision-making is not yet conclusive.
constraints that are introduced into simulations used in risk analysis are:
1) book value constraints,
2) earnings and cash flow constraints,
3) market value constraints.
Underfitting
describes a machine learning model that is not complex enough to describe the data it is meant to analyze.
An underfit model treats true parameters as noise and fails to identify the actual patterns and relationships.
overfit (too complex) model
will tend to identify spurious relationships in the data. Labelling of input data is related to the use of supervised or unsupervised machine learning techniques.
LASSO (least absolute shrinkage and selection operator)
is a popular type of penalized regression in which the penalty term comprises summing the absolute values of the regression coefficients.
The more included features, the larger the penalty will be. The result is that a feature needs to make a sufficient contribution to model fit to offset the penalty from including it.
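A hedged scikit-learn sketch of LASSO; alpha is the penalty strength (a hyperparameter), and the data are simulated so that only two features are informative:

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)   # only the first two features matter
    model = Lasso(alpha=0.1).fit(X, y)
    print(model.coef_)    # coefficients of uninformative features are shrunk toward zero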
Curation
is ensuring the quality of data,
by adjusting for bad or missing data.
Word clouds
are a visualization technique for text data in which the size of each word indicates its frequency in the text.
Support vector machine (SVM)
is a linear classifier that aims to seek the optimal hyperplane, i.e. the one that separates the two sets of data points by the maximum margin. SVM is typically used for classification.
supervised ML
issues that might prevent a simulation from generating meaningful output include:
Ad-hoc specification (rather than specification based on sound analysis) of parameter estimates (i.e. the garbage-in, garbage-out problem),
changing correlations across inputs,
non-stationary distributions,
and real data that does not fit (pre-defined) distributions.
Data exploration encompasses
exploratory data analysis, feature selection, and feature engineering.
Stemming
is the process of converting inflected word forms into a base word.
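A small sketch of stemming with NLTK's Porter stemmer (assuming nltk is installed):

    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    print([stemmer.stem(w) for w in ["connection", "connected", "connecting"]])
    # all three reduce to the base form "connect"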
Reinforcement learning algorithms
involve an agent that will perform actions that will maximize its rewards over time, taking into consideration the constraints of its environment.
unsupervised learning algorithms:
Dimension reduction
clustering
Generalization
describes the degree to which, when predicting out-of-sample, a machine learning model retains its explanatory power.
Big data is defined as data with
high volume, velocity, and variety. Big data often suffers from low veracity, because it can contain a high percentage of meaningless data.
precision (P) =
true positives / (false positives + true positives)
=tp/(fp+tp)
ratio of correctly predicted positive classes to all predicted positive classes.
recall (R)
= true positives / (true positives + false negatives)
=tp/(tp+fn)
ratio of correctly predicted positive classes to all actual positive classes.
accuracy
= (true positives + true negatives) / (all positives and negatives)
=(tp+tn)/(all)
percentage of correctly predicted classes out of total predictions.
F1 score
= (2 × P × R) / (P + R)
Out-of-sample error
equals bias error + variance error + base error.
Bias error
is the extent to which a model fits the training data.
Variance error
describes the degree to which a model’s results change in response to new data from validation and test samples.
Base error
comes from randomness in the data.
Random forest
is a collection of randomly generated classification trees from the same data set.
random forests can mitigate the problem of overfitting.
increase the signal-to-noise ratio.
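A hedged scikit-learn sketch of a random forest classifier on simulated data (n_estimators is the number of trees in the collection):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)          # simple made-up classification target
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print(forest.predict(X[:5]))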
sample covariance
= sum of (x - x̄)(y - ȳ) / (n-1)
sample variance = sum of (x - x̄)^2 / (n-1) = total variation / (n-1)
correlation coefficient r
= cov(x,y) / (sx × sy)
= sum of (x - x̄)(y - ȳ)
/ square root of [sum of (x - x̄)^2 × sum of (y - ȳ)^2]
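A quick numpy check of the correlation formula; x and y are arbitrary sample data:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
    cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
    r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))
    print(r, np.corrcoef(x, y)[0, 1])    # the two values agree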
A probit model
is a model with a qualitative dependent variable, based on the normal distribution.
A logit model is
a model with a qualitative dependent variable, based on the logistic distribution.
A discriminant model
returns a qualitative dependent variable based on a linear relationship that can be used for ranking or classification into discrete states.