Quant- Machine Learning Flashcards

Question 1

Q

Neural networks

Answer

A

Include highly flexible ML algorithms that have been successfully applied to a variety of tasks characterized by nonlinearities and interactions among features.

The foundation for deep learning and reinforcement learning.

Question 2

Q

K-fold cross validation

Answer

A

Technique for mitigating the holdout sample problem (excessive reduction of the training set size).

Validation technique in which the data are shuffled randomly and then divided into k equal subsamples, with k - 1 subsamples used for training and the remaining one used as the validation sample.

The process is repeated k times, so that each subsample serves as the validation sample once, which helps minimize both bias and variance.
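
The shuffle-and-split procedure can be sketched in a few lines. This is only an illustration: the helper name k_fold_splits, the seed, and the toy sizes (10 observations, k = 5) are assumptions, not part of the card.

```python
import random

def k_fold_splits(n_obs, k, seed=0):
    """Yield (train_idx, valid_idx) index lists for k-fold cross-validation."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)            # shuffle the data randomly
    folds = [idx[i::k] for i in range(k)]       # k subsamples (equal when k divides n_obs)
    for i in range(k):
        valid = folds[i]                        # one subsample validates
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]  # k - 1 subsamples train
        yield train, valid

# Hypothetical toy sizes: 10 observations, 5 folds.
splits = list(k_fold_splits(n_obs=10, k=5))
for train, valid in splits:
    print(len(train), len(valid))
```

Each observation lands in the validation sample exactly once and in the training sample k - 1 times.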

Question 3

Q

Penalized regression

Answer

A

Includes a constraint such that the regression coefficients are chosen to minimize the sum of squared residuals plus a penalty term that increases in size with the number of included features. Therefore a feature must make a sufficient contribution to model fit to offset the penalty from including it.

Question 4

Q

LASSO

Answer

A

Type of penalized regression / regularization technique

Stands for least absolute shrinkage and selection operator

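The penalty term is lambda times the sum of the absolute values of the regression coefficients. LASSO minimizes the sum of squared residuals plus this penalty, which can eliminate the least important features from the model by shrinking their coefficients to zero.

The equation includes lambda, a hyperparameter whose value is set by the researcher.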
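
A minimal sketch of the quantity LASSO minimizes, assuming a plain linear model. The function name lasso_objective, the toy data, and lam = 1.0 are made up for illustration; the sketch only evaluates the objective and does not fit the coefficients.

```python
def lasso_objective(X, y, coefs, intercept, lam):
    """Sum of squared residuals plus lam times the sum of |coefficients|."""
    ssr = 0.0
    for row, target in zip(X, y):
        pred = intercept + sum(b * x for b, x in zip(coefs, row))
        ssr += (target - pred) ** 2
    penalty = lam * sum(abs(b) for b in coefs)   # grows with every non-zero coefficient
    return ssr + penalty

# Hypothetical data: y depends only on the first feature (y = 2 * x1).
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
y = [2.0, 4.0, 6.0]
with_extra = lasso_objective(X, y, coefs=[2.0, 0.5], intercept=0.0, lam=1.0)
without_extra = lasso_objective(X, y, coefs=[2.0, 0.0], intercept=0.0, lam=1.0)
print(with_extra, without_extra)
```

Keeping a coefficient on the second feature raises the objective here, so the minimizer prefers to drop it.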

Question 5

Q

Support Vector Machine (SVM)

Answer

A

Supervised algorithm used for classification, regression, and outlier detection

Determines the hyperplane that optimally separates the observations into two sets of data points.

Question 6

Q

K-nearest neighbour

Answer

A

Supervised algorithm that classifies a new observation by finding similarities ("nearness") between the new observation and the existing data.
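
One common way to make "similarity" concrete is a majority vote among the k closest labeled points. The sketch below assumes Euclidean distance and k = 3; the function name knn_classify and the toy data are illustrative.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, new_x, k=3):
    """Label new_x by majority vote among its k nearest training points."""
    # Euclidean distance is an assumption; other distance measures are possible.
    dists = sorted((math.dist(x, new_x), label) for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled observations in two groups.
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_y = ["A", "A", "A", "B", "B", "B"]
label_low = knn_classify(train_X, train_y, (2, 2))
label_high = knn_classify(train_X, train_y, (7, 8))
print(label_low, label_high)
```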

Question 7

Q

Classification and regression tree (CART)

Answer

A

Applied to predict either a categorical target variable, producing a classification tree, or a continuous target variable, producing a regression tree.

Question 8

Q

What are the two categories of supervised learning?

Answer

A

Regression: the target variable is continuous
Classification: the target variable is categorical or ordinal

Question 9

Q

Describe the difference between bias error and variance error

Answer

A

Bias error: the degree to which a model fits the training data; high bias error means a poor fit (underfitting)
Variance error: how much a model's results change in response to new data from validation and test samples; high variance error is a sign of overfitting

Out-of-sample error = bias error + variance error + base error (randomness in the data)

Question 10

Q

Random forest classifier

Answer

A

A collection of many different decision trees generated by a bagging method or by randomly reducing the number of features available during training

Question 11

Q

Principal components analysis (PCA)

Answer

A

An unsupervised ML algorithm that reduces highly correlated features into fewer uncorrelated composite variables by transforming the feature covariance matrix.
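
For two correlated features, the covariance-matrix transformation can be worked out by hand. The sketch below assumes exactly two features with non-zero covariance and uses the closed-form eigenvalues of a 2x2 symmetric matrix; the function name and data are illustrative.

```python
import math

def first_principal_component(xs, ys):
    """Return the first eigenvector of the 2x2 covariance matrix and its variance share."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]] in closed form.
    mid = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = mid + spread, mid - spread
    # Eigenvector for lam1 (assumes sxy != 0, i.e. the features are correlated).
    vx, vy = sxy, lam1 - sxx
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), lam1 / (lam1 + lam2)

# Hypothetical perfectly correlated features: ys = 2 * xs.
direction, share = first_principal_component([1, 2, 3, 4], [2, 4, 6, 8])
print(direction, share)
```

With perfectly correlated inputs, one composite variable captures essentially all of the variance, so the two features reduce to one.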

Question 12

Q

K-means

Answer

A

Unsupervised ML algorithm that partitions observations into a fixed number (k) of non-overlapping clusters. Each cluster is characterized by its centroid, and each observation belongs to the cluster whose centroid is closest to it.
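
A minimal sketch of the usual iterative procedure (assign each observation to the nearest centroid, then recompute centroids). Euclidean distance, the fixed iteration count, and the hand-picked starting centroids are assumptions for illustration.

```python
import math

def k_means(points, centroids, n_iter=10):
    """Alternate assignment and centroid-update steps for a fixed number of rounds."""
    for _ in range(n_iter):
        # Assignment step: each observation joins the cluster with the closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (kept as-is if empty).
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical observations and starting centroids, k = 2.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
print(centroids)
```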

Question 13

Q

Neural networks

Answer

A

Consist of nodes connected by links

They have three types of layers: an input layer, hidden layers, and an output layer. Learning takes place in the hidden layer nodes, each of which consists of a summation operator and an activation function.

Neural networks with many hidden layers (at least 3) are called deep learning nets, which are the backbone of artificial intelligence.
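
What a single hidden-layer node computes can be sketched as follows. The sigmoid activation, the function name node_output, and the weights are assumptions chosen for illustration; other activation functions are common.

```python
import math

def node_output(inputs, weights, bias):
    """One node: weighted sum of inputs, then an activation function."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))   # summation operator
    return 1.0 / (1.0 + math.exp(-total))                         # sigmoid activation (assumed)

# Hypothetical inputs and weights; the weighted sum here is exactly zero.
out = node_output(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(out)
```

Learning amounts to adjusting the weights and bias so that the network's outputs move closer to the targets.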

Question 14

Q

Machine learning

Answer

A

Machine learning aims at extracting knowledge from large amounts of data by learning from known examples to determine an underlying structure in the data. The emphasis is on generating structure or predictions without human intervention. An elementary way to think of ML algorithms is to “find the pattern, apply the pattern.”

Question 15

Q

Unsupervised Learning

Answer

A

- Algorithms are trained with no labeled data, so they must infer relations between features, summarize them, or present an interesting underlying structure in their distributions that has not been explicitly provided.
- Two important types of problems well suited to unsupervised ML are dimension reduction and clustering.

Question 16

Q

Deep Learning

Answer

A

Deep learning is a category of ML algorithms based on neural networks. Its sophisticated algorithms address highly complex tasks such as image classification, face recognition, speech recognition, and natural language processing. In the related category of reinforcement learning, a computer learns from interacting with itself.

Question 17

Q

Generalization

Answer

A

- Describes the degree to which an ML model retains its explanatory power when predicting out-of-sample.
- Overfitting, a primary reason for lack of generalization, is the tendency of ML algorithms to tailor models to the training data at the expense of generalization to new data points.

Question 18

Q

What are the 4 types of model error?

Answer

A

Bias error is the degree to which a model fits the training data.
Variance error describes how much a model’s results change in response to new data from validation and test samples.
Base error is due to randomness in the data.
Out-of-sample error equals bias error plus variance error plus base error.

Question 19

Q

Describe agglomerative (bottom up) and divisive (top down) hierarchical clustering

Answer

A

■ Agglomerative hierarchical clustering begins with each observation being its own cluster. Then, the algorithm finds the two closest clusters, defined by some measure of distance, and combines them into a new, larger cluster. This process is repeated until all observations are clumped into a single cluster.
■ Divisive hierarchical clustering starts with all observations belonging to a single cluster. The observations are then divided into two clusters based on some measure of distance. The algorithm then progressively partitions the intermediate clusters into smaller clusters until each cluster contains only one observation.
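
The bottom-up procedure can be sketched as below. The card leaves the distance measure open, so single linkage (distance between the closest pair of members) with Euclidean distance is an assumption here, as are the function name and the toy points.

```python
import math

def agglomerative(points):
    """Repeatedly merge the two closest clusters; return the list of merged clusters."""
    clusters = [[p] for p in points]            # each observation starts as its own cluster
    history = []
    while len(clusters) > 1:
        # Single linkage (assumed): cluster distance = distance of the closest member pair.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: min(
                math.dist(p, q) for p in clusters[ab[0]] for q in clusters[ab[1]]
            ),
        )
        merged = clusters[i] + clusters[j]
        history.append(merged)
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return history

# Hypothetical observations: two tight pairs and one distant point.
history = agglomerative([(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)])
for step in history:
    print(step)
```

Divisive clustering runs the same idea in reverse, starting from the single all-inclusive cluster and splitting it.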

Question 20

Q

What are the first four steps for textual ML model building?

Answer

A

Text problem formulation
Text curation
Text preparation and wrangling
Text exploration

Question 21

Q

Model selection is governed by what 3 factors?

Answer

A

Whether the data project involves labeled data (supervised learning) or unlabeled data (unsupervised learning);
The type of data (numerical, continuous, or categorical; text data; image data; speech data; etc.);
The size of the dataset.

(21 cards)