3 (QM) - Machine Learning Flashcards

1
Q

What is machine learning (ML)?

A

The use of algorithms to make decisions by generalizing (or finding patterns) in a given data set. The goal is to use data to automate decision-making.

2
Q

Target Variable

A

The dependent variable (i.e. the “Y” variable). Can be continuous, categorical, or ordinal.

3
Q

Features

A

These are the independent variables (i.e. the “X” variables).

4
Q

Training data set

A

The sample data set used to fit the ML model.

5
Q

Hyperparameter

A

A model input specified by the researcher rather than learned from the data (e.g., "k" in k-nearest neighbors, or the penalty value in penalized regression).

6
Q

Supervised Learning

A

An ML algorithm uses labeled training data (both inputs and outputs are identified) to model relationships in the data and improve forecasting accuracy.

7
Q

Unsupervised Learning

A

An ML algorithm is not given labeled training data. Instead, the inputs (i.e. features) are provided without any conclusions around those inputs and the algorithm aims to determine the structure of the data.

8
Q

Deep Learning

A

An ML algorithm used for complex tasks such as image recognition, natural language processing, and so on.

9
Q

What are the three major types of deep learning algorithms?

A
  1. Reinforcement learning (RL) - An ML model that learns from its own prediction errors, adjusting its actions to maximize a defined reward.
  2. Neural Networks - Comprise an input layer, hidden layers (which process the input), and an output layer. The nodes in the hidden layer are called neurons, which comprise a summation operator (that calculates a weighted average) and an activation function (a nonlinear function).
  3. Deep learning networks (DLN) - Neural networks with many hidden layers (20+) that are useful for pattern, speech, and image recognition.
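The neuron described in (2) can be sketched in pure Python; the input values, weights, bias, and the choice of tanh as the activation function are illustrative assumptions, not a prescription.

```python
import math

def neuron(inputs, weights, bias):
    """One hidden-layer node: a summation operator (weighted sum plus a
    bias) followed by a nonlinear activation function (tanh here)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias  # summation operator
    return math.tanh(z)                                     # activation function

# Example: two features feeding one neuron with assumed weights.
output = neuron([0.5, -1.0], weights=[0.8, 0.2], bias=0.1)
```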
10
Q

Overfitting

Do overfit models generalize well to new data?

A

An issue with supervised ML that results when a large number of features (i.e. independent variables) are included in the data sample. The resulting model is overly complex and may fit random noise, which improves in-sample forecasting accuracy but decreases the model's accuracy on out-of-sample data.

No, they do not generalize well to new data. This results in a low out-of-sample R-squared.

11
Q

Bias error

A

The in-sample error that results from models with a poor fit.

12
Q

Variance error

A

The out-of-sample error that results from overfitted models that do not generalize well.

13
Q

Base error

A

The residual errors due to random noise.

14
Q

Training sample
v.
Validation sample
v.
Test sample

What are these three data sets used for?

What are the types of errors associated with each data set?

A

These data sets are used to measure how well a model generalizes. All three datasets are nonoverlapping.

Training - Data set used to develop the model. In-sample prediction errors.

Validation - Data set used for tuning the model. Out-of-sample prediction errors.

Test - Data set used to evaluate the model using new data. Out-of-sample prediction errors.
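The three-way split above can be sketched in pure Python; the 60/20/20 proportions and the fixed seed are illustrative assumptions.

```python
import random

def three_way_split(data, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle the data and cut it into three nonoverlapping samples:
    training (fit the model), validation (tune it), test (evaluate it)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = three_way_split(list(range(100)))
```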

15
Q

Learning curve

A

Curve that plots the accuracy rate (i.e. 1 - error rate) in the validation or test sample versus the size of the training sample.

16
Q

Accuracy Rate =

A

Accuracy Rate = 1 - Error Rate
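A minimal sketch of the formula, where the error rate is taken to be the fraction of predictions that miss the actual label (the labels here are made up for illustration):

```python
def accuracy_rate(actual, predicted):
    """Accuracy rate = 1 - error rate, where the error rate is the
    fraction of predictions that differ from the actual labels."""
    errors = sum(1 for a, p in zip(actual, predicted) if a != p)
    error_rate = errors / len(actual)
    return 1 - error_rate

accuracy_rate([1, 0, 1, 1], [1, 0, 0, 1])  # 1 error in 4 -> 0.75
```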

17
Q

What two methods do data scientists use to mitigate the problem of overfitting?

A
  1. Complexity Reduction - A penalty is imposed to exclude features that aren’t meaningfully contributing to out-of-sample prediction accuracy. The penalty value increases with the number of independent variables used by the model.
  2. Cross Validation - A sampling technique that estimates out-of-sample error rates directly from the validation sample. This ensures the validation sample is both large and representative of the population, just like the training sample.

K-fold cross validation is a method of randomly dividing a data set into any number of parts.

18
Q

What method is used to collect average in-sample and out-of-sample error rates?

A

K-fold cross validation. The data set is randomly divided into "k" parts.

The training sample comprises k-1 parts, with one part left out for validation. Error rates are then measured for the model on the validation part, and the process is repeated k times so that each part serves once as the validation sample.
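The k-fold procedure can be sketched as a pure-Python index generator; the seed and the interleaved fold assignment are illustrative assumptions.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Randomly divide n observations into k parts; each part serves once
    as the validation fold while the remaining k-1 parts form the
    training sample."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]       # k roughly equal parts
    for i in range(k):
        validation = folds[i]
        training = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield training, validation

for training, validation in k_fold_indices(n=10, k=5):
    pass  # fit on `training`, measure the error rate on `validation`
```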

19
Q

Penalized regression

A

Supervised learning

An algorithm that reduces overfitting by imposing a penalty on nonperforming features, shrinking or excluding them.
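One way to see the penalty at work is the LASSO-style objective: the usual sum of squared errors plus a term that grows with the size of the coefficients. The toy data and the choice of an absolute-value (L1) penalty here are illustrative assumptions.

```python
def lasso_loss(betas, X, y, lam):
    """Penalized-regression objective: sum of squared errors plus a
    penalty (lambda times the sum of absolute coefficients) that makes
    nonperforming features costly to keep in the model."""
    sse = sum((yi - sum(b * xij for b, xij in zip(betas, xi))) ** 2
              for xi, yi in zip(X, y))
    penalty = lam * sum(abs(b) for b in betas)
    return sse + penalty

# Toy example: the second feature is always 0, so any weight on it
# adds penalty without improving fit.
X, y = [[1, 0], [2, 0]], [1, 2]
lasso_loss([1, 0], X, y, lam=0.5)  # -> 0.5 (perfect fit, small penalty)
lasso_loss([1, 5], X, y, lam=0.5)  # -> 3.0 (same fit, larger penalty)
```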

20
Q

Support vector machine (SVM)

A

Supervised learning

A linear classification algorithm that assigns each observation to one of two possible classes based on which side of a model-defined hyperplane it falls.
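The classification step can be sketched by checking which side of the hyperplane w . x + b = 0 an observation falls on; the weights and intercept below are assumed, not fitted.

```python
def svm_classify(x, w, b):
    """Assign an observation to one of two classes based on the sign of
    the hyperplane score w . x + b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hyperplane x1 + x2 - 1 = 0 (illustrative, not a trained model).
svm_classify([2.0, 1.0], w=[1.0, 1.0], b=-1.0)  # -> 1
```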

21
Q

K-nearest neighbor (KNN)

A

Supervised learning

An algorithm used to classify an observation based on nearness to the observations in the training sample.
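A minimal KNN sketch in pure Python, classifying by majority vote among the k nearest training observations; the Euclidean distance metric and the toy labeled points are illustrative assumptions.

```python
import math
from collections import Counter

def knn_predict(x, training, k=3):
    """Classify x by majority vote among the k nearest training
    observations (Euclidean distance); k is a hyperparameter."""
    neighbors = sorted(training, key=lambda obs: math.dist(x, obs[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

training = [([0, 0], "A"), ([0, 1], "A"), ([5, 5], "B"), ([6, 5], "B")]
knn_predict([1, 0], training, k=3)  # -> "A"
```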

22
Q

Classification and regression tree (CART)

A

Supervised learning

An algorithm used for classifying categorical target variables when there are significant nonlinear relationships among variables.

23
Q

Ensemble learning

A

Supervised learning

An algorithm that combines predictions from multiple models, resulting in a lower average error rate.

24
Q

Random forest

A

Supervised learning

A variant of the classification and regression tree (CART) whereby a large number of classification trees are trained using data bagged from the same data set.

25
Q

Principal components analysis (PCA)

A

Unsupervised learning

An algorithm that summarizes the information in a large number of correlated factors into a much smaller set of uncorrelated factors, called eigenvectors.

26
Q

K-means clustering

A

Unsupervised learning

An algorithm that partitions observations into “K” nonoverlapping clusters; a centroid is associated with each cluster.
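A pure-Python sketch of the assign-then-update loop on 1-D observations; the data points, starting centroids, and fixed iteration count are illustrative assumptions.

```python
def k_means_1d(points, centroids, iters=10):
    """Partition 1-D observations into k nonoverlapping clusters, each
    represented by a centroid: assign every point to its nearest
    centroid, then move each centroid to its cluster's mean."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:                      # assignment step
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]  # update step
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = k_means_1d([1, 2, 3, 10, 11, 12], centroids=[0, 5])
```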

27
Q

Hierarchical clustering

A

Unsupervised learning

An algorithm that builds a hierarchy of clusters without any predefined number of clusters.

28
Q

Hierarchical clustering

Agglomerative vs. divisive

A

A method of building a hierarchy of clusters without any predefined number of clusters.

Agglomerative - Bottom-up. Starts with each observation as its own cluster and merges similar observations and clusters into larger groups.

Divisive - Top-down. Starts with one giant cluster and then partitions that cluster into smaller and smaller clusters.
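The agglomerative (bottom-up) direction can be sketched on sorted 1-D points; single linkage on adjacent clusters and the toy data are illustrative assumptions.

```python
def agglomerative_1d(points, n_merges):
    """Bottom-up clustering: start with each observation in its own
    cluster, then repeatedly merge the two closest clusters
    (single linkage on sorted 1-D points)."""
    clusters = [[p] for p in sorted(points)]
    for _ in range(n_merges):
        # find the adjacent pair of clusters with the smallest gap
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters

agglomerative_1d([1, 2, 10, 11, 12], n_merges=3)  # -> [[1, 2], [10, 11, 12]]
```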