3 (QM) - Machine Learning Flashcards

1
Q

What is machine learning (ML)?

A

The use of algorithms to make decisions by generalizing (or finding patterns) in a given data set. The goal is to use data to automate decision-making.

2
Q

Target Variable

A

The dependent variable (i.e. the “Y” variable). Can be continuous, categorical, or ordinal.

3
Q

Features

A

These are the independent variables (i.e. the “X” variables).

4
Q

Training data set

A

The sample data set used to fit the ML model.

5
Q

Hyperparameter

A

A model input specified by the researcher rather than learned from the data (e.g., "k" in k-nearest neighbors, or the penalty value in penalized regression).

6
Q

Supervised Learning

A

An ML algorithm uses labeled training data (both inputs and outputs are identified) to model relationships in the data and improve forecasting accuracy.

7
Q

Unsupervised Learning

A

An ML algorithm is not given labeled training data. Instead, the inputs (i.e. features) are provided without any conclusions around those inputs and the algorithm aims to determine the structure of the data.

8
Q

Deep Learning

A

An ML algorithm used for complex tasks such as image recognition, natural language processing, and so on.

9
Q

What are the three major types of deep learning algorithms?

A
  1. Reinforcement learning (RL) - An ML model that learns from its own prediction errors, adjusting its actions to maximize a defined reward.
  2. Neural Networks - Comprise an input layer, hidden layers (which process the input), and an output layer. The nodes in the hidden layer are called neurons, which comprise a summation operator (that calculates a weighted average) and an activation function (a nonlinear function).
  3. Deep learning networks (DLN) - Neural networks with many hidden layers (20+) that are useful for pattern, speech, and image recognition.
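The neuron described in (2) can be sketched in pure Python; the input values, weights, bias, and the choice of tanh as the activation function are illustrative assumptions, not a prescription.

```python
import math

def neuron(inputs, weights, bias):
    """One hidden-layer node: a summation operator (weighted sum plus a
    bias) followed by a nonlinear activation function (tanh here)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias  # summation operator
    return math.tanh(z)                                     # activation function

# Example: two features feeding one neuron with assumed weights.
output = neuron([0.5, -1.0], weights=[0.8, 0.2], bias=0.1)
```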
10
Q

Overfitting

Do overfit models generalize well to new data?

A

An issue with supervised ML that results when a large number of features (i.e. independent variables) are included in the data sample. The resulting model is overly complex and may fit random noise, which improves in-sample forecasting accuracy but decreases the model's accuracy on out-of-sample data.

No, they do not generalize well to new data. This results in a low out-of-sample R-squared.

11
Q

Bias error

A

The in-sample error that results from models with a poor fit.

12
Q

Variance error

A

The out-of-sample error that results from overfitted models that do not generalize well.

13
Q

Base error

A

The residual errors due to random noise.

14
Q

Training sample
v.
Validation sample
v.
Test sample

What are these three data sets used for?

What are the types of errors associated with each data set?

A

These data sets are used to measure how well a model generalizes. All three datasets are nonoverlapping.

Training - Data set used to develop the model. In-sample prediction errors.

Validation - Data set used for tuning the model. Out-of-sample prediction errors.

Test - Data set used to evaluate the model using new data. Out-of-sample prediction errors.
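The three-way split above can be sketched in pure Python; the 60/20/20 proportions and the fixed seed are illustrative assumptions.

```python
import random

def three_way_split(data, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle the data and cut it into three nonoverlapping samples:
    training (fit the model), validation (tune it), test (evaluate it)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = three_way_split(list(range(100)))
```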

15
Q

Learning curve

A

Curve that plots the accuracy rate (i.e. 1 - error rate) in the validation or test sample versus the size of the training sample.

16
Q

Accuracy Rate =

A

Accuracy Rate = 1 - Error Rate
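A minimal sketch of the formula, where the error rate is taken to be the fraction of predictions that miss the actual label (the labels here are made up for illustration):

```python
def accuracy_rate(actual, predicted):
    """Accuracy rate = 1 - error rate, where the error rate is the
    fraction of predictions that differ from the actual labels."""
    errors = sum(1 for a, p in zip(actual, predicted) if a != p)
    error_rate = errors / len(actual)
    return 1 - error_rate

accuracy_rate([1, 0, 1, 1], [1, 0, 0, 1])  # 1 error in 4 -> 0.75
```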

17
Q

What two methods do data scientists use to mitigate the problem of overfitting?

A
  1. Complexity Reduction - A penalty is imposed to exclude features that aren’t meaningfully contributing to out-of-sample prediction accuracy. The penalty value increases with the number of independent variables used by the model.
  2. Cross Validation - A sampling technique that estimates out-of-sample error rates directly from the validation sample. This ensures the validation sample is both large and representative of the population, just like the training sample.

K-fold cross validation is a method of randomly dividing a data set into any number of parts.

18
Q

What method is used to collect average in-sample and out-of-sample error rates?

A

K-fold cross validation. The data set is randomly divided into "k" parts.

The training sample comprises k-1 parts, with one part left out for validation. Error rates are then measured for the model on the validation part, and the process is repeated k times so that each part serves once as the validation sample.
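The k-fold procedure can be sketched as a pure-Python index generator; the seed and the interleaved fold assignment are illustrative assumptions.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Randomly divide n observations into k parts; each part serves once
    as the validation fold while the remaining k-1 parts form the
    training sample."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]       # k roughly equal parts
    for i in range(k):
        validation = folds[i]
        training = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield training, validation

for training, validation in k_fold_indices(n=10, k=5):
    pass  # fit on `training`, measure the error rate on `validation`
```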

19
Q

Penalized regression

A

Supervised learning

An algorithm that reduces overfitting by imposing a penalty on nonperforming features, shrinking or excluding them.
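One way to see the penalty at work is the LASSO-style objective: the usual sum of squared errors plus a term that grows with the size of the coefficients. The toy data and the choice of an absolute-value (L1) penalty here are illustrative assumptions.

```python
def lasso_loss(betas, X, y, lam):
    """Penalized-regression objective: sum of squared errors plus a
    penalty (lambda times the sum of absolute coefficients) that makes
    nonperforming features costly to keep in the model."""
    sse = sum((yi - sum(b * xij for b, xij in zip(betas, xi))) ** 2
              for xi, yi in zip(X, y))
    penalty = lam * sum(abs(b) for b in betas)
    return sse + penalty

# Toy example: the second feature is always 0, so any weight on it
# adds penalty without improving fit.
X, y = [[1, 0], [2, 0]], [1, 2]
lasso_loss([1, 0], X, y, lam=0.5)  # -> 0.5 (perfect fit, small penalty)
lasso_loss([1, 5], X, y, lam=0.5)  # -> 3.0 (same fit, larger penalty)
```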

20
Q

Support vector machine (SVM)

A

Supervised learning

A linear classification algorithm that assigns each observation to one of two possible classes based on which side of a model-defined hyperplane it falls.
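The classification step can be sketched by checking which side of the hyperplane w . x + b = 0 an observation falls on; the weights and intercept below are assumed, not fitted.

```python
def svm_classify(x, w, b):
    """Assign an observation to one of two classes based on the sign of
    the hyperplane score w . x + b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hyperplane x1 + x2 - 1 = 0 (illustrative, not a trained model).
svm_classify([2.0, 1.0], w=[1.0, 1.0], b=-1.0)  # -> 1
```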

21
Q

K-nearest neighbor (KNN)

A

Supervised learning

An algorithm used to classify an observation based on nearness to the observations in the training sample.
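A minimal KNN sketch in pure Python, classifying by majority vote among the k nearest training observations; the Euclidean distance metric and the toy labeled points are illustrative assumptions.

```python
import math
from collections import Counter

def knn_predict(x, training, k=3):
    """Classify x by majority vote among the k nearest training
    observations (Euclidean distance); k is a hyperparameter."""
    neighbors = sorted(training, key=lambda obs: math.dist(x, obs[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

training = [([0, 0], "A"), ([0, 1], "A"), ([5, 5], "B"), ([6, 5], "B")]
knn_predict([1, 0], training, k=3)  # -> "A"
```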

22
Q

Classification and regression tree (CART)

A

Supervised learning

An algorithm used for classifying categorical target variables when there are significant nonlinear relationships among variables.

23
Q

Ensemble learning

A

Supervised learning

An algorithm that combines predictions from multiple models, resulting in a lower average error rate.

24
Q

Random forest

A

Supervised learning

A variant of the classification and regression tree (CART) whereby a large number of classification trees are trained using data bagged from the same data set.

25
Q

Principal components analysis (PCA)

A

Unsupervised learning

An algorithm that summarizes the information in a large number of correlated factors into a much smaller set of uncorrelated factors, called eigenvectors.

26
Q

K-means clustering

A

Unsupervised learning

An algorithm that partitions observations into “K” nonoverlapping clusters; a centroid is associated with each cluster.
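A pure-Python sketch of the assign-then-update loop on 1-D observations; the data points, starting centroids, and fixed iteration count are illustrative assumptions.

```python
def k_means_1d(points, centroids, iters=10):
    """Partition 1-D observations into k nonoverlapping clusters, each
    represented by a centroid: assign every point to its nearest
    centroid, then move each centroid to its cluster's mean."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:                      # assignment step
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]  # update step
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = k_means_1d([1, 2, 3, 10, 11, 12], centroids=[0, 5])
```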

27
Q

Hierarchical clustering

A

Unsupervised learning

An algorithm that builds a hierarchy of clusters without any predefined number of clusters.

28
Q

Hierarchical clustering

Agglomerative vs. divisive

A

A method of building a hierarchy of clusters without any predefined number of clusters.

Agglomerative - Bottom-up. Starts with each observation as its own cluster and merges similar observations and clusters into larger groups.

Divisive - Top-down. Starts with one giant cluster and then partitions that cluster into smaller and smaller clusters.
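The agglomerative (bottom-up) direction can be sketched on sorted 1-D points; single linkage on adjacent clusters and the toy data are illustrative assumptions.

```python
def agglomerative_1d(points, n_merges):
    """Bottom-up clustering: start with each observation in its own
    cluster, then repeatedly merge the two closest clusters
    (single linkage on sorted 1-D points)."""
    clusters = [[p] for p in sorted(points)]
    for _ in range(n_merges):
        # find the adjacent pair of clusters with the smallest gap
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters

agglomerative_1d([1, 2, 10, 11, 12], n_merges=3)  # -> [[1, 2], [10, 11, 12]]
```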