Machine Learning Flashcards

1
Q

Supervised learning

A

The ML program is given labeled training data to guide it toward superior forecasting accuracy on new data.

2
Q

Unsupervised learning

A

the ML program is not given labeled training data

3
Q

Deep learning

A

used for complex tasks such as image recognition, natural language processing, and so on

4
Q

Reinforcement learning

A

Programs that learn from their own prediction errors

5
Q

neural networks

A

a group of ML algorithms applied to problems with significant nonlinearities

6
Q

Supervised Learning Types

A

Regression (Continuous)
Classification (Categorical)
Neural Networks
Deep Learning
Reinforcement Learning

7
Q

Unsupervised Learning Types

A

Dimensionality Reduction
Clustering
Neural Networks
Deep Learning
Reinforcement Learning

8
Q

Overfitting

A

when a model fits the training data too closely (including its noise), so that it fails to generalize to new data; a common cause is including a large number of features in the data sample

9
Q

cross validation

A

estimates out-of-sample error rates directly from the validation sample.

10
Q

To measure how well a model generalizes, data analysts create three nonoverlapping data sets

A

(1) training sample
(2) validation sample
(3) test sample

11
Q

Data scientists decompose a model's prediction errors into the following:

A
  • Bias error.
  • Variance error.
  • Base error.
12
Q

Bias error.

A

This is the in-sample error resulting from models with a poor fit.

13
Q

Variance error.

A

This is the out-of-sample error resulting from overfitted models that do not generalize well.

14
Q

Base error.

A

These are residual errors due to random noise.

15
Q

k-fold cross validation

A

the sample is randomly divided into k equal parts. The training sample comprises (k − 1) parts, with the remaining part used for validation. The process is repeated k times so that each part serves once as the validation sample, and the k error estimates are averaged.
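The splitting step can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the data and the value of k are made up:

```python
import random

def k_fold_splits(data, k):
    """Randomly divide the sample into k equal parts; each part serves
    once as the validation set while the other k - 1 parts form the
    training sample. Assumes len(data) is divisible by k."""
    shuffled = data[:]
    random.shuffle(shuffled)
    fold_size = len(shuffled) // k
    for i in range(k):
        validation = shuffled[i * fold_size:(i + 1) * fold_size]
        training = shuffled[:i * fold_size] + shuffled[(i + 1) * fold_size:]
        yield training, validation

# Example: 10 observations, k = 5 -> five (training, validation) pairs
folds = list(k_fold_splits(list(range(10)), k=5))
```

In practice the model would be fit on each training part and its error measured on the matching validation part, then the k errors averaged.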

16
Q

common supervised ML algorithms

A

Penalized regressions.

Support vector machine (SVM).

K-nearest neighbor (KNN).

Classification and regression trees (CART).

Ensemble and Random Forest.

17
Q

Penalized regressions

A

reduce the problem of overfitting by imposing a penalty based on the number of features used by the model.

18
Q

Least absolute shrinkage and selection operator (LASSO).

A

popular penalized regression model. LASSO minimizes the SSE plus a penalty proportional to the sum of the absolute values of the slope coefficients.
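In symbols, with b_j the slope coefficients and λ a penalty hyperparameter set by the researcher, the LASSO objective is:

```latex
\min_{b}\; \underbrace{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}_{\text{SSE}}
\;+\; \lambda \sum_{j=1}^{k}\left|b_j\right|
```

Because the penalty grows with the absolute size of the coefficients, LASSO drives the coefficients of uninformative features to exactly zero, which is how it performs feature selection.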

19
Q

Support vector machine (SVM)

A

linear classification algorithm that separates the data into one of two possible classes (e.g., sell vs. buy).

20
Q

K-nearest neighbor (KNN)

A

classifies an observation based on its nearness to observations in the training sample, typically by majority vote of its k nearest neighbors
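The nearness idea can be sketched in Python. The training pairs and query point below are hypothetical (e.g., two made-up scores per stock):

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among the k training
    observations nearest to it (Euclidean distance).
    `train` is a list of (features, label) pairs."""
    by_distance = sorted(train, key=lambda obs: math.dist(obs[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical data: classify a stock as "buy" or "sell"
train = [((1.0, 1.0), "buy"), ((1.2, 0.9), "buy"),
         ((4.0, 4.2), "sell"), ((3.8, 4.0), "sell")]
label = knn_classify(train, (1.1, 1.0), k=3)  # nearest neighbors are mostly "buy"
```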

21
Q

Classification and regression trees (CART)

A

Classification trees assign observations to one of two possible categories at each node; regression trees predict a continuous target variable instead.

22
Q

Ensemble and Random Forest.

A

Ensemble learning is the technique of combining predictions from multiple models rather than a single model.

Random forest is a variant of classification trees whereby a large number of classification trees are trained using data bagged from the same data set.

23
Q

common unsupervised ML algorithms

A

Principal component analysis (PCA).

Clustering.

24
Q

Principal component analysis (PCA)

A

summarizes the information in a large number of correlated factors into a much smaller set of uncorrelated factors.

25
Q

Clustering.

A

clustering is the process of grouping observations into categories based on similarities in their attributes (called cohesion)

26
Q

K-means clustering

A

partitions observations into k nonoverlapping clusters, where k is a hyperparameter (i.e., set by the researcher).
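A minimal pure-Python sketch of the two alternating steps (assign each point to its nearest centroid, then recompute centroids as cluster means); the points and k below are made up:

```python
import math
import random

def k_means(points, k, iters=100, seed=0):
    """Partition `points` into k nonoverlapping clusters by alternating
    between an assignment step and a centroid-update step."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: centroid = coordinate-wise mean of assigned points
        new_centroids = [
            tuple(sum(c) / len(c) for c in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return clusters

# Two well-separated groups of 2-D points; k = 2 recovers them
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
clusters = k_means(points, k=2)
```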

27
Q

Hierarchical clustering

A

builds a hierarchy of clusters without any predefined number of clusters

28
Q

agglomerative (or bottom-up) clustering

A

starts with one observation as its own cluster and adds other similar observations to that group, or forms another nonoverlapping cluster.

29
Q

divisive (or top-down) clustering

A

starts with one giant cluster, and then it partitions that cluster into smaller and smaller clusters.

30
Q

Neural Networks

A

constructed as nodes connected by links, with nodes arranged in an input layer, one or more hidden layers, and an output layer
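A single forward pass through such a node-and-link structure can be sketched as follows; the weights are arbitrary illustrative numbers, and the sigmoid activation is one common choice:

```python
import math

def forward(x, w_hidden, w_output):
    """One forward pass through a tiny network: input nodes feed a
    hidden layer (weighted links, sigmoid activation), whose outputs
    feed a single output node."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    hidden = [sigmoid(sum(w * xi for w, xi in zip(weights, x)))
              for weights in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_output, hidden)))

# Hypothetical weights: 2 inputs -> 2 hidden nodes -> 1 output
y = forward(x=[1.0, 0.5],
            w_hidden=[[0.4, -0.2], [0.3, 0.8]],
            w_output=[0.6, -0.5])
```

Training consists of adjusting the link weights to reduce prediction error; the sigmoid keeps each node's output between 0 and 1.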

31
Q

Deep learning networks (DLNs)

A

neural networks with many hidden layers

32
Q

Reinforcement learning (RL)

A

has an agent that seeks to maximize a defined reward given defined constraints
