Which machine learning algorithm should I use? Flashcards

1
Q

Dimension reduction

A

Reducing the number of variables under consideration. In many applications, the raw data have very high-dimensional features, and some features are redundant or irrelevant to the task. Reducing the dimensionality helps to reveal the true, latent relationship.

2
Q

Supervised learning

A

Supervised learning algorithms make predictions based on a set of examples.

  • Classification
  • Regression
  • Forecasting
3
Q

PCA

A

An unsupervised dimensionality reduction method that maps the original data space into a lower-dimensional space while preserving as much information as possible. PCA finds the subspace that preserves the most data variance; that subspace is defined by the dominant eigenvectors of the data’s covariance matrix.
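
A minimal sketch of the idea for 2-D data, assuming power iteration as the way to find the dominant eigenvector (the card does not name a method). The function name and the sample points are made up for illustration.

```python
import math

def first_principal_component(points, iterations=100):
    """Return (unit direction, centered points) for 2-D data."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Power iteration: repeatedly multiply by the covariance matrix
    # and renormalize; the vector converges to the dominant eigenvector.
    v = (1.0, 0.0)
    for _ in range(iterations):
        wx = sxx * v[0] + sxy * v[1]
        wy = sxy * v[0] + syy * v[1]
        norm = math.hypot(wx, wy)
        v = (wx / norm, wy / norm)
    return v, centered

# Illustrative points lying close to the line y = 2x.
points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0), (5.0, 10.1)]
direction, centered = first_principal_component(points)
# Project each 2-D point onto the direction: 2 dimensions become 1.
projected = [x * direction[0] + y * direction[1] for x, y in centered]
```

Here the recovered direction has a slope close to 2, so the single projected coordinate keeps almost all of the variance in the data.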

4
Q

CheatSheet

A
5
Q

Linear SVM and kernel SVM

A

When the classes are not linearly separable, the kernel trick can be used to map the data into a higher-dimensional space in which they become linearly separable.

When most features (independent variables) are numeric, logistic regression and SVM should be the first try for classification.
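
A small sketch of why the kernel trick works, assuming a homogeneous degree-2 polynomial kernel and XOR-like toy data (both chosen for illustration, not taken from the card). The kernel value equals a dot product in a 3-D feature space, so that space never has to be built explicitly.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit feature map matching the kernel (x . y) ** 2 in 2-D."""
    x1, x2 = x
    return (x1 * x1, 2 ** 0.5 * x1 * x2, x2 * x2)

def poly_kernel(x, y):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return dot(x, y) ** 2

# XOR-like data: no straight line in 2-D separates the two classes.
data = [((1, 1), +1), ((-1, -1), +1), ((1, -1), -1), ((-1, 1), -1)]

# In the mapped space, the sign of the middle coordinate
# (sqrt(2) * x1 * x2) separates the classes with a flat boundary.
separable_after_mapping = all(
    (phi(x)[1] > 0) == (label > 0) for x, label in data
)

# The kernel gives the same number as the dot product of mapped points.
kernel_value = poly_kernel((1, 2), (3, 4))
mapped_value = dot(phi((1, 2)), phi((3, 4)))
```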

6
Q

Unsupervised: Clustering

A
7
Q

Factors to consider in ML algorithm

A
  • The size, quality, and nature of data.
  • The available computational time.
  • The urgency of the task.
  • What you want to do with the data.
8
Q

Supervised: Classification

A
9
Q

SVD

A
  • SVD is widely used as a topic modeling tool, known as latent semantic analysis, in natural language processing (NLP).
  • The SVD of a user-versus-movie matrix can extract user profiles and movie profiles, which can be used in a recommendation system.
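
A rough sketch of the user-versus-movie case, assuming a tiny made-up ratings matrix and alternating power iteration to recover only the first singular triple. A full SVD would return further factors as well; the function name and data are hypothetical.

```python
import math

def leading_singular_triple(matrix, iterations=200):
    """Return (u, sigma, v) for the largest singular value of matrix."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    sigma = 0.0
    for _ in range(iterations):
        # u is proportional to A v (one weight per user).
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        u_norm = math.sqrt(sum(x * x for x in u))
        u = [x / u_norm for x in u]
        # v is proportional to A^T u (one weight per movie).
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return u, sigma, v

# Rows are users, columns are movies (illustrative ratings).
ratings = [
    [5, 5, 1, 1],
    [4, 5, 1, 2],
    [1, 1, 5, 4],
    [2, 1, 4, 5],
]
user_profile, strength, movie_profile = leading_singular_triple(ratings)
# Rank-1 reconstruction: strength * user_profile[i] * movie_profile[j].
approx = [[strength * ui * vj for vj in movie_profile] for ui in user_profile]
```

The first factor mostly captures the overall rating level; separating the two taste groups in this toy matrix would need the second factor as well.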
10
Q

Classification

A

When the data are being used to predict a categorical variable

11
Q

DBSCAN

A

When the number of clusters k is not given, DBSCAN (density-based spatial clustering) can be used by connecting samples through density diffusion.
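
A compact sketch of DBSCAN, assuming the usual two parameters, a neighborhood radius `eps` and a minimum neighbor count `min_pts` (neither is named on the card). Note that no k is passed in; the number of clusters falls out of the density structure.

```python
def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [
            j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2
        ]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep spreading
    return labels

# Two dense groups and one isolated point (illustrative data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (4, 20)]
labels = dbscan(points, eps=1.5, min_pts=3)
```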

12
Q

Regression

A

When predicting continuous values

13
Q

Hierarchical result

A

Use hierarchical clustering.

14
Q

Semi-supervised learning

A

Use unlabeled examples with a small amount of labeled data to improve the learning accuracy.

15
Q

When trying to solve a new ML problem, what are the three steps?

A

  1. Define the problem. What problems do you want to solve?
  2. Start simple. Be familiar with the data and the baseline results.
  3. Then try something more complicated.
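
One way to read step 2 is to measure a trivial baseline before fitting any model. The sketch below assumes a classification task and a majority-class baseline; the labels are invented for illustration.

```python
from collections import Counter

def majority_baseline(train_labels, test_labels):
    """Predict the most frequent training label for every test example."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for y in test_labels if y == most_common_label)
    return most_common_label, correct / len(test_labels)

train_labels = ["spam", "ham", "ham", "ham", "spam", "ham"]
test_labels = ["ham", "spam", "ham", "ham"]
prediction, accuracy = majority_baseline(train_labels, test_labels)
```

Any more complicated model tried in step 3 should beat this accuracy to justify its extra cost.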
16
Q

Why we need PCA, SVD and LDA

A

We generally do not want to feed a large number of features directly into a machine learning algorithm, since some features may be irrelevant or the “intrinsic” dimensionality may be smaller than the number of features.

17
Q

Supervised: Regression

A
18
Q

Neural networks and deep learning

A
  • A neural network consists of three parts: an input layer, hidden layers, and an output layer.
  • The number of hidden layers defines the model complexity and modeling capacity.
  • If the output layer is a categorical variable, the neural network can address classification problems.
  • If the output layer is a continuous variable, the network can be used for regression.
  • If the output layer is the same as the input layer, the network can be used to extract intrinsic features.
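
A sketch of how the output layer decides the task, assuming one hidden layer with ReLU activations and hand-picked weights (all hypothetical; a real network would learn them). The same hidden features feed either a softmax output for classification or a single linear unit for regression.

```python
import math

def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

x = [1.0, 2.0]  # input layer
hidden = relu(dense(x, [[0.5, -0.25], [-1.0, 1.0]], [0.0, 0.5]))

# Categorical output: three class probabilities that sum to 1.
class_probs = softmax(
    dense(hidden, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0])
)

# Continuous output: a single unbounded number.
regression_value = dense(hidden, [[2.0, 3.0]], [1.0])[0]
```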
19
Q

What are [1], [2], [3], and [4]?

A

[1] Unsupervised: Dimensionality Reduction

[2] Unsupervised: Clustering

[3] Supervised: Regression

[4] Supervised: Classification

20
Q

Considerations when choosing an algorithm

A
  • Accuracy (Phase III)
  • Training time (Phase II)
  • Ease of use (Phase I)
21
Q

What are PCA, SVD and LDA

A

Principal component analysis (PCA)

Singular value decomposition (SVD)

Latent Dirichlet allocation (LDA)

22
Q

Hierarchical clustering

A

Hierarchical partitions can be visualized using a tree structure (a dendrogram). Hierarchical clustering does not need the number of clusters as an input, and the partitions can be viewed at different levels of granularity (i.e., clusters can be refined or coarsened) by using different values of K.
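
A minimal sketch, assuming single-linkage agglomerative clustering on 1-D values (the card does not fix a linkage). The merge order is the same whatever K is; K only chooses how far up the tree to stop, which is what viewing the partition at different granularities means here.

```python
def agglomerative(values, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[v] for v in sorted(values)]
    while len(clusters) > k:
        # On sorted 1-D data the closest pair under single linkage is
        # always adjacent, so only gaps between neighbors are compared.
        gaps = [
            clusters[i + 1][0] - clusters[i][-1]
            for i in range(len(clusters) - 1)
        ]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters

values = [1.0, 1.2, 1.9, 8.0, 8.3, 15.0]
fine = agglomerative(values, 3)    # a lower cut of the dendrogram
coarse = agglomerative(values, 2)  # a higher cut of the same dendrogram
```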

23
Q

Perform dimension reduction

A

Principal component analysis

24
Q

k-means, k-modes, and GMM (Gaussian mixture model) clustering

A
  • Clustering aims to partition n observations into k clusters.
  • K-means defines a hard assignment: each sample is associated with one and only one cluster.
  • GMM defines a soft assignment: each sample has a probability of being associated with each cluster.
  • Both algorithms are simple and fast enough for clustering when the number of clusters k is given.
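
A sketch contrasting the two kinds of assignment on 1-D data. The soft assignment below is a simplification: it assumes equal-weight Gaussians with a shared, fixed `sigma` rather than a GMM fitted with EM, and all names and values are illustrative.

```python
import math

def hard_assignment(v, centers):
    """k-means style: index of the single nearest center."""
    return min(range(len(centers)), key=lambda c: abs(v - centers[c]))

def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on 1-D values; k is the number of starting centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            groups[hard_assignment(v, centers)].append(v)
        # Move each center to the mean of its group (keep it if empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def soft_assignment(v, centers, sigma):
    """GMM style: probability of belonging to each component."""
    weights = [math.exp(-((v - c) ** 2) / (2 * sigma ** 2)) for c in centers]
    total = sum(weights)
    return [w / total for w in weights]

values = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centers = kmeans_1d(values, [0.0, 10.0])
hard = hard_assignment(4.0, centers)             # exactly one cluster index
soft = soft_assignment(4.0, centers, sigma=2.0)  # one probability per cluster
```

For the in-between value 4.0, the hard assignment commits to the first cluster, while the soft assignment still leaves some probability on the second.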
25
Q

Linear regression

and

Logistic regression

A
26
Q

LDA, GMM, and NLP

A
  • LDA is a probabilistic topic model that decomposes documents into topics in a similar way to how a Gaussian mixture model (GMM) decomposes continuous data into Gaussian densities.
    • Unlike the GMM, LDA models discrete data (words in documents) and constrains the topics to be distributed a priori according to a Dirichlet distribution.
27
Q

Forecasting

A

Making predictions about the future based on the past and present data.

28
Q

Reinforcement learning

A

Reinforcement learning analyzes and optimizes the behavior of an agent based on the feedback from the environment. Machines try different scenarios to discover which actions yield the greatest reward, rather than being told which actions to take. Trial-and-error and delayed reward distinguish reinforcement learning from other techniques.
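
A sketch of the trial-and-error part only, assuming a multi-armed bandit with an epsilon-greedy agent (a setting chosen for brevity; it has no delayed reward). Rewards are deterministic here so the result is reproducible, and every name and number is hypothetical.

```python
import random

def epsilon_greedy(true_rewards, steps=200, epsilon=0.1, seed=0):
    """Learn action values from feedback; return (estimates, counts)."""
    rng = random.Random(seed)
    counts = [0] * len(true_rewards)
    estimates = [0.0] * len(true_rewards)
    for step in range(steps):
        if step < len(true_rewards):
            arm = step  # try every action once first
        elif rng.random() < epsilon:
            arm = rng.randrange(len(true_rewards))  # explore
        else:
            # Exploit: pick the action with the best estimate so far.
            arm = max(range(len(estimates)), key=estimates.__getitem__)
        reward = true_rewards[arm]  # feedback from the environment
        counts[arm] += 1
        # Incremental running mean of the rewards seen for this action.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = epsilon_greedy([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=estimates.__getitem__)
```

The agent is never told that the third action is best; it finds out by trying the actions and ends up choosing that one most often.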

29
Q

Unsupervised learning

A

The machine is presented with totally unlabeled data. It is asked to discover the intrinsic patterns that underlie the data, such as:

  • a clustering structure,
  • a low-dimensional manifold,
  • a sparse tree and graph.
30
Q

Clustering

A

Grouping a set of data examples so that examples in one group (or cluster) are more similar to each other (according to some criterion) than to those in other groups.

31
Q

Unsupervised: Dimensionality reduction

A
32
Q

Trees and ensemble trees

A
  • Subdivide the feature space into regions with mostly the same label.
  • Random forest and gradient boosting are two popular ways to use tree algorithms to achieve good accuracy while mitigating the overfitting problem.
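
A sketch of the basic building block, assuming a single split on one numeric feature (a decision stump) scored by misclassification count. Real tree learners usually use other criteria such as Gini impurity, and ensembles combine many such trees; the data are made up.

```python
def best_split(samples):
    """samples: list of (feature_value, label). Return (threshold, errors)."""
    values = sorted({x for x, _ in samples})
    best = None
    # Candidate thresholds are midpoints between neighboring feature values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        errors = 0
        for side in (True, False):
            labels = [y for x, y in samples if (x <= threshold) == side]
            # Each region predicts its majority label.
            majority = max(set(labels), key=labels.count)
            errors += sum(1 for y in labels if y != majority)
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best

samples = [(1.0, "a"), (2.0, "a"), (3.0, "a"),
           (6.0, "b"), (7.0, "b"), (8.0, "a")]
threshold, errors = best_split(samples)
```

The split at 4.5 leaves a pure region on the left and a mostly "b" region on the right, which is the sense of "regions with mostly the same label".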
33
Q

Numeric prediction quickly

A

  • Decision trees
  • Linear regression