Which machine learning algorithm should I use? Flashcards
Dimension reduction
Reducing the number of variables under consideration. In many applications, the raw data have very high-dimensional features, and some features are redundant or irrelevant to the task. Reducing the dimensionality helps to reveal the true, latent relationships.
Supervised learning
Supervised learning algorithms make predictions based on a set of labeled examples.
- Classification
- Regression
- Forecasting
PCA
An unsupervised dimension reduction method that maps the original data space into a lower-dimensional space while preserving as much information as possible. PCA finds the subspace that preserves the most data variance; this subspace is spanned by the dominant eigenvectors of the data's covariance matrix.
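A minimal NumPy sketch of the PCA recipe described on this card: center the data, compute the covariance matrix, and project onto its dominant eigenvectors. The function name `pca` and the toy data are hypothetical; production code would normally use a library implementation.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components."""
    X_centered = X - X.mean(axis=0)
    # Covariance of the features (columns), shape (d, d)
    cov = np.cov(X_centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest eigenvalues
    components = eigvecs[:, order]          # dominant eigenvectors, shape (d, k)
    explained = eigvals[order] / eigvals.sum()
    return X_centered @ components, components, explained

rng = np.random.default_rng(0)
# Hypothetical 3-D data that varies mostly along the direction (1, 2, 0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, 2 * t, 0.1 * rng.normal(size=(200, 1))])
Z, W, ratio = pca(X, k=1)
print(Z.shape, ratio)
```

Here one component keeps almost all of the variance, so the 3-D data can be summarized by a single coordinate.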
Linear SVM and kernel SVM
When the classes are not linearly separable, the kernel trick can be used to implicitly map the data into a higher-dimensional space where the classes become linearly separable.
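A small sketch of why the kernel trick works, assuming the degree-2 polynomial kernel (one common choice; the card does not name a kernel). It checks that the kernel value equals an inner product in an explicit 3-D feature space, and that XOR-style points, which no straight line separates in 2-D, become linearly separable there. The names `phi`, `poly_kernel`, and `dot` are illustrative, and no SVM is trained here.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for 2-D input (for illustration only)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# The kernel equals the feature-space inner product without building phi
x, z = (1.0, 2.0), (3.0, -1.0)
print(poly_kernel(x, z), dot(phi(x), phi(z)))

# XOR-style data: not linearly separable in the original 2-D space
points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [1, 1, -1, -1]
# In feature space the hyperplane w . phi(x) = 0 with w = (0, 1, 0) separates them
w = (0.0, 1.0, 0.0)
predictions = [1 if dot(w, phi(p)) > 0 else -1 for p in points]
print(predictions == labels)
```

A kernel SVM relies on the same identity: it only needs kernel values, so the high-dimensional map never has to be computed.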
When most independent variables (features) are numeric, logistic regression and SVM should be the first things to try for classification.
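A minimal pure-Python sketch of logistic regression trained by batch gradient descent, as an example of the simple first classifier this card recommends. The learning rate, epoch count, function names, and data are hypothetical choices for this toy example.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Predict class 1 when the estimated probability is at least 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

# Hypothetical numeric features for two classes
X = [[-2.0, -1.0], [-1.5, -2.0], [-1.0, -1.5], [1.0, 1.5], [1.5, 2.0], [2.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(X, y)
print([predict(w, b, x) for x in X])
```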
Unsupervised: Clustering
Factors to consider when choosing an ML algorithm
- The size, quality, and nature of data.
- The available computational time.
- The urgency of the task.
- What you want to do with the data.
Supervised: Classification
SVD
- SVD is also widely used as a topic modeling tool, known as latent semantic analysis, in natural language processing (NLP).
- SVD of a user-versus-movie matrix can extract user profiles and movie profiles, which can be used in a recommendation system.
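A NumPy sketch of the recommendation use case above, assuming a tiny hypothetical rating matrix (rows are users, columns are movies, 0 means unrated). Truncated SVD keeps `k` latent factors; scaling the left singular vectors by the singular values to form user profiles is one common convention, not the only one.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are movies
R = np.array([
    [5, 5, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2                                # keep the two strongest latent factors
user_profiles = U[:, :k] * s[:k]     # each row: one user in latent space
movie_profiles = Vt[:k, :].T         # each row: one movie in latent space

# The rank-k reconstruction scores every (user, movie) pair,
# including pairs the user never rated
R_hat = user_profiles @ movie_profiles.T
print(np.round(R_hat, 1))
```

Users with similar tastes (the first two rows) end up with similar profile vectors, which is what a recommender can exploit.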
Classification
When the data are being used to predict a categorical variable
DBSCAN
When the number of clusters k is not given, DBSCAN (density-based spatial clustering of applications with noise) can be used; it grows clusters by connecting samples that lie in dense neighborhoods and marks isolated samples as noise.
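A minimal pure-Python DBSCAN sketch. `eps` (neighborhood radius) and `min_samples` (neighbors needed, counting the point itself, for a point to be a core point) are the usual DBSCAN parameters; the values and the data below are hypothetical. Note that the number of clusters is never supplied.

```python
import math

def dbscan(points, eps, min_samples):
    """Return a cluster id per point, or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1          # provisional noise; may later become a border point
            continue
        cluster += 1                # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:  # j is also a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),   # dense group A
          (10, 10), (10, 11), (11, 10),     # dense group B
          (5, 20)]                          # isolated point
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)
```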
Regression
When predicting continuous values
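A minimal example of regression: ordinary least squares for a straight line, in pure Python. The data are hypothetical; the output of the fitted model is a continuous value rather than a category.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: y is roughly 2x + 1 with a little noise
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, slope * 5.0 + intercept)  # continuous prediction at x = 5
```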
Hierarchical result
Use hierarchical clustering
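A pure-Python sketch of agglomerative hierarchical clustering. It assumes single linkage (cluster distance = distance between the closest members), which is only one of several linkage choices; the card does not specify one. The merge history is the hierarchical result: it can be drawn as a dendrogram or cut at a distance threshold to get flat clusters. The data are hypothetical.

```python
import math

def single_linkage(points):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples.

    Original points have ids 0..n-1; each merge creates a new id n, n+1, ...
    """
    clusters = {i: [i] for i in range(len(points))}
    next_id = len(points)
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (20.0, 0.0)]
merges = single_linkage(points)
for step in merges:
    print(step)
```

This brute-force version is quadratic per merge and is meant only to show the idea; libraries use more efficient algorithms.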
Semi-supervised learning
Use unlabeled examples with a small amount of labeled data to improve the learning accuracy.
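One simple semi-supervised technique is self-training; the sketch below assumes a 1-nearest-neighbour base classifier on 1-D data (a choice made for illustration, not the only option). It repeatedly pseudo-labels the unlabeled point closest to any labeled point, then treats it as labeled. The function name and data are hypothetical.

```python
def self_train_1nn(labeled, unlabeled):
    """Self-training with 1-NN: labeled is a list of (x, label), unlabeled a list of x."""
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        # Most confident prediction = smallest distance to any labeled example
        best = None  # (distance, unlabeled x, label of its nearest labeled point)
        for u in pool:
            for x, y in labeled:
                d = abs(u - x)
                if best is None or d < best[0]:
                    best = (d, u, y)
        _, u, y = best
        labeled.append((u, y))   # the pseudo-label is now used like a real label
        pool.remove(u)
    return labeled

labeled = [(0.0, "a"), (10.0, "b")]
unlabeled = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 9.5]
result = dict(self_train_1nn(labeled, unlabeled))
# Plain 1-NN on the two labeled points alone would call 7.0 "b" (closer to 10.0);
# self-training follows the chain of unlabeled points and calls it "a".
print(result[7.0], result[9.5])
```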
When trying to solve a new ML problem, what are the three steps?
- Define the problem. What problems do you want to solve?
- Start simple. Become familiar with the data and the baseline results.
- Then try something more complicated.
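For the "start simple" step, one common baseline for classification is to always predict the most frequent training label; a more complicated model is only worth keeping if it beats this number. A minimal sketch with hypothetical labels and a hypothetical helper name:

```python
from collections import Counter

def majority_baseline(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in test_labels if y == most_common)
    return most_common, correct / len(test_labels)

# Hypothetical labels
train = ["spam", "ham", "ham", "ham", "spam", "ham"]
test = ["ham", "ham", "spam", "ham"]
label, acc = majority_baseline(train, test)
print(label, acc)
```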