Machine Learning Flashcards
Supervised learning
the ML program is trained on labeled data (inputs paired with known outcomes) to improve its forecasting accuracy
Unsupervised learning
the ML program is not given labeled training data
Deep learning
used for complex tasks such as image recognition, natural language processing, and so on
Reinforcement learning
programs that learn through trial and error, adjusting their actions based on the rewards those actions generate
neural networks
a group of ML algorithms applied to problems with significant nonlinearities
Supervised Learning Types
Regression (Continuous)
Classification (Categorical)
Neural Networks
Deep Learning
Reinforcement Learning
Unsupervised Learning Types
Dimensionality Reduction
Clustering
Neural Networks
Deep Learning
Reinforcement Learning
Overfitting
when a model fits the training data too closely (including its noise), often because a large number of features are included; the model then performs poorly on new, out-of-sample data
cross validation
estimates out-of-sample error rates directly from the validation sample.
To measure how well a model generalizes, data analysts create three nonoverlapping data sets
(1) training sample
(2) validation sample
(3) test sample
Data scientists decompose a model's prediction errors into the following:
- Bias error.
- Variance error.
- Base error.
Bias error.
This is the in-sample error resulting from models with a poor fit.
Variance error.
This is the out-of-sample error resulting from overfitted models that do not generalize well.
Base error.
These are residual errors due to random noise.
k-fold cross validation
the sample is randomly divided equally into k parts. The training sample comprises (k − 1) parts, with the remaining part used for validation. The process is repeated k times so that each part serves once as the validation sample, and the k error estimates are then averaged.
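The partitioning step can be sketched in pure Python (a minimal illustration, not from the curriculum; function names are my own):

```python
import random

def kfold_indices(n, k, seed=0):
    """Randomly partition indices 0..n-1 into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # fold i takes every k-th shuffled index, so fold sizes differ by at most 1
    return [idx[i::k] for i in range(k)]

def kfold_splits(n, k, seed=0):
    """Yield (train_indices, validation_indices) pairs, one per fold."""
    folds = kfold_indices(n, k, seed)
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val
```

Each of the k passes would train on `train` and measure error on `val`; averaging the k errors gives the cross-validation estimate.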
common supervised ML algorithms
Penalized regressions.
Support vector machine (SVM).
K-nearest neighbor (KNN).
Classification and regression trees (CART).
Ensemble and Random Forest.
Penalized regressions
reduce the problem of overfitting by imposing a penalty on the size of the model's coefficients, which effectively limits the number of features the model retains.
Least absolute shrinkage and selection operator (LASSO).
a popular penalized regression model. LASSO minimizes the SSE plus a penalty (λ) times the sum of the absolute values of the slope coefficients, which forces the coefficients of weak features toward zero.
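The LASSO objective can be written out directly (a sketch of the quantity being minimized, not a fitting routine; names are illustrative):

```python
def lasso_objective(y, X, intercept, betas, lam):
    """SSE plus L1 penalty: sum((y - yhat)^2) + lam * sum(|beta|).
    X is a list of observation rows; the intercept is not penalized."""
    sse = 0.0
    for yi, row in zip(y, X):
        yhat = intercept + sum(b * x for b, x in zip(betas, row))
        sse += (yi - yhat) ** 2
    return sse + lam * sum(abs(b) for b in betas)
```

With λ = 0 this reduces to ordinary least squares; larger λ values shrink the slopes.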
Support vector machine (SVM)
a linear classification algorithm that finds the boundary (hyperplane) separating the data into one of two possible classes (e.g., sell vs. buy) with the widest margin.
K-nearest neighbor (KNN)
used to classify an observation based on its nearness (similarity) to the observations in the training sample
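A minimal pure-Python sketch of the idea (illustrative only): find the k closest training points by Euclidean distance and take a majority vote of their labels.

```python
import math
from collections import Counter

def knn_classify(x, train_X, train_y, k=3):
    """Label x by majority vote among its k nearest training observations."""
    dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

The choice of k is a hyperparameter: small k follows the data closely, large k smooths the decision.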
Classification and regression trees (CART)
Classification trees assign observations to one of two possible categories at each node; regression trees are used when the outcome is continuous.
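A single tree node can be sketched as a threshold test on one feature (an illustrative one-node "stump", not the full tree-growing algorithm):

```python
def stump_classify(x, feature, threshold, left_label, right_label):
    """One node of a classification tree: route the observation down the
    left or right branch depending on a single feature's threshold."""
    return left_label if x[feature] <= threshold else right_label
```

A full tree chains such nodes, choosing each feature and threshold to best separate the classes.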
Ensemble and Random Forest.
Ensemble learning is the technique of combining predictions from multiple models rather than a single model.
Random forest is a variant of classification trees in which a large number of trees are trained on data bagged (bootstrap sampled, i.e., drawn with replacement) from the same data set.
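The two ingredients, bagging and combining predictions by vote, can be sketched as follows (illustrative helper names; any callables can stand in for the trained models):

```python
import random
from collections import Counter

def bootstrap_sample(data, seed=0):
    """Draw a bagged sample: same size as data, sampled with replacement."""
    rng = random.Random(seed)
    return [rng.choice(data) for _ in data]

def ensemble_predict(models, x):
    """Combine predictions from several models by majority vote."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]
```

A random forest would train one tree per bagged sample and feed the trees into the voting step.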
common unsupervised ML algorithms
Principal component analysis (PCA).
Clustering.
Principal component analysis (PCA)
summarizes the information in a large number of correlated factors into a much smaller set of uncorrelated factors.
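For two-dimensional data, the first uncorrelated factor (principal component) can be estimated with power iteration on the sample covariance matrix; a minimal sketch under that simplifying 2-D assumption:

```python
def first_principal_component(data, iters=100):
    """Estimate the first principal component of 2-D data by power
    iteration on the sample covariance matrix."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    centered = [(x - mx, y - my) for x, y in data]
    # entries of the 2x2 sample covariance matrix
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    v = (1.0, 1.0)  # arbitrary starting direction
    for _ in range(iters):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = (w[0] / norm, w[1] / norm)
    return v
```

Projecting the data onto this direction gives the single factor that captures the most variance.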
Clustering.
clustering is the process of grouping observations into categories based on similarities in their attributes (called cohesion)
K-means clustering
partitions observations into k nonoverlapping clusters, where k is a hyperparameter (i.e., set by the researcher).
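The standard iteration (Lloyd's algorithm) alternates between assigning points to the nearest centroid and recomputing centroids; a minimal 1-D sketch with a naive initialization (illustrative, not from the curriculum):

```python
def kmeans_1d(points, k, iters=20):
    """Lloyd's algorithm on 1-D data: assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    centroids = sorted(points)[:k]  # naive init: first k sorted points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters
```

In practice the initialization and the choice of k both matter; here k is the researcher-set hyperparameter from the card above.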
Hierarchical clustering
builds a hierarchy of clusters without any predefined number of clusters
agglomerative (or bottom-up) clustering
starts with each observation as its own cluster, then either adds similar observations to an existing group or forms another nonoverlapping cluster.
divisive (or top-down) clustering
starts with one giant cluster containing all observations, then partitions it into smaller and smaller clusters.
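The agglomerative (bottom-up) variant can be sketched on 1-D data: start from singleton clusters and repeatedly merge the two closest clusters until the desired count remains (single-linkage distance; illustrative only):

```python
def agglomerative_1d(points, n_clusters):
    """Bottom-up clustering: merge the two closest clusters (measured by
    their closest pair of points) until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Stopping at different cluster counts reads off different levels of the hierarchy.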
Neural Networks
constructed as nodes (arranged in input, hidden, and output layers) connected by links
Deep learning networks (DLNs)
neural networks with many hidden layers
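A single forward pass through a tiny one-hidden-layer network shows what each node does: a weighted sum of its inputs plus a bias, passed through an activation function (sigmoid here; a minimal sketch with illustrative names, where each weight list's last entry is the bias):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden_weights, output_weights):
    """Forward pass through one hidden layer and a single output node.
    Each node computes sigmoid(weighted sum of inputs + bias)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + ws[-1])
              for ws in hidden_weights]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden))
                   + output_weights[-1])
```

A deep learning network stacks many such hidden layers; training consists of adjusting the weights to reduce prediction error.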
Reinforcement learning (RL)
has an agent that seeks to maximize a defined reward given defined constraints