Machine Learning Flashcards
Supervised learning
the ML program is given labeled training data (inputs paired with known outcomes) to guide it toward accurate out-of-sample forecasts
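A minimal sketch of the idea, using scikit-learn as an assumed library choice (the flashcard does not name any tooling, and the data values are made up): a model is fit to labeled examples and then used to forecast a new observation.

```python
# Supervised-learning sketch (assumed example): fit on labeled data,
# then predict the outcome for an unseen input.
from sklearn.linear_model import LinearRegression

X_train = [[1.0], [2.0], [3.0], [4.0]]   # features
y_train = [2.1, 3.9, 6.2, 8.1]           # labels guiding the fit

model = LinearRegression()
model.fit(X_train, y_train)              # learn from the labeled examples
print(model.predict([[5.0]]))            # forecast for a new observation
```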
Unsupervised learning
the ML program is not given labeled training data; it must discover structure (such as groupings or patterns) in the data on its own
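A contrasting sketch, again assuming scikit-learn and made-up data: no labels are supplied, and a clustering algorithm infers group membership on its own.

```python
# Unsupervised-learning sketch (assumed example): no labels are given;
# k-means discovers two groupings in the data by itself.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9]])  # unlabeled observations
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(clusters)  # e.g., [0 0 1 1] -- group assignments inferred without labels
```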
Deep learning
algorithms based on neural networks with many hidden layers, used for complex tasks such as image recognition, natural language processing, and so on
Reinforcement learning
programs that learn through trial and error from their own prediction errors, adjusting their behavior in response to rewards and penalties
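A toy sketch of the "learn from prediction errors" idea, using a simple two-armed bandit; the payoff values, learning rate, and exploration rate are all hypothetical choices, not part of the flashcard.

```python
# Toy reinforcement-learning sketch (hypothetical example): the agent nudges its
# value estimates by the prediction error (reward received minus reward expected).
import random

true_payoffs = [0.3, 0.7]        # unknown to the agent
estimates = [0.0, 0.0]           # agent's predicted value of each action
alpha, epsilon = 0.1, 0.1        # learning rate and exploration rate

for _ in range(1000):
    # epsilon-greedy choice: mostly exploit the best estimate, occasionally explore
    if random.random() < epsilon:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: estimates[a])
    reward = 1.0 if random.random() < true_payoffs[action] else 0.0
    prediction_error = reward - estimates[action]
    estimates[action] += alpha * prediction_error   # learn from the error

print(estimates)  # estimates approach the true payoff probabilities
```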
Neural networks
a group of ML algorithms applied to problems with significant nonlinearities
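A brief sketch of the "significant nonlinearities" point, assuming scikit-learn's small multilayer perceptron and its synthetic make_moons data set: the network fits a curved decision boundary that a linear classifier could not.

```python
# Neural-network sketch (assumed example): a small multilayer perceptron
# learns a nonlinear decision boundary on a synthetic two-moons data set.
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)  # nonlinear pattern
net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
net.fit(X, y)
print(net.score(X, y))  # accuracy on the nonlinear problem
```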
Supervised Learning Types
Regression (Continuous)
Classification (Categorical)
Neural Networks
Deep Learning
Reinforcement Learning
Unsupervised Learning Types
Dimensionality Reduction
Clustering
Neural Networks
Deep Learning
Reinforcement Learning
Overfitting
when a model fits the training data too closely (often because a large number of features are included), capturing noise rather than the underlying relationship, so it generalizes poorly to new data
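A small demonstration sketch, with assumed data and polynomial degrees: the high-degree model, which has many features, scores very well in sample but typically worse on the held-out data.

```python
# Overfitting sketch (assumed example): compare a modest polynomial fit with an
# excessive one on noisy data; in-sample vs. out-of-sample R^2 diverge for the latter.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=60)   # signal plus noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (3, 15):  # modest vs. excessive number of polynomial features
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree, model.score(X_tr, y_tr), model.score(X_te, y_te))
```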
Cross-validation
estimates out-of-sample error rates directly from the validation sample.
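A quick sketch using scikit-learn's cross_val_score and its bundled diabetes data set (both assumed choices): each reported score is measured on a held-out validation fold, giving direct out-of-sample estimates.

```python
# Cross-validation sketch (assumed example): each score comes from a fold
# the model never saw during fitting.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)
scores = cross_val_score(Ridge(), X, y, cv=5)  # 5 out-of-sample score estimates
print(scores, scores.mean())
```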
To measure how well a model generalizes, data analysts create three nonoverlapping data sets
(1) training sample
(2) validation sample
(3) test sample
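A splitting sketch with assumed proportions (60% / 20% / 20%) and an assumed data set: one sample is carved into the three nonoverlapping sets.

```python
# Splitting sketch (assumed proportions): train, validation, and test samples
# are nonoverlapping subsets of the original data.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)
print(len(X_train), len(X_val), len(X_test))  # roughly 60% / 20% / 20%
```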
Data scientists then decompose these in-sample and out-of-sample errors into the following:
- Bias error.
- Variance error.
- Base error.
Bias error.
This is the in-sample error resulting from models with a poor fit.
Variance error.
This is the out-of-sample error resulting from overfitted models that do not generalize well.
Base error.
These are residual errors due to random noise.
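For squared-error loss, these three components correspond to the classical bias-variance decomposition; the formula below is a sketch of that standard result, with the irreducible noise term playing the role of base error.

```latex
% Decomposition of expected squared error for a model \hat{f} estimating f,
% with \sigma^2 the variance of the random noise (base error).
\[
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{base error}}
\]
```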
k-fold cross-validation
the sample is randomly divided into k equal parts. The model is trained on (k − 1) parts and validated on the remaining part; this is repeated k times so that each part serves once as the validation sample, and the k error estimates are averaged.
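A sketch of the procedure with assumed choices (k = 5, a ridge model, scikit-learn's diabetes data, mean squared error as the error measure):

```python
# k-fold sketch (assumed example): train on k-1 parts, validate on the held-out
# part, repeat k times, then average the k validation errors.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

X, y = load_diabetes(return_X_y=True)
errors = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])
    residuals = y[val_idx] - model.predict(X[val_idx])
    errors.append(np.mean(residuals ** 2))   # validation MSE for this fold

print(np.mean(errors))  # averaged out-of-sample error estimate
```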