Ensemble Methods Flashcards
What are ensemble methods?
Ensemble Methods
Generate a set of classifiers from the training data
Predict class label of previously unseen cases by aggregating predictions made by multiple classifiers
Majority vote for classification
Averaging for regression
What is bagging?
Bootstrap Aggregation
• Training
– Given a set D of d tuples, generate k training datasets Di using bootstrap sampling
– Compute k models Mi using the training sets Di
• Predicting (classify an unknown sample x)
– Each classifier Mi computes its prediction for x
– The bagged classifier M* returns the class predicted by the majority of the models
– When class values are -1 and +1, the output of the ensemble can be computed as M*(x) = sign( Σi Mi(x) )
Models may be weighted differently based on their estimated performance.
Bagging can be applied to regression by simply averaging the models' outputs.
Bagging works because it reduces variance by voting/averaging.
Usually, the more classifiers, the better.
It can help a lot if the data are noisy; however, in some pathological hypothetical situations the overall error might increase.
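A minimal bagging sketch, assuming scikit-learn is available and X, y are NumPy arrays (the function names and k=25 are illustrative, not part of the original notes):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=25, seed=0):
    """Train k trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)              # sample n rows with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Majority vote; with class values -1/+1 this is the sign of the summed votes."""
    votes = np.stack([m.predict(X) for m in models])  # shape (k, n_samples)
    return np.sign(votes.sum(axis=0))

Keeping k odd avoids ties in the vote; for regression, replacing the sign of the sum with the mean of the predictions gives the averaged bagged regressor mentioned above.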
What are stable and unstable classifiers and how do they relate to the bagging method?
• We say that a classifier is unstable when small changes in the training set can cause large changes to the model
• If the learning algorithm is unstable, then bagging almost always improves performance
• Bagging stable classifiers is not a good idea
• Decision/regression trees, linear regression, neural networks are examples of unstable classifiers.
• K-nearest neighbors (with larger k) and models with strong regularization are examples of stable classifiers
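One way to see the difference is to bag an unstable learner (a deep tree) and a stable one (k-NN with a large k) on the same data; the sketch below uses scikit-learn with an illustrative synthetic dataset, so the exact numbers are not from the notes:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

for name, base in [("decision tree (unstable)", DecisionTreeClassifier()),
                   ("15-NN (stable)", KNeighborsClassifier(n_neighbors=15))]:
    single = cross_val_score(base, X, y, cv=5).mean()
    bagged = cross_val_score(BaggingClassifier(base, n_estimators=50), X, y, cv=5).mean()
    print(f"{name}: single={single:.3f}  bagged={bagged:.3f}")

The expected pattern (not guaranteed on every dataset) is that bagging improves the tree noticeably while leaving k-NN essentially unchanged.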
What are random forests?
Random forests are ensembles of unpruned decision tree learners with randomized selection of features at each split
• Random forests (RF) are a combination of tree predictors
• Each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest
• The generalization error of a forest of tree classifiers depends on
– The strength of the individual trees in the forest (how good are they on average?)
– The correlation between them (how much group-think is there in the predictions?)
• Using a random selection of features to split each node yields error rates that are more robust with respect to noise
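In scikit-learn this corresponds to RandomForestClassifier, where n_estimators sets the number of trees and max_features the size of the random feature subset tried at each split; the dataset and parameter values below are only illustrative:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 200 unpruned trees (no max_depth); at each split only sqrt(n_features)
# randomly chosen features are considered, which decorrelates the trees.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))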
How does boosting work?
• Weights are assigned to each training example
• A series of k classifiers is iteratively learned
• After a classifier Mi is learned, the weights are updated
• The next classifier Mi+1 will focus more on the training tuples that were misclassified by Mi
• The final M* is a weighted sum of all the classifiers’ outputs
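The notes do not give the exact weighting formulas; the sketch below assumes the classical AdaBoost update, with decision stumps as base classifiers and labels in {-1, +1}:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, k=50):
    """y must contain only -1 and +1. Returns the models and their weights."""
    n = len(X)
    w = np.full(n, 1.0 / n)                      # start with uniform example weights
    models, alphas = [], []
    for _ in range(k):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()                 # weighted training error of Mi
        if err >= 0.5:                           # no better than chance: stop early
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
        w *= np.exp(-alpha * y * pred)           # up-weight misclassified tuples
        w /= w.sum()
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    """M* is the sign of the weighted sum of the classifiers' outputs."""
    return np.sign(sum(a * m.predict(X) for a, m in zip(alphas, models)))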
What is gradient tree boosting?
• Build a sequence of tree predictors by repeating a few simple steps
1. Learn a basic predictor
2. Compute the gradient of a loss function with respect to the predictor
3. Compute a model to predict the residual
4. Update the predictor with the new model
5. Goto 2
• The predictor is increasingly accurate and increasingly complex
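For the common case of squared-error loss the negative gradient is simply the residual y - F(x), so the loop above reduces to repeatedly fitting a small tree to the current residuals; the learning rate and tree depth below are illustrative choices:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_rounds=100, lr=0.1):
    """Gradient tree boosting for squared-error loss."""
    f0 = y.mean()                                 # step 1: basic predictor (a constant)
    pred = np.full(len(y), f0, dtype=float)
    trees = []
    for _ in range(n_rounds):
        residual = y - pred                       # step 2: negative gradient of the loss
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)   # step 3
        pred += lr * tree.predict(X)              # step 4: update the predictor
        trees.append(tree)                        # step 5: repeat
    return f0, trees

def gradient_boost_predict(f0, trees, X, lr=0.1):
    return f0 + lr * sum(t.predict(X) for t in trees)

Each added tree makes the predictor more accurate on the training data and more complex, which is why the number of rounds and the learning rate are the main knobs to tune.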
What is the stacked generalization method?
Suppose you have several models that are all skillful on a problem, each in its own way.
Instead of choosing one model to trust, stacking gives you a way to combine them.
Ensemble technique with two layers of classifiers. The first layer is composed of
K base classifiers which are trained independently on the entire training data D.
The base classifiers should be complementary to each other as much as possible
so that they perform well on different subsets of the input space.
The second layer comprises a combiner classifier C that is trained on the
predicted classes from the base classifiers.
The combiner automatically learns how to combine the outputs of the base
classifiers to make the final prediction for a given input.
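scikit-learn's StackingClassifier implements this two-layer scheme: estimators is the first layer of base classifiers and final_estimator is the combiner, trained on their (cross-validated) predictions; the choice of base models below is purely illustrative:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# First layer: complementary base classifiers.
base = [("tree", DecisionTreeClassifier(max_depth=5)),
        ("nb", GaussianNB()),
        ("knn", KNeighborsClassifier())]

# Second layer: the combiner learns how to weight the base classifiers' outputs.
stack = StackingClassifier(estimators=base, final_estimator=LogisticRegression(max_iter=1000))
print(cross_val_score(stack, X, y, cv=5).mean())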