Ensemble Learning Flashcards
How do we ensure that learners perform differently? Name five methods.
1) Randomly divide original dataset into sub-samples (each learner will have a different subset of data to train on).
2) Randomly select a subset of features (each learner will use different features).
3) Use different control parameter values.
4) Use different training algorithms.
5) Use heterogeneous ensembles (combine different types of models).
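A minimal sketch of several of these ideas at once, assuming scikit-learn and numpy (neither is named in the cards): three different algorithms, each with its own control parameters, each trained on a different random subset of the features. The dataset and model choices are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

# Heterogeneous ensemble: different training algorithms with different parameters.
learners = [
    DecisionTreeClassifier(max_depth=3, random_state=0),
    LogisticRegression(max_iter=5000),
    KNeighborsClassifier(n_neighbors=7),
]

fitted = []
for model in learners:
    # Each learner trains on its own random subset of the features.
    cols = rng.choice(X.shape[1], size=10, replace=False)
    fitted.append((model.fit(X[:, cols], y), cols))

# Combine the diverse learners with a simple majority vote.
votes = np.array([m.predict(X[:, cols]) for m, cols in fitted])
print((votes.mean(axis=0) >= 0.5).astype(int)[:10])
```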
When do we use the following to combine the output from each learner to produce one final prediction?
a) majority voting
b) average
c) median
d) weighted majority vote
a) classification problems
b) regression problems with no outliers
c) regression problems with outliers
d) more complicated problems, where learners differ in reliability and the more accurate ones should count more
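A small numpy sketch of the four combination rules; the prediction values and weights below are made up for illustration.

```python
import numpy as np

# Hypothetical outputs from five learners for a single input.
class_preds = np.array([1, 0, 1, 1, 0])              # class labels (classification)
reg_preds = np.array([2.4, 2.6, 2.5, 9.0, 2.3])      # real values (9.0 acts as an outlier)
weights = np.array([0.30, 0.10, 0.25, 0.25, 0.10])   # e.g. each learner's validation accuracy

majority = np.bincount(class_preds).argmax()                   # a) majority vote
mean_pred = reg_preds.mean()                                   # b) average
median_pred = np.median(reg_preds)                             # c) median (robust to the outlier)
weighted = np.bincount(class_preds, weights=weights).argmax()  # d) weighted majority vote
print(majority, mean_pred, median_pred, weighted)
```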
What are the two main approaches to ensemble learning?
1) Bagging (or bootstrap aggregating)
2) Boosting
Describe the concept of bagging.
(Hint: SMART)
Subsamples (learners use different randomly selected subsamples from the dataset).
Multiple learners (several models are trained on the same problem).
Agreement policy (final prediction is based on a predefined policy, like voting or averaging).
Regression and classification (applicable to both types of problems).
Train in parallel (models can be trained independently in parallel).
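A sketch of the SMART ingredients using scikit-learn's BaggingClassifier (assuming a recent scikit-learn version; older releases name the first parameter `base_estimator` instead of `estimator`).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # Multiple learners of this type
    n_estimators=25,
    max_samples=0.8,                     # Subsamples: each learner sees a random 80% of the data
    n_jobs=-1,                           # Train in parallel: learners are independent
    random_state=0,
)
bag.fit(X, y)                            # Agreement policy: outputs are combined by voting
print(bag.score(X, y))
```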
What are the three basic steps of bagging?
1) Create multiple subsets of data from original training data.
2) Build multiple models (classifiers).
3) Combine outputs of classifiers.
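The same three steps written out as a minimal from-scratch sketch; the choice of decision trees, the dataset, and the number of subsets are illustrative, not prescribed by the cards.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

# Steps 1 and 2: create bootstrap subsets (sampled with replacement)
# and build one classifier per subset.
models = []
for _ in range(15):
    idx = rng.choice(len(X), size=len(X), replace=True)
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Step 3: combine the classifiers' outputs by majority vote.
votes = np.array([m.predict(X) for m in models])
print((votes.mean(axis=0) >= 0.5).astype(int)[:10])
```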
What are three approaches to ensure diversity with respect to the predictive behavior of each member in the ensemble?
1) Create subsets smaller than the original by randomly selecting instances from the original dataset.
2) If the original dataset is small, create subsets of the same size and sample with replacement (bootstrap sampling).
3) Use a large number of members in the ensemble (more members means more distinct subsets, and therefore more diversity).
What is a weak learner?
Also give three examples of weak learners.
A weak learner produces a classifier which is only slightly more accurate than random classification.
A stump is the most common weak learner.
- A stump originates from classification trees.
- It is a tree with one decision node and two leaves.
- It does not make great predictions on its own.
- It uses only one feature as the decision node.
Other weak learners are:
- simple neural networks
- other simple, low-capacity predictive models.
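A stump can be written as a depth-1 decision tree; a minimal sketch with scikit-learn (the dataset choice is illustrative).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# One decision node, two leaves, a single feature used for the split.
stump = DecisionTreeClassifier(max_depth=1).fit(X, y)
print(stump.score(X, y))  # a weak learner only needs to beat random guessing
```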
What are five advantages of AdaBoost?
1) results in a non-linear model.
2) has good generalization performance.
3) robust to overfitting.
4) simple to implement.
5) works with numerical and nominal features.
What is a weakness of AdaBoost?
Sensitive to outliers (misclassified outliers receive increased weight for the next learner, so the ensemble may end up chasing noise).
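A minimal AdaBoost sketch over stumps with scikit-learn (assuming a recent version, where the first parameter is `estimator`); each round re-weights the training points, which is exactly why persistent outliers can gain outsized influence.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Boosting over stumps: each round up-weights points the previous stump got wrong.
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # the weak learner (a stump)
    n_estimators=50,
    random_state=0,
)
ada.fit(X, y)
print(ada.score(X, y))
```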
Once the trees in a forest are induced, the final prediction can be determined by: ____
1) majority voting (classification)
2) weighted majority voting (classification, with trees weighted by their reliability)
3) average across trees in forest (regression)
4) median (regression with outliers)
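A sketch of the classification and regression cases with scikit-learn (note that its classification forest averages the trees' class probabilities, which is a soft form of majority voting).

```python
from sklearn.datasets import load_breast_cancer, load_diabetes
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: the final label comes from voting across the trees.
Xc, yc = load_breast_cancer(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xc, yc)

# Regression: the final value is the average of the trees' predictions.
Xr, yr = load_diabetes(return_X_y=True)
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xr, yr)

print(clf.predict(Xc[:3]), reg.predict(Xr[:3]))
```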
What are four advantages of random forests?
1) performs well on most problems, especially if the trees are diverse.
2) can handle missing values and noise.
3) efficient on large number of features.
4) pruning is not needed (no need to worry about overfitting).
What are three weaknesses of random forests?
1) not easy to interpret.
2) rule extraction is not possible.
3) some tuning of parameters might be needed.
Why are decision trees popular for ensembles?
They are sensitive to small changes in the training data (unstable learners), so different subsamples naturally produce diverse trees.
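A small demonstration of that instability (the dataset and seeds are arbitrary): two trees grown on slightly different bootstrap samples typically disagree on some predictions, and that disagreement is the diversity an ensemble exploits.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Grow two trees on slightly different bootstrap samples of the same data.
trees = []
for seed in (1, 2):
    idx = np.random.default_rng(seed).choice(len(X), size=len(X), replace=True)
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Count how often the two trees disagree on the original data.
disagree = int((trees[0].predict(X) != trees[1].predict(X)).sum())
print(f"{disagree} of {len(X)} predictions differ between the two trees")
```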
How do diverse weak learners in an ensemble boost performance?
By compensating for each other’s weaknesses.
True or False: Bagging is simpler and easier to parallelize than boosting.
True.
Bagging is more user-friendly and faster to train, since its learners are independent and can be trained in parallel, whereas boosting trains its learners sequentially.