Ensembles Flashcards
Two characteristics of good ensembles
1. Individual models should be strong
2. The correlation between the models in the ensemble should be weak (diversity)
What is bagging?
- Trains N models in parallel, each on a bootstrap sample drawn from the overall training set
- Aggregates predictions using majority voting
- Bootstrap aggregating = bagging
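A minimal sketch of the idea (not a production implementation), using scikit-learn decision trees as base learners; the dataset, 25 estimators, and majority-vote helper are illustrative choices:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rng = np.random.default_rng(0)

# Train N trees, each on a bootstrap sample
# (sampled with replacement, same size as the original training set).
models = []
for _ in range(25):
    idx = rng.choice(len(X), size=len(X), replace=True)
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Aggregate by majority vote over the individual predictions.
votes = np.array([m.predict(X) for m in models])        # shape (25, n_samples)
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("training accuracy of the bagged ensemble:", (majority == y).mean())
```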
Bayes Optimal Ensemble
- An ensemble of all hypotheses in the hypothesis space
- On average, no other ensemble can outperform it
- Not practical to implement
- Tom Mitchell book (1997)
What are some practical approaches to ensembling?
- Bagging
- Random Forests
- Boosting
- Gradient Boosting
- Stacking
Subagging
- Bagging, but where we sample without replacement
- Used on large datasets where we want to create bootstrap-style samples that are smaller than the original dataset
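Subagging can be expressed with scikit-learn's BaggingClassifier by turning off replacement and drawing smaller samples; the 50% sample size below is just an illustrative choice:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# bootstrap=False -> sample without replacement (subagging);
# max_samples=0.5 -> each model sees only half of the training set.
subag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                          max_samples=0.5, bootstrap=False, random_state=0)
subag.fit(X, y)
```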
What type of learning algorithms are suited to bagging?
- Decision Trees
- DTs are very sensitive to changes in the training data: a small change can cause a different feature to be selected to split the dataset at the root (or high up in the tree), and this has a ripple effect throughout the subtrees under that node
What is subspace sampling?
- A bootstrap sample only uses a randomly selected subset of the descriptive features in the dataset
- Encourages further diversity of the trees within the ensemble
- Has the advantage of reducing training time for each tree
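A sketch of subspace sampling using BaggingClassifier, where max_features controls the random subset of descriptive features each model is trained on (the 0.5 fraction is arbitrary here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=40, random_state=0)

# Each tree is trained on a bootstrap sample *and* a random subset of the features.
ens = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                        max_features=0.5, random_state=0)
ens.fit(X, y)
```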
Random Forests
= Bagging + Subspace sampling
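In scikit-learn this combination is provided directly by RandomForestClassifier (note it samples a feature subset at each split rather than once per tree, the usual random-forest refinement); hyperparameters below are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=40, random_state=0)

# Bagging + subspace sampling; max_features='sqrt' is a common default.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X, y)
print("training accuracy:", rf.score(X, y))
```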
Advantages of Bagging over Boosting
- Simpler to implement + parallelize
- Ease of use
- Reduced training time
However, Caruana et al. (2008) showed that boosted DT ensembles performed best on datasets with < 4000 descriptive features.
> 4000 descriptive features -> random forests (based on bagging) performed better
> Boosted ensembles may be prone to overfitting, and in domains with large numbers of features overfitting becomes a serious problem
Costs of using ensembles
1. Increased model complexity
2. Increased learning (training) time
What types of algorithms does Bagging work well for?
Unstable algos - algos whose output classifier undergoes major changes in response to small changes in the training data
Examples of unstable classifiers
DTs, NNs and rule-learning algos are all unstable
Examples of stable classifiers
- Linear regression
- Nearest neighbour
- Linear threshold algorithm
Bootstrap replicate
Training set created by the bagging procedure
Contains on average 63.2% of the original training set, with several training examples appearing multiple times (Dietterich paper)
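The 63.2% figure follows from the probability that a given example is never drawn in n draws with replacement, (1 - 1/n)^n -> 1/e ≈ 0.368, so roughly 1 - 1/e ≈ 0.632 of the examples appear at least once. A quick sanity-check simulation:

```python
import numpy as np

n = 10_000
rng = np.random.default_rng(0)

# Fraction of distinct original examples appearing in one bootstrap replicate.
sample = rng.choice(n, size=n, replace=True)
print(len(np.unique(sample)) / n)     # ~0.632
print(1 - (1 - 1 / n) ** n)           # analytic value, -> 1 - 1/e ≈ 0.632
```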
Methods for manipulating training datasets
- Bagging
- K-fold cross-validation (ensembles constructed in this way are called ‘cross-validation committees’)
- Boosting
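A rough sketch of a cross-validation committee, assuming it simply means training one committee member per fold on the remaining k-1 folds; the base learner and k=5 are illustrative choices:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

committee = []
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # Each committee member is trained on a different (k-1)-fold subset.
    committee.append(DecisionTreeClassifier().fit(X[train_idx], y[train_idx]))
```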
AdaBoost
Freund & Schapire (1995-8)
- manipulates the training examples to generate multiple hypotheses
- maintains a set of weights over the training examples
- Effect of the change in weights is to place more weight on training examples that were misclassified by h(l) and less weight on examples that were correctly classified
- In subsequent iterations, therefore, Adaboost constructs progressively more difficult learning problems
- final classifier is constructed by a weighted vote of the individual classifiers. Each classifier is weighted (by w(l)) according to its accuracy on the weighted training set that it was trained on
- can be viewed as trying to maximise the margin (confidence of accuracy) on the training data
- constructs each new DT to eliminate ‘residual errors’ that have not been properly handled by the weighted vote of the previously-constructed trees.
- thus, it is directly trying to optimise the weighted vote, and is therefore making a direct attack on the representational problem; directly optimising an ensemble can increase the risk of overfitting
- in high-noise cases, Adaboost puts a large amount of weight on the mislabelled examples and this leads to it overfitting badly
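A minimal re-weighting sketch of the mechanism described above, using decision stumps as weak learners and labels in {-1, +1}; this is an illustrative implementation, not Freund & Schapire's exact pseudocode:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=500, n_features=20, random_state=0)
y = np.where(y01 == 1, 1, -1)                  # work with labels in {-1, +1}

n = len(X)
w = np.full(n, 1 / n)                          # uniform weights over training examples
stumps, alphas = [], []

for _ in range(50):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y].sum()                   # weighted error of this hypothesis
    alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
    # Up-weight misclassified examples, down-weight correct ones, then renormalise.
    w *= np.exp(-alpha * y * pred)
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Final classifier: weighted vote of the individual stumps.
F = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("training accuracy:", (np.sign(F) == y).mean())
```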
Bagging vs AdaBoost
Dietterich paper: AdaBoost typically outperforms Bagging, but when 20% artificial classification noise was added, AdaBoost overfit the data badly while Bagging did very well in the presence of noise.
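A rough way to run that kind of comparison (not Dietterich's exact experimental setup): flip a fraction of the training labels and compare held-out accuracy of bagged vs boosted trees.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Add 20% artificial classification noise to the training labels only.
rng = np.random.default_rng(0)
noisy = y_tr.copy()
flip = rng.random(len(noisy)) < 0.2
noisy[flip] = 1 - noisy[flip]

for name, model in [
    ("bagging", BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)),
    ("adaboost", AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50)),
]:
    model.fit(X_tr, noisy)
    print(name, model.score(X_te, y_te))       # accuracy on clean held-out labels
```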