Topic 5: Ensemble Methods Flashcards
What does ensemble mean
Having multiple predictive models and combining their predictions
-> treating them as a “committee”
What is a Committee
A group of predictive models
What is the underlying principle of ensemble methods
That in real-world situations, every model has limitations and will make some errors
What models can the ensemble algorithm be applied to
Most
from simple decision trees to deep CNNs
What is Jensen’s Inequality
If X is a random variable and f(x) is a convex function, then:
E [ f(X) ] ≥ f (E [X])
Essentially, taking f as the squared distance from the truth: the squared error of the mean guess x̄ is less than (or equal to) the average of the squared errors of the individual guesses xᵢ
So the error of the combined value is at most the average error of the individual values
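A minimal numeric sketch of the inequality with f(x) = (x − t)², using hypothetical guesses of a true value t = 10:

```python
# Hypothetical guesses x_i of a true value t = 10; f(x) = (x - t)^2 is convex.
guesses = [8.0, 9.5, 12.0, 11.5]
t = 10.0

mean_guess = sum(guesses) / len(guesses)                          # E[X]
err_of_mean = (mean_guess - t) ** 2                               # f(E[X])
mean_of_errs = sum((x - t) ** 2 for x in guesses) / len(guesses)  # E[f(X)]

# Jensen: E[f(X)] >= f(E[X]) -- the mean guess beats the average guess
print(err_of_mean, mean_of_errs)
```

Here the mean guess (10.25) has squared error 0.0625, while the average of the individual squared errors is 2.625.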
What is Jensen’s inequality applied to ensemble methods
The squared error of the ensemble is guaranteed to be less than or equal to the average squared error of the individual estimators
What is the relationship between ensemble size and probability of ensemble error
If we fix the individual error to ε = 0.3 (Probability that one model makes an error)
And then increase the ensemble size
The probability of ensemble error decreases
If ε decreases, the whole graph (ensemble size vs. error) is compressed along the y-axis (shifted down)
What is the assumption we are making in ensemble probability models
Each model is independent
What is the issue with increasing the number of models in an ensemble on the same training data
In order for each model to be independent (a necessary requirement), the training data has to be split n ways (for n models), using the assumption that all data is IID
However, shrinking training set sizes mean each model becomes less and less accurate on test data
Why do we need parallel algorithms
To try and create “diversity” between classifiers in Ensembles, whilst maintaining a reasonable degree of accuracy
E.g. bagging and random forests
What is the bagging algorithm
“Bootstrap AGGregating”
Given a dataset of size n, randomly select n examples WITH replacement
E.g. from {1, 2, 3, 4, 5, 6} (n = 6): 1 1 2 4 5 5
This is a ‘bootstrap’
What is a data bootstrap
a random sample of examples from our original dataset
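A minimal sketch of drawing one bootstrap from a toy dataset (values chosen for illustration):

```python
import random

random.seed(0)                      # fixed seed so the draw is repeatable
data = [1, 2, 3, 4, 5, 6]           # toy dataset, n = 6

# One bootstrap: n draws WITH replacement, so duplicates are expected
bootstrap = [random.choice(data) for _ in range(len(data))]
print(bootstrap)
```

In bagging, each model in the ensemble is trained on its own bootstrap drawn this way.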
For a classification problem, how do you combine model results
Majority vote
For a regression problem, how do you combine model results
Take the average
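Both combination rules in one minimal sketch, with hypothetical per-model outputs:

```python
from collections import Counter

class_preds = ["cat", "dog", "cat"]   # classification: majority vote
reg_preds = [2.0, 2.5, 3.0]           # regression: average

vote = Counter(class_preds).most_common(1)[0][0]
avg = sum(reg_preds) / len(reg_preds)
print(vote, avg)   # cat 2.5
```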
How can you use non-uniform weights in the bagging algorithm
Requires an extra holdout dataset separate from train and test data
Instead of simply averaging the predictions from each model, different weights are assigned to each model’s prediction based on their performance
This data helps assess the performance of the model with different assigned weights
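One simple weighting scheme (an assumption for illustration, not the only option): weight each model by its accuracy on the holdout set, normalised to sum to 1:

```python
# Hypothetical holdout accuracies for three models
holdout_acc = [0.9, 0.6, 0.75]
total = sum(holdout_acc)
weights = [a / total for a in holdout_acc]   # normalise so weights sum to 1

# Hypothetical regression predictions from each model
preds = [3.0, 5.0, 4.0]
weighted = sum(w * p for w, p in zip(weights, preds))
print(weighted)
```

The better-performing model pulls the combined prediction toward its own output, instead of all models counting equally.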
What is the risk of using non-uniform weights in the bagging algorithm
Overfitting
The weights may become overly tuned to the specific characteristics of the holdout data
Bagging: are we actually discarding a lot of training data in our random sampling?
No, our classifiers are statistically dependent on each other
Almost every example is virtually certain to appear in at least one bootstrap
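The claim can be checked directly: an example is missing from one bootstrap with probability (1 − 1/n)ⁿ ≈ e⁻¹ ≈ 0.368, and from all of m independent bootstraps with probability ≈ 0.368ᵐ, which vanishes quickly:

```python
# Probability an example is missing from ONE bootstrap of size n: (1 - 1/n)^n
n = 1000
p_missing_one = (1 - 1 / n) ** n     # approaches e^-1 ~ 0.368 for large n

# Probability it is missing from ALL m bootstraps shrinks exponentially
for m in (1, 5, 10):
    print(m, p_missing_one ** m)
```

So each individual model sees only ~63% of the distinct examples, but with even ten models almost every example trains something.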