Ensemble Flashcards
How do ensemble methods work?
They work by combining predictions from several estimators built with a given learning algorithm in order to improve generalization.
What kind of methods are used?
- averaging - reduces the variance of ‘strong’ (low-bias, high-variance) estimators
- boosting - reduces the bias of ‘weak’ (high-bias, low-variance) estimators
How does averaging work?
It works by building several estimators independently and averaging their predictions to reduce the variance.
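A minimal sketch of the averaging idea, assuming scikit-learn and NumPy are available; the dataset and the choice of randomized trees are illustrative only:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Build several estimators independently (each tree is perturbed by its own seed).
trees = [
    DecisionTreeClassifier(splitter="random", random_state=i).fit(X, y)
    for i in range(10)
]

# Average their predicted probabilities and take the most likely class.
avg_proba = np.mean([t.predict_proba(X) for t in trees], axis=0)
y_pred = avg_proba.argmax(axis=1)
```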
How does boosting work?
It works by building several estimators sequentially, each one trying to correct its predecessors, so that the combined estimator has a reduced bias.
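A hedged sketch of a boosting ensemble, assuming scikit-learn; the dataset and hyperparameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)

# 100 shallow trees fit sequentially; each stage tries to reduce the remaining error.
boosted = GradientBoostingClassifier(n_estimators=100, max_depth=3).fit(X, y)
```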
What is ‘Pasting’ - averaging methods?
Algorithm that trains each independent estimator on a random subset of the samples, drawn without replacement.
What is ‘Bagging’?
For averaging methods, this means that the random subsets of samples are drawn with replacement (bootstrap sampling).
What is ‘Random Subspaces’ - averaging methods?
Algorithm that trains each estimator on a random subset of the features (rather than of the samples).
What is ‘Random Patches’ - averaging methods?
Algorithm that trains each estimator on random subsets of both the samples and the features.
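A sketch of how the four resampling schemes map onto scikit-learn's BaggingClassifier parameters (the `estimator` parameter name follows recent releases; older versions used `base_estimator`):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier()

# Pasting: random sample subsets drawn without replacement.
pasting = BaggingClassifier(estimator=tree, max_samples=0.5, bootstrap=False)

# Bagging: random sample subsets drawn with replacement (bootstrap).
bagging = BaggingClassifier(estimator=tree, max_samples=0.5, bootstrap=True)

# Random Subspaces: random feature subsets, all samples.
subspaces = BaggingClassifier(estimator=tree, max_features=0.5, bootstrap=False)

# Random Patches: random subsets of both samples and features.
patches = BaggingClassifier(estimator=tree, max_samples=0.5, max_features=0.5)
```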
RandomForests and Extra-Trees - averaging methods
They are perturb-and-combine methods based on constructing randomized decision trees and then averaging their predictions.
Differences between RandomForests and ExtraTrees
During tree construction:
- in RFs the node split is picked based on the best split among a random subset of features
- in ETs the split thresholds are drawn at random for each candidate feature, and the best of these randomly generated thresholds is picked for the node split
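A minimal sketch comparing the two ensembles in scikit-learn; the dataset and hyperparameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# RF: best split chosen among a random subset of features at each node.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
# ET: thresholds drawn at random per candidate feature, best random threshold kept.
et = ExtraTreesClassifier(n_estimators=100, max_features="sqrt", random_state=0)

print(cross_val_score(rf, X, y).mean())
print(cross_val_score(et, X, y).mean())
```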
What is bias?
It is the error from erroneous assumptions in the learning algorithm.
High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
What is variance?
It is the error from sensitivity to small fluctuations in the training set.
High variance can cause overfitting: modeling the random noise in the training data, rather than the intended outputs.
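A rough sketch of how the two errors typically show up with decision trees, assuming scikit-learn; a depth-1 tree tends to underfit (high bias) while an unpruned tree can overfit the noise (high variance):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stump = DecisionTreeRegressor(max_depth=1).fit(X_train, y_train)    # high bias
deep = DecisionTreeRegressor(max_depth=None).fit(X_train, y_train)  # high variance

# Underfitting: low score on both sets; overfitting: train score much higher than test score.
print(stump.score(X_train, y_train), stump.score(X_test, y_test))
print(deep.score(X_train, y_train), deep.score(X_test, y_test))
```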
What is a decision tree?
It is a model for predicting a dependent variable Y using an independent variable X by checking a collection of splits.
What is a decision tree split?
A split is a condition or query on a single independent variable that is either true or false.
Splits are arranged as a tree in which each internal node has 2 child nodes: left for a true condition, right for a false condition.
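A hedged sketch of inspecting the splits of a fitted tree, assuming scikit-learn; each internal node tests one feature against a threshold, and samples for which the condition holds go to the left child:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Prints the tree as nested "feature <= threshold" conditions.
print(export_text(clf, feature_names=list(iris.feature_names)))
```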
Boosting intuition
It minimizes bias by combining estimators with low variance and high bias (e.g. shallow decision trees).
Bagging intuition
It minimizes variance by averaging estimators with low bias and high variance (e.g. fully grown decision trees).
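A short sketch of both intuitions side by side, assuming scikit-learn (the `estimator` parameter name follows recent releases; older versions used `base_estimator`):

```python
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Boosting: many shallow, high-bias trees (decision stumps) combined sequentially.
boosted_stumps = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1), n_estimators=200
)

# Bagging: many fully grown, high-variance trees averaged.
bagged_trees = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=None), n_estimators=200
)
```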
What is Gini impurity?
- a measure of how often a randomly chosen element from a set will be classified incorrectly
- the classification is done randomly according to the distribution of the labels in the set
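A minimal sketch of the definition as code, assuming NumPy; for class proportions p_k the impurity is 1 - sum_k p_k^2:

```python
import numpy as np

def gini_impurity(labels):
    # Probability that a randomly drawn element is misclassified when it is
    # labelled at random according to the label distribution of the set.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

gini_impurity(["a", "a", "b", "b"])  # 0.5 for a perfectly mixed binary set
```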
What is information gain?
- a measure given by the difference between the entropy of the target variable and the entropy of the target variable conditioned on a splitting variable (regressor)
IG(T, rgr) = H(T) - H(T | rgr)
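A hedged sketch, assuming NumPy, where H(T | rgr) is taken as the weighted average of the child entropies induced by a split on the regressor:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, left_mask):
    # IG = H(parent) - weighted average of the child entropies.
    labels = np.asarray(labels)
    left, right = labels[left_mask], labels[~left_mask]
    w_left = len(left) / len(labels)
    return entropy(labels) - (w_left * entropy(left) + (1.0 - w_left) * entropy(right))

labels = np.array([0, 0, 0, 1, 1, 1])
left_mask = np.array([True, True, True, False, False, False])
information_gain(labels, left_mask)  # 1.0 bit: the split separates the classes perfectly
```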
What is ‘reduced error’ pruning?
- method of reducing overfitting in decision trees
- a ‘bottom-up’ algorithm
- starting at the leaves, each node is replaced with its most popular class. If prediction accuracy (on a validation set) does not get worse, the change is kept.
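A rough sketch of the bottom-up pass on a toy node structure; `Node` and `accuracy_on_validation` (a callable that re-evaluates the tree on a held-out validation set, honouring the `pruned` flags) are hypothetical, not part of any library API:

```python
class Node:
    def __init__(self, left=None, right=None, majority_class=None):
        self.left, self.right = left, right
        self.majority_class = majority_class  # most popular class among training samples here
        self.pruned = False                   # when True, the node acts as a leaf

def reduced_error_prune(node, accuracy_on_validation):
    # Bottom-up: prune the children first, then try replacing this node with a
    # leaf predicting its majority class; keep the change only if validation
    # accuracy does not get worse.
    if node is None or (node.left is None and node.right is None):
        return
    reduced_error_prune(node.left, accuracy_on_validation)
    reduced_error_prune(node.right, accuracy_on_validation)

    before = accuracy_on_validation()  # accuracy with this subtree intact
    node.pruned = True                 # temporarily collapse the subtree to a leaf
    after = accuracy_on_validation()
    if after < before:
        node.pruned = False            # revert: pruning hurt validation accuracy
```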