Section 6: Tree-Based Methods Flashcards
Explain the root node
The root node is the topmost decision node in a tree: it represents the first decision and corresponds to the best predictor.
What's a decision node vs a leaf node?
A decision node has two or more branches; these correspond to regions of an input variable.
A leaf node (also called a terminal node) represents a classification statement: observations are classified according to the majority class.
Explain how classification works for classification trees - what criteria are used
The classification of a data point within a region (usually a terminal node) is given by the majority class (the MAP rule). We want the regions to be as pure as possible.
Different metrics measure the purity of a region: entropy and the Gini index.
Explain entropy
Entropy quantifies the uncertainty of a probability distribution: H = -Σ_k p_k log₂ p_k, where p_k is the proportion of class k in the region. An entropy of 0 means there is no uncertainty in the probability distribution of the region.
Explain gini index
The Gini index quantifies the heterogeneity (impurity) of a probability distribution: G = Σ_k p_k(1 - p_k) = 1 - Σ_k p_k². Larger values mean less homogeneity; 0 means all probability mass is assigned to one class.
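A minimal sketch of both purity measures in Python, assuming the class proportions of a region are supplied as a list (function names are illustrative):

```python
import math

def entropy(proportions):
    # Entropy of a region: -sum over classes of p_k * log2(p_k); zero terms skipped.
    return -sum(p * math.log2(p) for p in proportions if p > 0)

def gini(proportions):
    # Gini index of a region: 1 - sum over classes of p_k^2.
    return 1 - sum(p ** 2 for p in proportions)

print(entropy([0.5, 0.5]), gini([0.5, 0.5]))  # 1.0 0.5  (maximally impure)
print(entropy([0.9, 0.1]), gini([0.9, 0.1]))  # ~0.469 ~0.18  (nearly pure)
```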
What is the optimisation procedure for a classification tree?
Growing a tree corresponds to identifying the optimal tree structure, which involves minimising an impurity criterion so as to maximise the homogeneity of the input-space regions. A greedy algorithm is used to solve this minimisation problem.
Explain a greedy algorithm
Greedy search approaches to the tree-building problem work by fixing the splits node by node, from the root down, rather than attempting to estimate the whole tree structure simultaneously: at each node the locally best split is chosen and never revisited.
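A minimal sketch of one greedy step, assuming a single numeric feature: it tries every threshold and keeps the one minimising the weighted Gini impurity (all names are illustrative; a full tree would apply this recursively to each child region):

```python
def best_split(xs, ys):
    # Gini impurity of a list of class labels.
    def gini_of(labels):
        n = len(labels)
        return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

    best = None
    for t in sorted(set(xs)):  # candidate thresholds: observed feature values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        # Impurity of the two child regions, weighted by their sizes.
        score = (len(left) * gini_of(left) + len(right) * gini_of(right)) / len(ys)
        if best is None or score < best[0]:
            best = (score, t)
    return best  # (weighted Gini, threshold)

print(best_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))
# (0.0, 3): splitting at x <= 3 separates the two classes perfectly
```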
How can overfitting theoretically occur in a classification tree
In theory one could grow a tree until every leaf contains a single observation. However, this would lead to overfitting. So we want to control the size of the tree.
How can we control the size of a classification tree - what levers are available
Restricting the minimum size of a terminal node (leaf).
Restricting the minimum number of observations a node must contain before it can be split.
Increasing the minimum criterion gain to be obtained when splitting a region.
Restricting the maximal depth of the tree. (Each lever is illustrated in the sketch below.)
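As an illustration, these four levers map onto hyperparameters of scikit-learn's DecisionTreeClassifier (a sketch assuming that library; the values are arbitrary):

```python
from sklearn.tree import DecisionTreeClassifier

# Each argument corresponds to one size-control lever above (values are arbitrary).
tree = DecisionTreeClassifier(
    min_samples_leaf=5,          # minimum size of a terminal node
    min_samples_split=10,        # minimum observations needed to split a node
    min_impurity_decrease=0.01,  # minimum criterion gain required for a split
    max_depth=4,                 # maximal depth of the tree
)
```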
Advantages of classification trees
Trees are easily explained, easy to interpret and display graphically.
Classification trees are very flexible and do not require particular pre-processing.
Can be easily used in conjunction with ensemble methods.
Downsides of classification trees
Trees suffer from high variance: a small change in the input data can result in a very different series of splits and have a large impact on classification performance.
Due to the hierarchical nature of the splitting process, errors propagate multiplicatively from the root node down to the leaves.
Due to their great flexibility, classification trees are prone to overfitting: they classify the training observations well during fitting but perform poorly when classifying new, unseen observations.
What is bagging
Bagging is a general-purpose procedure for reducing the variance of a statistical learning method.
The main idea is that averaging multiple predictions reduces variance, thus increasing accuracy.
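The reasoning step behind this, as a formula: for B independent, identically distributed predictions each with variance σ², the average has variance σ²/B (bootstrap samples are not truly independent, so the actual reduction is smaller, but the principle holds):

```latex
\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^{B}\hat{f}_b(x)\right) = \frac{\sigma^2}{B},
\qquad \operatorname{Var}(\hat{f}_b(x)) = \sigma^2 \ \text{i.i.d.}
```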
Why is bagging used in tree based methods of supervised learning
In general, classification trees suffer from high variance. So if we fit a classification tree to different random splits of the data we could obtain quite different results.
To increase the prediction accuracy of a learning method, take many training sets from the population (in practice, bootstrap resamples of the single training set), fit a separate prediction model on each training set, and average the resulting predictions. This procedure is bagging.
When bagging, how much of the bagged data is usually unique?
On average, about 63.2% of the observations in a bootstrap sample are unique; the rest are duplicates.
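The 63.2% figure follows from the bootstrap sampling probability: each of the n observations is drawn with replacement n times, so

```latex
P(\text{observation } i \text{ appears in the bootstrap sample})
= 1 - \left(1 - \frac{1}{n}\right)^{n}
\longrightarrow 1 - e^{-1} \approx 0.632 \quad (n \to \infty).
```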
How is predictive accuracy assessed using bagging
The out-of-bag (OOB) error is calculated:
For each observation in the out-of-bag sample of a fitted classifier (the observations not drawn into its bootstrap sample), use that fitted classifier to produce a predicted class.
The majority vote is used again to produce the final classification of each out-of-bag data point.
The average out-of-bag error is an estimate of the generalisation error.
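As an illustration with scikit-learn (a sketch assuming that library; the toy dataset is a placeholder for real data), `oob_score_` reports out-of-bag accuracy, so the OOB error estimate is one minus it:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy dataset standing in for real training data.
X, y = make_classification(n_samples=500, random_state=0)

bag = BaggingClassifier(
    DecisionTreeClassifier(),  # high-variance base learner
    n_estimators=100,          # number of bootstrap samples / trees
    oob_score=True,            # score each observation on the trees that never saw it
    random_state=0,
).fit(X, y)

# oob_score_ is OOB accuracy; 1 - accuracy estimates the generalisation error.
print("OOB error estimate:", 1 - bag.oob_score_)
```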