SRM Chapter 5 - Decision Trees Flashcards
5.1 Regression Trees 5.2 Classification Trees 5.3 Multiple Trees
In a decision tree what does the first (top-most) split indicate?
The most important predictor (most influential on the response variable)
Terminal nodes aka tree leaves
- Regions that the tree is split into
- Think endpoints for each branch of decisions
- They end in the predicted values for each branch of decisions
Internal nodes
Intermediate splits along the tree
Stump
- A tree with only one internal node (the root)
- Picture a single split: one decision producing two leaves
Child nodes
- Nodes resulting from a split
- The node above two child nodes is the parent node
Branches
The lines connecting any two nodes
How is a tree partitioned?
The predictor space is partitioned into g regions R_1, ..., R_g such that the error sum of squares (SSE) is minimized.
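In symbols (a sketch consistent with the card above, where R_m is region m and ȳ_{R_m} is its sample mean):

```latex
% Regression tree criterion: choose regions R_1, ..., R_g that
% minimize the total within-region error sum of squares (SSE)
\min_{R_1, \dots, R_g} \sum_{m=1}^{g} \sum_{i \in R_m} \left( y_i - \bar{y}_{R_m} \right)^2
```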
Recursive binary splitting
Top-down: start with everything in one region and select the next best split each time, working down the tree.
Binary: because two regions are created at each split.
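A minimal Python sketch of one greedy split search, assuming numeric predictors (the helper name best_split is made up for illustration):

```python
import numpy as np

def best_split(X, y):
    """Find the single binary split that minimizes SSE.

    X: (n, p) array of predictors; y: (n,) array of responses.
    Returns (predictor index, cutpoint, SSE) of the best split.
    """
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] < s], y[X[:, j] >= s]
            if len(left) == 0 or len(right) == 0:
                continue
            # SSE of a region = squared deviations from its sample mean
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (j, s, sse)
    return best

# Example: X = np.array([[1.0], [2.0], [3.0]]), y = np.array([1.0, 1.0, 5.0])
# best_split(X, y) -> (0, 3.0, 0.0): splitting at x < 3 separates {1, 1} from {5}
```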
Why is recursive binary splitting considered “greedy”?
- Because it’s a top-down approach
- So, at each step it only considers the best split available right then, rather than looking ahead at all possible sequences of splits
Stopping criteria
Criteria for when to stop splitting the predictor space.
e.g. No region allowed to have fewer than some number of observations.
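As a hedged illustration, scikit-learn exposes stopping criteria like these as hyperparameters:

```python
from sklearn.tree import DecisionTreeRegressor

# Stop splitting once any region would hold fewer than 5 observations,
# or once the tree is 4 splits deep
tree = DecisionTreeRegressor(min_samples_leaf=5, max_depth=4)
```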
When the predictor space is split and observations are divided, how is the predicted value chosen?
By computing the sample mean of the response in each region, ȳ_{R_m}.
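A tiny numeric sketch with made-up responses:

```python
import numpy as np

# Made-up responses landing in two regions after a split
y_R1 = np.array([2.0, 4.0, 6.0])   # region R1
y_R2 = np.array([10.0, 14.0])      # region R2

# Each observation is predicted with its region's sample mean
print(y_R1.mean())  # 4.0  -> prediction for every point in R1
print(y_R2.mean())  # 12.0 -> prediction for every point in R2
```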
Pruning
Reducing the number of leaves (terminal nodes) in a tree to reduce flexibility, variance, and the risk of overfitting. (Recursive binary splitting is prone to overfitting because it can grow a large, complex tree.)
We do this by creating a subtree, i.e. eliminating one or more internal nodes of the tree.
Cost complexity pruning
- Aka weakest link pruning
- Uses a tuning parameter to produce a sequence of subtrees; each value of the tuning parameter identifies which subtree to consider
- Similar to lasso/ridge regression in that the tuning parameter is a penalty term to reduce the flexibility
What does cost complexity pruning result in?
A nested sequence of subtrees (each subtree is contained in the previous one)
How does cost complexity pruning work?
- Has a tuning parameter (lambda) that acts as a penalty term to punish excess flexibility in the model
- The penalty is applied to |T|, the number of terminal nodes: the pruned tree minimizes RSS + lambda * |T| (see the sketch below)
- Lambda and |T| are inversely related: as lambda increases, the minimizing subtree shrinks
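In symbols (a sketch in the notation used above): for each lambda >= 0, cost complexity pruning selects the subtree T minimizing

```latex
% Penalized objective: within-region SSE plus a penalty on tree size |T|
\sum_{m=1}^{|T|} \sum_{i \in R_m} \left( y_i - \bar{y}_{R_m} \right)^2 + \lambda \, |T|
```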
What does the graph of # of terminal nodes against lambda look like?
- Step function
- Looks like stairs coming down from the left
- i.e. as the tuning parameter increases, the number of leaves decreases
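A hedged scikit-learn sketch (made-up data): cost_complexity_pruning_path returns the breakpoints of this step function, i.e. the lambda values at which the selected subtree shrinks.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X[:, 0] + rng.normal(scale=0.5, size=100)

# Effective alphas (sklearn's name for lambda): the values where the
# optimal subtree changes
path = DecisionTreeRegressor().cost_complexity_pruning_path(X, y)
print(path.ccp_alphas)  # increasing alphas -> progressively fewer leaves
```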
How do we select the best subtree after pruning the tree?
Cross-validation
Choose lambda by applying k-fold cross-validation, and pick the lambda that results in the lowest CV error.
The best subtree is then the one corresponding to the selected lambda value.
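A minimal scikit-learn sketch (ccp_alpha plays the role of lambda; the data and grid values are made up):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X[:, 0] + rng.normal(scale=0.5, size=100)

# 5-fold CV over candidate lambda values; keep the one with lowest CV error
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"ccp_alpha": [0.0, 0.01, 0.1, 1.0]},
    cv=5,
    scoring="neg_mean_squared_error",
).fit(X, y)

print(search.best_params_)             # selected lambda
best_subtree = search.best_estimator_  # the corresponding pruned tree
```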
What does |T| denote?
The size of the tree
What is the primary objective of tree pruning?
To find a subtree that minimizes the TEST error rate.
Effect of tree pruning on:
Variance
Flexibility
Bias
Residual sum of squares (SSE)
Number of terminal nodes |T|
Tuning parameter (lambda)
Penalty term (lambda * |T|)
Variance: DOWN
Flexibility: DOWN
Bias: UP
SSE: UP
|T|: DOWN
Lambda: UP
Penalty term: DOWN (at a given lambda, it falls with |T|)
Examples of greedy models
Think one-step-at-a-time, without holistic consideration.
- Recursive binary splitting
- Forward stepwise selection
- Backward stepwise selection
When is a smaller subtree favored?
RSS + (# of terminal nodes) * (tuning parameter) is minimized by a smaller subtree when the tuning parameter is large.
aka RSS + lambda * |T| (a worked comparison follows below)
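A made-up worked comparison: say the full tree has RSS = 90 with |T| = 8 leaves, and a pruned subtree has RSS = 100 with |T| = 3.

```latex
% lambda = 1: full tree wins,   90 + 1(8) = 98  < 100 + 1(3) = 103
% lambda = 5: pruned tree wins, 90 + 5(8) = 130 > 100 + 5(3) = 115
\text{penalized RSS} = \text{RSS} + \lambda\,|T|
```

So as lambda grows, the smaller subtree eventually achieves the lower penalized RSS.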
Decision Tree Yes/No
Yes = left branch
No = right branch
Impurity function (classification)
Used in classification decision trees to decide the best split (instead of splitting to minimize the error sum of squares); common impurity measures are the classification error rate, the Gini index, and cross-entropy.
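A minimal Python sketch of two standard impurity measures applied to a node's class proportions (the function names are made up):

```python
import numpy as np

def gini(p):
    """Gini index: sum_k p_k (1 - p_k); 0 means a pure node."""
    p = np.asarray(p)
    return float((p * (1 - p)).sum())

def cross_entropy(p):
    """Cross-entropy: -sum_k p_k log(p_k), taking 0 * log(0) = 0."""
    p = np.asarray(p)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Pure node (all one class) vs maximally mixed two-class node
print(gini([1.0, 0.0]), gini([0.5, 0.5]))                    # 0.0, 0.5
print(cross_entropy([1.0, 0.0]), cross_entropy([0.5, 0.5]))  # 0.0, ~0.693
```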