Decision Trees Flashcards
What are internal nodes?
Points along the tree where splits occur.
What are terminal nodes/leaves?
They represent the final partitions (regions) of the predictor space; every observation falling in a leaf gets the same prediction.
What makes a decision tree a “stump”?
It only has one internal node (i.e. one split).
Name four advantages of trees:
- Easy to interpret and explain.
- Can be presented visually.
- Handle qualitative (categorical) predictors without the need for dummy variables.
- Mimic human decision-making.
Name two disadvantages of trees:
- Not robust: a small change in the data can produce a very different tree, so you will not get the same results every time you grow a tree from similar data.
- Not as predictively accurate as some other supervised learning approaches (although aggregating many trees, e.g. via bagging, random forests, or boosting, can improve accuracy substantially).
What makes recursive binary splitting a top-down and greedy approach?
It is top-down because it starts with all observations in a single region and successively splits the predictor space from there. It is greedy because at each step it chooses the best split at that point, without looking ahead to whether a different split would lead to a better tree later.
What is a drawback to recursive binary splitting and how can we resolve this issue?
Recursive binary splitting can produce a tree that is too leafy/complex, i.e. one with high variance that overfits the data. We can reduce this variance by pruning the tree via cost-complexity pruning, which is controlled by a tuning parameter, alpha.
What is the algorithm to selecting the best subtree based on alpha in cost-complexity pruning?
- Construct a large tree with g terminal nodes using recursive binary splitting.
- Obtain a sequence of best subtrees, as a function of alpha, using cost-complexity pruning.
- Choose alpha by applying k-fold cross validation. Select the alpha that results in the lowest cross validation error.
- The best subtree is the subtree created in step 2 with the selected alpha value from step 3.
*Note that alpha=0 corresponds to the original large tree with g terminal nodes; as alpha increases, the selected subtrees become smaller.
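This workflow can be sketched with scikit-learn's built-in cost-complexity pruning; the synthetic dataset, 5 folds, and other settings below are illustrative assumptions, not part of the cards.

```python
# A minimal sketch of choosing alpha by k-fold cross-validation (illustrative data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Step 1: grow a large tree and get the candidate alphas from the pruning path.
full_tree = DecisionTreeClassifier(random_state=0)
path = full_tree.cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas

# Steps 2-3: for each alpha, fit the corresponding pruned subtree and score it
# with k-fold cross-validation; keep the alpha with the lowest CV error.
cv_errors = []
for a in alphas:
    tree = DecisionTreeClassifier(ccp_alpha=a, random_state=0)
    scores = cross_val_score(tree, X, y, cv=5)   # 5-fold CV accuracy
    cv_errors.append(1 - scores.mean())          # convert to error

best_alpha = alphas[int(np.argmin(cv_errors))]

# Step 4: the best subtree is the one grown on all the data with that alpha.
best_tree = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)
print(best_alpha, best_tree.get_n_leaves())
```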
What is bootstrapping?
It is random sampling WITH replacement, so an observation can appear in a bootstrap sample more than once.
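A minimal sketch of one bootstrap draw with NumPy; the small data array is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
data = np.array([2.3, 4.1, 5.0, 3.7, 6.2])

# Sample n observations WITH replacement: some values can appear more than once,
# while others may be left out of this bootstrap sample entirely.
bootstrap_sample = rng.choice(data, size=len(data), replace=True)
print(bootstrap_sample)
```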
What is node purity? What can we say about the node depending on its value?
Pure nodes are those whose observations all belong to one class. Impurity measures such as the Gini index and cross-entropy take values near zero for a pure node and increase as the node becomes more impure (i.e. as observations are spread across classes).
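As one example of an impurity measure, here is a minimal sketch of the Gini index; the class labels are illustrative assumptions.

```python
import numpy as np

def gini(labels):
    """Gini index: 0 for a pure node, larger for a more impure node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1 - p)))

print(gini(["A", "A", "A", "A"]))   # 0.0 -> pure node
print(gini(["A", "A", "B", "B"]))   # 0.5 -> maximally impure for two classes
```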
Does the number of clusters in a dendrogram increase or decrease as you go up?
Decreases. At the bottom of the dendrogram, each observation is its own cluster; as you move up and observations fuse into clusters, the number of clusters decreases.
T/F: For a given number of clusters, hierarchical clustering can sometimes yield less accurate results than K-means clustering.
True. K-means performs a fresh analysis for each value of K, whereas in hierarchical clustering the clusters obtained by cutting the dendrogram at one height are forced to be nested within those obtained at a greater height. When the true clusters are not nested, hierarchical clustering can give less accurate results.
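The two methods are easy to compare directly for a fixed number of clusters; a minimal sketch with scikit-learn, where the synthetic blob data and K = 3 are purely illustrative assumptions.

```python
# Compare K-means and agglomerative (hierarchical) clustering for the same K.
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, true_labels = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Score each clustering against the known generating labels.
print("K-means ARI:      ", adjusted_rand_score(true_labels, kmeans_labels))
print("Hierarchical ARI: ", adjusted_rand_score(true_labels, hier_labels))
```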
If p=total number of features and m=number of features selected at each split in a random forest, what is the typical choice of m?
Typically m ≈ sqrt(p) for classification trees and m ≈ p/3 for regression trees.
For a random forest, let p be the total number of features and m be the number of features considered at each split. What is the probability a given split will not consider the strongest predictor?
(p-m)/p
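For example, with p = 9 and m = 3 (≈ sqrt(9)), the probability is (9 - 3)/9 = 2/3. A quick simulation can confirm this; the values of p, m, and the number of trials below are illustrative assumptions.

```python
# Check (p - m) / p by simulating the random feature subset drawn at each split.
import numpy as np

p, m, trials = 9, 3, 100_000
rng = np.random.default_rng(seed=0)
strongest = 0  # index of the strongest predictor

misses = sum(
    strongest not in rng.choice(p, size=m, replace=False)
    for _ in range(trials)
)
print(misses / trials)   # ~0.667
print((p - m) / p)       # exact value: 2/3
```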
How do bagging/random forest differ from boosting in terms of variance/bias tradeoff?
Bagging and random forests reduce variance by averaging many trees grown on bootstrap samples, whereas boosting reduces bias by growing trees sequentially, each fit to the errors of the trees grown so far (and can increase variance, i.e. overfit, if too many trees are used).
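As a hedged illustration, the two kinds of ensembles can be compared on the same data; the regression dataset and hyperparameters below are illustrative assumptions.

```python
# Contrast a variance-reducing ensemble (random forest) with a bias-reducing
# ensemble (gradient boosting) via cross-validated R^2.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Random forest: many deep trees grown on bootstrap samples, then averaged.
rf = RandomForestRegressor(n_estimators=200, random_state=0)

# Boosting: many shallow trees fit sequentially, each to the previous residuals.
gb = GradientBoostingRegressor(n_estimators=200, max_depth=2,
                               learning_rate=0.1, random_state=0)

for name, model in [("random forest", rf), ("boosting", gb)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(name, scores.mean())
```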