Decision Trees Flashcards

1
Q

What are internal nodes?

A

Points along the tree where splits occur.

2
Q

What are terminal nodes/leaves?

A

They represent the final partitions (regions) of the predictor space.

3
Q

What makes a decision tree a “stump”?

A

It only has one internal node (i.e. one split).

4
Q

Name four advantages to trees:

A
  1. Easy to interpret and explain.
  2. Can be presented visually.
  3. Manage categorical variables without the need for dummy variables.
  4. Mimic human decision-making.
5
Q

Name two disadvantages to trees:

A
  1. Not robust (i.e. a small change in the data can cause a large change in the final estimated tree).
  2. Not as accurate as some other supervised learning methods (a single tree generally has lower predictive accuracy than approaches such as bagging, random forests, or boosting).
6
Q

What makes recursive binary splitting a top-down and greedy approach?

A

It is top-down because it begins at the top of the tree, with all observations in a single region, and successively splits the predictor space from there. It is greedy because at each step it chooses the split that is best at that point, without looking ahead to splits that might produce a better tree later on.
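A minimal sketch of one greedy step for a regression tree (best_split is a hypothetical helper; it scans every feature and cutpoint and keeps the split with the lowest total RSS, without considering how later splits might turn out):

```python
import numpy as np

def best_split(X, y):
    """Greedy search for the single best binary split (hypothetical helper).

    Scans every feature and every observed cutpoint and returns the
    (feature, cutpoint, RSS) triple that minimizes the total RSS of the
    two resulting regions. A full tree repeats this on each region.
    """
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] < s], y[X[:, j] >= s]
            if len(left) == 0 or len(right) == 0:
                continue
            rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if rss < best[2]:
                best = (j, s, rss)
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = np.where(X[:, 0] < 0.3, 1.0, 5.0) + rng.normal(scale=0.1, size=50)
print(best_split(X, y))  # expected to recover a cutpoint near 0.3 on feature 0
```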

7
Q

What is a drawback to recursive binary splitting and how can we resolve this issue?

A

Recursive binary splitting can produce a tree that is too leafy/complex, i.e. one that overfits the training data and has high variance. We can reduce the variance by pruning the tree via cost-complexity pruning, which is controlled by the tuning parameter alpha.

8
Q

What is the algorithm to selecting the best subtree based on alpha in cost-complexity pruning?

A
  1. Construct a large tree with g terminal nodes using recursive binary splitting.
  2. Obtain a sequence of best subtrees, as a function of alpha, using cost-complexity pruning.
  3. Choose alpha by applying k-fold cross-validation; select the alpha that results in the lowest cross-validation error.
  4. The best subtree is the subtree from step 2 corresponding to the alpha selected in step 3 (see the sketch below).
    *Note that alpha = 0 corresponds to the original large tree with g terminal nodes.
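A minimal scikit-learn sketch of these steps (assuming the built-in breast cancer toy dataset and 5-fold CV; maximizing CV accuracy is equivalent to selecting the lowest CV error):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Steps 1-2: grow a large tree and obtain the sequence of candidate alphas,
# each corresponding to a best subtree of the full tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
ccp_alphas = path.ccp_alphas

# Step 3: choose alpha by k-fold cross-validation.
cv_scores = [
    cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=a), X, y, cv=5).mean()
    for a in ccp_alphas
]
best_alpha = ccp_alphas[int(np.argmax(cv_scores))]

# Step 4: the best subtree is the tree refit with the selected alpha.
best_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X, y)
print(best_alpha, best_tree.get_n_leaves())
```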
9
Q

What is bootstrapping?

A

It is random sampling WITH replacement, so an observation can appear in a bootstrap sample more than once.
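A minimal numpy sketch (assuming ten toy observations): each bootstrap sample is drawn with replacement, so some observations repeat and, on average, roughly one-third are left out of any given sample.

```python
import numpy as np

rng = np.random.default_rng(1)
data = np.arange(10)          # ten original observations, labeled 0..9

# One bootstrap sample: draw n observations WITH replacement.
boot = rng.choice(data, size=len(data), replace=True)
print(boot)                   # some observations repeat, others are left out
```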

10
Q

What is node purity? What can we say about the node depending on its value?

A

A node is pure when all of its observations belong to a single class. Impurity measures such as the Gini index or cross-entropy take a value of (or near) zero for a pure node, and the value increases as the node becomes more impure (i.e. as observations are spread across multiple classes).
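A minimal sketch of one common impurity measure, the Gini index G = sum over classes of p_k(1 - p_k), which is 0 for a pure node and grows as the classes mix:

```python
import numpy as np

def gini(labels):
    """Gini index of a node: sum over classes of p_k * (1 - p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1 - p)))

print(gini([1, 1, 1, 1]))   # 0.0 -> pure node
print(gini([1, 1, 0, 0]))   # 0.5 -> maximally impure for two classes
```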

11
Q

Does the number of clusters in a dendrogram increase or decrease as you go up?

A

Decreases. At the bottom of the dendrogram, each observation is its own cluster; as you move up and observations fuse into clusters, the number of clusters decreases.

12
Q

T/F: For a given number of clusters, hierarchical clustering can sometimes yield less accurate results than K-means clustering.

A

True. K-means performs a fresh analysis for each value of K, whereas in hierarchical clustering each reduction in the number of clusters is constrained by the clusters already formed. Hierarchical clustering can therefore miss groupings that are not nested within one another.

13
Q

If p=total number of features and m=number of features selected at each split in a random forest, what is the typical choice of m?

A

Typically m ≈ sqrt(p) (common for classification) or m ≈ p/3 (common for regression).
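A minimal scikit-learn sketch (assuming the max_features parameter of the random forest estimators, which controls m):

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: consider m = sqrt(p) features at each split.
clf = RandomForestClassifier(max_features="sqrt", random_state=0)

# Regression: consider m = p/3 features at each split (given as a fraction of p).
reg = RandomForestRegressor(max_features=1/3, random_state=0)
```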

14
Q

For a random forest, let p be the total number of features and m be the number of features selected at each split. What is the probability that a split will not consider the strongest predictor?

A

(p-m)/p
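For example, with p = 9 total features and m = 3 features sampled at each split, a given split will not even consider the strongest predictor with probability (9 - 3)/9 = 6/9 = 2/3.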

15
Q

How do bagging/random forest differ from boosting in terms of variance/bias tradeoff?

A

Bagging and random forests primarily reduce variance, by averaging many independently grown trees, whereas boosting primarily reduces bias, by growing trees sequentially with each new tree fit to the errors left by the current ensemble (boosting can overfit, i.e. increase variance, if the number of trees is too large).
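A minimal scikit-learn sketch of the contrast (regressor variants and hyperparameter values are illustrative, not prescriptive):

```python
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor

# Bagging: many deep trees (the default base learner is a decision tree),
# each fit to a bootstrap sample; predictions are averaged, which mainly
# reduces variance.
bag = BaggingRegressor(n_estimators=500, random_state=0)

# Boosting: many shallow trees (here stumps, max_depth=1) grown sequentially,
# each fit to the residual errors of the current ensemble; slow learning that
# mainly chips away at bias.
boost = GradientBoostingRegressor(n_estimators=500, max_depth=1,
                                  learning_rate=0.01, random_state=0)
```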
