Decision Trees Flashcards
What are decision trees?
Supervised models for regression and classification that stratify or segment the predictor space into a number of simple regions
Terminal Nodes
The leaves of the tree: the final regions into which the observations are grouped and from which predictions are made
Internal Nodes
Points along the tree where the predictor space is split
Branches
Segments of the tree that connect the internal nodes and terminal nodes
Top-Down, Greedy Approach (Recursive Binary Splitting)
Start at the top of the tree and successively split the predictor space. At each step, rather than looking ahead and picking a split that would lead to a better tree later on, we make the split that is best at that particular step
Why do we use the top-down greedy approach?
It is computationally infeasible to consider every possible partition of the predictor space. So, at each step, we choose the split that minimizes the residual sum of squares (RSS) at that step
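A minimal sketch of one greedy split chosen by RSS (illustrative only; the function name and setup are my own, not from any particular library):

```python
import numpy as np

def best_split(X, y):
    """Find the single (feature, threshold) split that minimizes total RSS.

    X: (n, p) array of predictors, y: (n,) numeric response.
    Returns (feature index, threshold, RSS) of the best greedy split.
    """
    best = (None, None, np.inf)
    n, p = X.shape
    for j in range(p):
        for t in np.unique(X[:, j]):
            left = y[X[:, j] <= t]
            right = y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            # RSS of a region = squared deviations from that region's mean prediction
            rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if rss < best[2]:
                best = (j, t, rss)
    return best
```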
What is tree pruning?
We grow a very large tree (T0) and then prune it back to find an optimal subtree. We select the subtree with the lowest estimated test error rate (e.g., via cross-validation)
What is cost complexity pruning?
Rather than considering every possible subtree, we consider a sequence of subtrees indexed by a nonnegative tuning parameter α
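For reference, the standard cost complexity (weakest link) criterion is shown below; for each value of α we seek the subtree T ⊆ T0 that minimizes it, where |T| is the number of terminal nodes and R_m is the region of the m-th terminal node:

```latex
\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha\,|T|
```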
What does a high tuning parameter (a) indicate?
A higher penalty on the number of terminal nodes, so branches are pruned back further and the resulting subtree is smaller; this reduces variance and overfitting, although too large an α can underfit
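As a hedged illustration, scikit-learn exposes this idea through the ccp_alpha parameter of its tree estimators (a sketch, assuming X_train and y_train already exist as arrays):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

# Candidate alphas from the cost complexity pruning path of the full tree
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Larger ccp_alpha -> more pruning -> smaller tree
scores = [
    cross_val_score(DecisionTreeRegressor(ccp_alpha=a, random_state=0),
                    X_train, y_train, cv=5).mean()
    for a in path.ccp_alphas
]
best_alpha = path.ccp_alphas[int(np.argmax(scores))]
```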
Which classification error is preferred if prediction accuracy of the pruned tree is the goal?
Classification Error Rate
What classification errors are used to evaluate quality of particular splits and why?
The Gini index and entropy, since they are more sensitive to node purity than the classification error rate.
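A small sketch of the three measures for a single node, written in terms of the node's class proportions (the helper function is my own, not from a library):

```python
import numpy as np

def node_impurities(class_counts):
    """Classification error rate, Gini index, and entropy for one node.

    class_counts: counts of each class among the node's observations.
    """
    p = np.asarray(class_counts, dtype=float)
    p = p / p.sum()                                  # class proportions
    error = 1.0 - p.max()                            # classification error rate
    gini = np.sum(p * (1.0 - p))                     # Gini index
    entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))   # entropy (natural log; base only changes scale)
    return error, gini, entropy
```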
Describe some advantages of decision trees?
- Easy to explain to people
- More closely mirror human decision-making than regression and classification approaches
- Can be displayed graphically, and easily interpreted even by a non-expert
- Easily handle qualitative predictors without the need to create dummy variables
Describe some disadvantages of decision trees?
- Generally, do not have the same level of predictive accuracy as some of the other regression and classification approaches
- Additionally, trees can be very non-robust: a small change in the data can cause a large change in the final estimated tree
What do methods like bagging, random forest, and boosting do to trees?
Improve the predictive performance of the trees
What is the goal of bagging?
Reduce variance since decision trees tend to have high variance.
Steps of Bagging
- First, generate M different bootstrapped training datasets, each a random sample of the original training data drawn with replacement.
- Next, build an unpruned regression tree on each of the M bootstrapped training datasets
- Finally, combine the M trees' predictions: average them if the response is numeric, or take a majority vote (the mode of the predictions) if it is categorical (see the sketch below)
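A minimal sketch of bagging for a numeric response (illustrative only; scikit-learn's DecisionTreeRegressor is used as one possible base learner, and the helper name is my own):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_predict(X_train, y_train, X_test, M=100, seed=0):
    """Average the predictions of M unpruned trees, each fit to a bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    preds = []
    for _ in range(M):
        idx = rng.integers(0, n, size=n)      # bootstrap sample drawn with replacement
        tree = DecisionTreeRegressor()        # grown deep, no pruning
        tree.fit(X_train[idx], y_train[idx])
        preds.append(tree.predict(X_test))
    return np.mean(preds, axis=0)             # average over the M trees
```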
What is random forests method?
The same process as bagging, except each split in each tree is only allowed to consider a random subset of the predictors (typically about the square root of the total number of predictors), which decorrelates the trees and further reduces variance
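As a sketch of how this looks in practice, scikit-learn's RandomForestRegressor exposes the per-split predictor subset through max_features (X_train, y_train, and X_test are assumed to exist):

```python
from sklearn.ensemble import RandomForestRegressor

# max_features controls how many predictors each split may consider:
# using all predictors makes this essentially bagging, while a smaller
# value (e.g. "sqrt") decorrelates the trees.
rf = RandomForestRegressor(n_estimators=500, max_features="sqrt", random_state=0)
rf.fit(X_train, y_train)
y_hat = rf.predict(X_test)
```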
What is a downside of bagging?
If there is one very strong predictor in the dataset, most or all of the bagged trees will use it in their top split, so the trees look very similar to each other and their predictions are highly correlated. Averaging highly correlated trees does not produce a substantial reduction in variance.
What is boosting?
Creating many trees in a sequential manner, where each new small tree is fit to the residuals of the current model rather than to the response. A shrunken version of the new tree is added to the current model and the residuals are updated. This process is repeated many times to arrive at a final model
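A minimal sketch of this residual-fitting procedure for regression, assuming scikit-learn's DecisionTreeRegressor as the small base tree (the names B, lam, and d mirror the tuning parameters below and are my own; max_depth=d is used here as a rough proxy for "d splits"):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_regression_trees(X, y, B=1000, lam=0.01, d=1):
    """Fit B small trees sequentially to the residuals, with shrinkage lam."""
    f_hat = np.zeros(len(y))          # current model starts at 0
    residuals = y.astype(float)       # so residuals start equal to y
    trees = []
    for _ in range(B):
        tree = DecisionTreeRegressor(max_depth=d)
        tree.fit(X, residuals)                 # fit a small tree to the residuals
        update = lam * tree.predict(X)         # shrunken version of the new tree
        f_hat += update                        # add it to the current model
        residuals -= update                    # update the residuals
        trees.append(tree)
    return trees

def boost_predict(trees, X, lam=0.01):
    """Final boosted model: sum of the shrunken trees."""
    return lam * sum(t.predict(X) for t in trees)
```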
Describe the 3 tuning parameters of boosting?
- The number of trees, B. Unlike bagging and random forests, boosting can overfit if B is too large. We use cross-validation to select B
- The shrinkage parameter, a small positive number, which controls the rate at which boosting learns. A very small shrinkage parameter typically requires a very large B to achieve good performance
- The number of splits d in each tree, which controls the complexity of the boosted ensemble; d is also the interaction depth, since d splits can involve at most d variables (see the sketch below)
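As an illustration of where these three parameters show up in one common implementation, here is a hedged sketch with scikit-learn's GradientBoostingRegressor (X_train and y_train are assumed to exist; gradient boosting generalizes the residual-fitting procedure sketched earlier):

```python
from sklearn.ensemble import GradientBoostingRegressor

gbm = GradientBoostingRegressor(
    n_estimators=1000,   # B: the number of trees
    learning_rate=0.01,  # the shrinkage parameter
    max_depth=2,         # tree size, playing the role of d (interaction depth)
    random_state=0,
)
gbm.fit(X_train, y_train)
```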