Chapter 8 - Decision Trees Flashcards
CART - (6)
Classification and Regression Trees
- find a partition in the space of predictors
- predict a constant in each set of the partition
- partition is defined by splitting the range, one predictor at a time
- not all partitions are possible: the splits only produce box-shaped (axis-aligned) regions
- the prediction for a point in region Ri is the average of the training responses in Ri
- the piecewise-constant fit has a lot of bias but can reduce variance
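In standard CART notation (regions R_1, ..., R_J), the fitted function and the criterion being minimized are:

```latex
% prediction: the average response in the region containing x
\hat{f}(x) = \sum_{j=1}^{J} \hat{y}_{R_j}\, \mathbf{1}\{x \in R_j\},
\qquad
\hat{y}_{R_j} = \operatorname{mean}\{\, y_i : x_i \in R_j \,\}

% objective: choose the partition R_1, \dots, R_J to minimize the total RSS
\mathrm{RSS} = \sum_{j=1}^{J} \sum_{i:\, x_i \in R_j} \bigl(y_i - \hat{y}_{R_j}\bigr)^2
```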
How is the tree built?
start with a single region R1 and iterate.
- select a region Rk, a predictor Xj, and a splitting point s such that the criterion Xj < s produces the largest decrease in RSS (sketched below)
- replace Rk with the two new regions defined by this split
- stopping rule - terminate when there are 5 observations or fewer in each partition (growing further tends to overfit)
- this grows the tree from the root to the leaves
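A minimal numpy sketch of one greedy split search; the function name `best_split` and its interface are illustrative, not from the text:

```python
import numpy as np

def best_split(X, y):
    """One step of recursive binary splitting: search every predictor j and
    threshold s for the (j, s) pair that most reduces RSS."""
    n, p = X.shape
    base_rss = np.sum((y - y.mean()) ** 2)
    best = None  # (rss decrease, predictor index, threshold)
    for j in range(p):
        for s in np.unique(X[:, j]):
            left = X[:, j] < s
            right = ~left
            if left.sum() == 0 or right.sum() == 0:
                continue  # skip degenerate splits
            rss = (np.sum((y[left] - y[left].mean()) ** 2)
                   + np.sum((y[right] - y[right].mean()) ** 2))
            decrease = base_rss - rss
            if best is None or decrease > best[0]:
                best = (decrease, j, s)
    return best

# usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(float) + 0.1 * rng.normal(size=100)
print(best_split(X, y))  # -> (rss decrease, predictor index, threshold)
```

In a full implementation, splits that would leave fewer than 5 observations in a child region would be skipped, per the stopping rule above.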
How do we control for overfitting?
wrong method 1: find the optimal subtree using CV (there are far too many possible subtrees to search over)
wrong method 2: use a rule to decide when we stop growing (doesn’t work because good cuts can live below bad ones)
solution: prune a large tree from leaves to root
Tree Pruning: (2)
weakest link pruning: starting with the full tree T0, substitute a subtree with a leaf to obtain T1, choosing the collapse that minimizes [RSS(T1) - RSS(T0)] / [|T0| - |T1|] (the smallest increase in RSS per leaf removed)
iterate this pruning to obtain a nested sequence T0, T1, ..., Tm, where Tm is the null tree (a single leaf)
cost complexity pruning: solve the problem
minimize RSS(T) + a*|T| over subtrees T, where a >= 0 is a penalty on the number of leaves |T|
if a is large then we select the null tree Tm
if a = 0 then we select the full tree T0
the solution for each a is among the sequence of trees chosen for weakest link pruning
choose optimal a by CV
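One way to reproduce this in Python is scikit-learn's minimal cost-complexity pruning; the data below is a synthetic placeholder. `cost_complexity_pruning_path` returns the nested sequence of alphas corresponding to the weakest-link trees:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] + 0.5 * rng.normal(size=200)

# Grow a large tree, then compute the weakest-link / cost-complexity path.
full_tree = DecisionTreeRegressor(min_samples_leaf=5, random_state=0)
path = full_tree.cost_complexity_pruning_path(X, y)

# Each ccp_alpha corresponds to one tree in the nested pruning sequence.
for alpha in path.ccp_alphas:
    pruned = DecisionTreeRegressor(min_samples_leaf=5, ccp_alpha=alpha,
                                   random_state=0).fit(X, y)
    print(f"alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}")
```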
Incorrect CV vs Correct CV
incorrect CV: build the sequence of trees on the full data set first, then do the k-fold split only to evaluate them.
correct CV: split the data into k folds. for each training set, grow and prune its own sequence of trees (each minimizing RSS for its size). select the tree Ti (equivalently, the value of a) which minimizes the average test error across folds
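Continuing the sketch above, choosing a by cross-validation the "correct" way: `GridSearchCV` regrows and prunes the tree inside every training fold, then picks the a with the best average held-out error.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Uses X, y, and path from the previous sketch.
search = GridSearchCV(
    DecisionTreeRegressor(min_samples_leaf=5, random_state=0),
    param_grid={"ccp_alpha": path.ccp_alphas},  # candidate penalties a
    cv=5,
    scoring="neg_mean_squared_error",
).fit(X, y)
print(search.best_params_)  # the value of a chosen by CV
```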
Classification Trees - types of classification loss (4 total notes)
predict the class in each region Ri by majority vote of the training observations in that region.
types of classification loss: 0-1 (misclassification) loss, Gini index, cross-entropy (similar to Gini)
the Gini index is the expected misclassification rate if, instead of predicting the most likely class, we predict a class drawn at random from the estimated class proportions phat_mk (formulas below)
it is typical to use Gini or cross-entropy to grow the tree, then use the misclassification rate to prune
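For a region m with estimated class proportions phat_mk, the standard formulas for the three losses are:

```latex
% region m, classes k, estimated class proportions \hat{p}_{mk}
E_m = 1 - \max_k \hat{p}_{mk}                       % 0-1 / misclassification error
G_m = \sum_k \hat{p}_{mk}\,(1 - \hat{p}_{mk})       % Gini index
D_m = -\sum_k \hat{p}_{mk}\, \log \hat{p}_{mk}      % cross-entropy
```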
Advantages of Trees
easy to interpret, closer to human decision making, easy to visualize, easily handle qualitative predictors and missing data
How do we deal with Categorical variables
For 2 categories, the split is obvious. If there are more than 2 categories, order the categories by the average of the response within each category and split as if the predictor were ordered (this can be shown to be the optimal way to partition)
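A small illustration (hypothetical data and column names) of ordering categories by their mean response before splitting:

```python
import pandas as pd

# Hypothetical data: a 3-level categorical predictor and a numeric response.
df = pd.DataFrame({
    "color": ["red", "blue", "green", "blue", "red", "green", "red"],
    "y":     [ 3.1,   1.2,    2.0,     0.9,    3.5,   2.4,     2.9 ],
})

# Order the categories by their mean response, then treat the predictor as
# ordinal: candidate splits are cuts in this ordering.
order = df.groupby("color")["y"].mean().sort_values().index.tolist()
df["color_rank"] = df["color"].map({c: i for i, c in enumerate(order)})
print(order)          # ['blue', 'green', 'red']
print(df[["color", "color_rank", "y"]])
```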
Missing Data
we want to be able to assign every sample to a leaf Ri despite missing data
when choosing a new split with variable Xj (growing the tree)
- only consider the samples for which Xj is observed
- in addition to the best split, also record the second-best and third-best splits on different variables (surrogate splits; see the sketch after this card)
- to send an observation down the tree, use the primary split when its variable is observed, otherwise the first surrogate split whose variable is observed
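scikit-learn does not implement surrogate splits (rpart in R does); the sketch below, with hypothetical names, shows the basic idea of picking a surrogate threshold on another variable that best agrees with the primary split:

```python
import numpy as np

def surrogate_split(X, primary_j, primary_s, candidate_j):
    """Pick a threshold on X[:, candidate_j] that best reproduces the
    primary split X[:, primary_j] < primary_s (a sketch; real implementations
    also consider the reversed direction and rank surrogates by agreement)."""
    goes_left = X[:, primary_j] < primary_s      # where the primary split sends each sample
    best = None                                  # (agreement, threshold)
    for s in np.unique(X[:, candidate_j]):
        agreement = np.mean((X[:, candidate_j] < s) == goes_left)
        if best is None or agreement > best[0]:
            best = (agreement, s)
    return best

# usage on synthetic data: find a surrogate for the split X0 < 0 using X1
X = np.random.default_rng(0).normal(size=(100, 2))
X[:, 1] = X[:, 0] + 0.3 * np.random.default_rng(1).normal(size=100)  # correlated surrogate
print(surrogate_split(X, primary_j=0, primary_s=0.0, candidate_j=1))
```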