Chapter 8 - Decision Trees Flashcards

1
Q

CART - (6)

A

Classification and Regression Trees

  1. find a partition of the space of predictors
  2. predict a constant in each set of the partition
  3. the partition is defined by splitting the range of one predictor at a time
  4. not all partitions are possible
  5. the prediction for a point in region Ri is the average response of the training points in Ri (see the sketch below)
  6. restricting to these partitions adds bias but can reduce variance
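
A minimal sketch of points 2 and 5 in plain NumPy (the data and the cut point 5.0 are made up for illustration):

```python
import numpy as np

# Toy data: one predictor x, response y (values are hypothetical)
x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([1.1, 0.9, 1.0, 4.2, 3.8, 4.0])

# A partition of the predictor's range: R1 = {x < 5}, R2 = {x >= 5}
in_R1 = x < 5.0

# The prediction in each region is the mean response of its training points
pred_R1 = y[in_R1].mean()   # ~1.0
pred_R2 = y[~in_R1].mean()  # ~4.0

def predict(x_new):
    """Piecewise-constant prediction: return the region mean."""
    return pred_R1 if x_new < 5.0 else pred_R2

print(predict(2.5), predict(6.5))
```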
2
Q

How is the tree built?

A

start with a single region R1 and iterate:

  1. select a region Rk, a predictor Xj, and a splitting point s such that the criterion Xj < s produces the largest decrease in RSS
  2. redefine the regions with this additional split (a sketch of the split search follows this list)
    - stopping rule: terminate when every region contains 5 or fewer observations (growing further leads to overfitting)
    - this grows the tree from the root to the leaves
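
A sketch of the split search in step 1, in plain NumPy (function names are my own); growing the tree means applying this recursively inside each region until the stopping rule fires:

```python
import numpy as np

def rss(y):
    """Residual sum of squares around the mean (0 for an empty region)."""
    return float(((y - y.mean()) ** 2).sum()) if y.size else 0.0

def best_split(X, y):
    """Scan every predictor Xj and every candidate cut s; return the
    split Xj < s that produces the largest decrease in RSS."""
    best_j, best_s, best_decrease = None, None, -np.inf
    total = rss(y)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # midpoints between consecutive sorted values are the candidate cuts
        for s in (values[:-1] + values[1:]) / 2:
            left = X[:, j] < s
            decrease = total - rss(y[left]) - rss(y[~left])
            if decrease > best_decrease:
                best_j, best_s, best_decrease = j, s, decrease
    return best_j, best_s, best_decrease

X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0]])
y = np.array([1.0, 1.1, 0.9, 4.0, 4.2])
print(best_split(X, y))  # (0, 4.5, ...): cutting at 4.5 separates the two groups
```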
3
Q

How do we control for overfitting?

A

wrong method 1: find the optimal subtree by CV (there are far too many possible subtrees, so this overfits)
wrong method 2: use a rule to decide when to stop growing (doesn't work because good cuts can live below bad ones)

solution: prune a large tree from the leaves to the root

4
Q

Tree Pruning: (2)

A

weakest link pruning: starting with the full tree T0, substitute a subtree with a leaf to obtain T1 by minimizing

  [RSS(T1) - RSS(T0)] / (|T0| - |T1|)

iterate this pruning to obtain a sequence T0, T1, ..., Tm, where Tm is the null tree (a single leaf)

cost complexity pruning: solve

  minimize over subtrees T: RSS(T) + α|T|

where α >= 0 is a penalty on the number of leaves |T|
  - if α is large, we select the null tree Tm
  - if α = 0, we select the full tree T0
  - the solution for each α is among the sequence of trees produced by weakest link pruning
  - choose the optimal α by CV
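
scikit-learn's minimal cost-complexity pruning follows this same recipe: `cost_complexity_pruning_path` returns the weakest-link α values, and `ccp_alpha` selects the corresponding pruned tree. A sketch on a synthetic dataset:

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)

# Grow a large tree T0, then compute the weakest-link pruning sequence:
# each ccp_alpha is the penalty at which one more subtree collapses to a leaf.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)

# One pruned tree per alpha: the larger the alpha, the smaller the tree.
trees = [DecisionTreeRegressor(random_state=0, ccp_alpha=a).fit(X, y)
         for a in path.ccp_alphas]
print([t.get_n_leaves() for t in trees])  # leaf counts shrink toward 1
```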

5
Q

Incorrect CV vs Correct CV

A

incorrect CV: build the sequence of pruned trees on the full data first, then do the k-fold split only to evaluate them (the held-out folds have already influenced the trees)

correct CV: split the data k-fold ways; within each training set, grow and prune its own sequence of trees; then select the tree Ti (equivalently, the penalty α) that minimizes the average test error across folds (see the sketch below)
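
A sketch of the correct procedure with scikit-learn: `GridSearchCV` regrows and prunes the tree inside every training fold, so no tree is ever built from its held-out data (taking the α grid from the full-data pruning path is a common shortcut, not part of the recipe):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)

# Candidate penalties, taken from the full-data pruning path (grid only)
alphas = DecisionTreeRegressor(random_state=0) \
    .cost_complexity_pruning_path(X, y).ccp_alphas

# GridSearchCV refits the tree within each training fold for each alpha
cv = GridSearchCV(DecisionTreeRegressor(random_state=0),
                  {"ccp_alpha": alphas}, cv=5,
                  scoring="neg_mean_squared_error").fit(X, y)
print(cv.best_params_)  # the alpha with the lowest average test error
```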

6
Q

Classification Trees - types of classification loss (4 total notes)

A

predict by majority vote in each region Ri.

types of classification loss: 0-1 loss (misclassification rate), Gini index, cross-entropy (similar to Gini)

the Gini index, Σ_k p̂_mk (1 − p̂_mk), is the expected misclassification rate if, instead of predicting the most likely class, we predict a class drawn at random from the estimated distribution p̂_mk

it is typical to use Gini or cross-entropy to grow the tree, then use the misclassification rate to prune (see the functions below)
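
Small reference implementations of the three losses for a vector of class proportions p̂_mk (plain NumPy, function names are my own):

```python
import numpy as np

def gini(p):
    """Gini index: sum_k p_k (1 - p_k), the expected misclassification
    rate when the predicted label is drawn from the distribution p."""
    p = np.asarray(p)
    return float((p * (1 - p)).sum())

def cross_entropy(p):
    """Cross-entropy (deviance): -sum_k p_k log p_k."""
    p = np.asarray(p)
    p = p[p > 0]  # 0 * log 0 = 0 by convention
    return float(-(p * np.log(p)).sum())

def misclassification(p):
    """0-1 loss: 1 minus the proportion of the majority class."""
    return 1.0 - float(np.max(p))

print(gini([0.5, 0.5]), cross_entropy([0.5, 0.5]), misclassification([0.5, 0.5]))
```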

7
Q

Advantages of Trees

A

easy to interpret, closer to human decision making, easy to visualize, and they handle qualitative predictors and missing data easily

8
Q

How do we deal with Categorical variables

A

For 2 categories, the split is obvious. If there are more than 2 categories, order the categories according to the average of the response; this can be shown to be the optimal way to partition (see the sketch below).
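
A sketch of the ordering trick with pandas (data made up): ordering k categories by mean response means only k − 1 cut points need to be searched, instead of all 2^(k−1) − 1 two-set partitions:

```python
import pandas as pd

# Hypothetical data: a categorical predictor and a numeric response
df = pd.DataFrame({"color": ["red", "blue", "green", "red", "blue", "green"],
                   "y":     [ 3.0,   1.0,    5.0,    2.5,   1.5,    6.0 ]})

# Order the categories by their average response...
order = df.groupby("color")["y"].mean().sort_values().index.tolist()
print(order)  # ['blue', 'red', 'green']

# ...then treat the variable as ordered and search cuts along this ranking
rank = df["color"].map({c: i for i, c in enumerate(order)})
```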

9
Q

Missing Data

A

suppose we want to assign every sample to a leaf Ri despite missing data

when choosing a new split on variable Xj (growing the tree):

  • only consider the samples for which Xj is observed
  • in addition to choosing the best split, record the second-best split using a different variable, the third best, and so on (surrogate splits)
  • to propagate a sample down the tree, use the best split (primary or surrogate) whose variable is observed for that sample (see the sketch below)
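
A hedged sketch of surrogate-split routing (all structures and names are hypothetical): each internal node stores its primary split plus surrogates on other variables, ordered from best to worst, and a sample follows the first split whose variable it actually has:

```python
import math

# One node's splits, best first: primary on X3, then two surrogates
node_splits = [("X3", 2.5),   # primary split: X3 < 2.5
               ("X1", 0.7),   # best surrogate, chosen while growing
               ("X7", -1.0)]  # second surrogate

def go_left(sample):
    """Route with the first (best) split whose variable is observed."""
    for var, cut in node_splits:
        if not math.isnan(sample[var]):
            return sample[var] < cut
    raise ValueError("all splitting variables are missing")

# X3 is missing, so the surrogate on X1 decides: 0.2 < 0.7 -> left
print(go_left({"X3": float("nan"), "X1": 0.2, "X7": 3.0}))
```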