Chapter 5 - Decision Trees Flashcards
Terminal Nodes
Terminal Nodes/Leaves represent the partitions of the predictor space
Internal Nodes
Internal Nodes are points along the tree where splits occur
Branches
Branches are lines that connect any 2 nodes
Stump
A decision tree with only one internal node is called a stump
Decision Tree Making Algorithm
- Construct a large tree with g terminal nodes using recursive binary splitting.
- Obtain a sequence of best subtrees, as a function of lambda, using cost complexity pruning.
- Choose lambda by applying k-fold cross-validation. Select the lambda that results in the lowest cross-validation error.
- The best subtree is the subtree created in step 2 with the selected lambda value. (See the sketch after these steps.)
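A minimal sketch of these four steps in Python using scikit-learn, whose ccp_alpha parameter plays the role of lambda; the dataset and the choice k = 5 are illustrative assumptions, not part of the card.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Step 1: grow a large tree by recursive binary splitting (no penalty yet).
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Step 2: cost complexity pruning yields a sequence of candidate lambdas,
# each corresponding to a best subtree.
lambdas = full_tree.cost_complexity_pruning_path(X, y).ccp_alphas

# Step 3: choose the lambda with the best k-fold cross-validation performance
# (highest accuracy, i.e. lowest cross-validation error).
cv_scores = [cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=lam),
                             X, y, cv=5).mean()
             for lam in lambdas]
best_lambda = lambdas[int(np.argmax(cv_scores))]

# Step 4: the best subtree is the tree pruned with the selected lambda.
best_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_lambda).fit(X, y)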
Recursive Binary Splitting
A top-down, greedy approach to creating decision trees: top-down because it begins at the top of the tree with the full predictor space, and greedy because each split is chosen to be the best at that particular step, with no look-ahead to future splits.
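As a sketch of what greedy means here, the hypothetical function below performs a single step of recursive binary splitting for a regression tree: it scans every predictor and cutpoint and keeps the split that minimizes the combined RSS of the two resulting regions.

import numpy as np

def best_split(X, y):
    """Return the (feature, cutpoint) of the RSS-minimizing binary split."""
    best_j, best_s, best_rss = None, None, np.inf
    for j in range(X.shape[1]):               # every predictor
        for s in np.unique(X[:, j]):          # every candidate cutpoint
            left, right = y[X[:, j] < s], y[X[:, j] >= s]
            if len(left) == 0 or len(right) == 0:
                continue
            rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if rss < best_rss:                # greedy: keep the best split now
                best_j, best_s, best_rss = j, s, rss
    return best_j, best_s

Growing the full tree would apply best_split recursively to each resulting region, which is the top-down part.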
Classification Error Rate (E_m)
The fraction of training observations that do not belong to the most frequent category.
E_m = 1 - max_c p_{m,c}, where p_{m,c} is the proportion of training observations in node m that belong to category c.
Gini Index (G_m)
G_m = the sum from c = 1 to w of p_{m,c}(1 - p_{m,c}), where w is the number of categories.
Cross Entropy (D_m)
D_m = the negative sum from c = 1 to w of p_{m,c} * ln(p_{m,c}).
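A small sketch computing all three impurity measures for a single node from its category proportions p_{m,c}; the proportions below are made up for illustration.

import numpy as np

p = np.array([0.7, 0.2, 0.1])            # p_{m,c} for a node with w = 3 categories

error_rate    = 1 - p.max()              # E_m = 1 - max_c p_{m,c}
gini          = (p * (1 - p)).sum()      # G_m = sum_c p_{m,c}(1 - p_{m,c})
cross_entropy = -(p * np.log(p)).sum()   # D_m = -sum_c p_{m,c} ln p_{m,c}

print(error_rate, gini, cross_entropy)   # 0.3, 0.46, 0.802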
Residual Mean Deviance
deviance / (n - g), where n is the number of training observations and g is the number of terminal nodes.
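A short sketch of this calculation, assuming the classification-tree deviance reported by R's tree package, deviance = -2 * sum over nodes m and categories c of n_{m,c} * ln(p_{m,c}); the node counts are hypothetical.

import numpy as np

counts = np.array([[40, 10],    # n_{m,c}: category counts in terminal node 1
                   [5, 45]])    # ... and in terminal node 2
n, g = counts.sum(), counts.shape[0]              # n observations, g terminal nodes
p = counts / counts.sum(axis=1, keepdims=True)    # p_{m,c} within each node

deviance = -2 * (counts * np.log(p)).sum()
residual_mean_deviance = deviance / (n - g)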
Pure Node
A node is said to be pure if all the observations in that node are of the same category
Classification Error Rate Sensitivity
Classification error rate is not as sensitive to node impurity as the Gini index and the cross entropy are.
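A numeric illustration of this point with made-up counts: two candidate splits of the same 800 observations tie on classification error rate, yet the Gini index (cross entropy behaves similarly) prefers the split that produces a pure node.

import numpy as np

def weighted(measure, nodes):
    """Size-weighted average of an impurity measure over child nodes."""
    total = sum(node.sum() for node in nodes)
    return sum(node.sum() / total * measure(node / node.sum()) for node in nodes)

error = lambda p: 1 - p.max()
gini = lambda p: (p * (1 - p)).sum()

split_a = [np.array([300, 100]), np.array([100, 300])]   # no pure node
split_b = [np.array([200, 400]), np.array([200, 0])]     # second node is pure

print(weighted(error, split_a), weighted(error, split_b))  # 0.25 vs 0.25: a tie
print(weighted(gini, split_a), weighted(gini, split_b))    # 0.375 vs 0.333: prefers b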
Error Rate Preferences
Gini index and cross entropy are preferred over classification error rate in tree-growing. Classification error rate is generally preferred in tree-pruning if prediction accuracy is the ultimate goal.
Cost Complexity Pruning Lambda
Cost complexity pruning uses a lambda in its minimization function similar to lasso regression. The higher the value of lambda, the greater the complexity penalty and the smaller the resulting tree, which in turn lowers variance. When lambda = 0, the fitted tree is the largest possible. When lambda is sufficiently large, the fitted tree is the simplest possible, consisting of only the root node and making no splits.
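A brief sketch of this behavior, using scikit-learn's ccp_alpha as the lambda analogue; the dataset is an illustrative assumption.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
for lam in [0.0, 0.01, 0.1, 1.0]:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=lam).fit(X, y)
    print(lam, tree.tree_.node_count)   # node count shrinks as lambda grows;
                                        # a large enough lambda leaves only the root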
Regression Trees vs Linear Models
If the relationship between the response and the predictors is well approximated by a linear model, then linear models will likely outperform trees. If that relationship is highly non-linear or more complex than a linear model can handle, then trees will likely outperform linear models.
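A small sketch of this comparison on simulated data (the data-generating choices are illustrative): a linear model wins when the true relationship is linear, while a shallow regression tree wins when the truth is a step function that a line cannot capture.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
targets = {"linear": 2 * X[:, 0] + rng.normal(0, 0.5, 500),
           "step": np.where(X[:, 0] > 0, 3.0, -3.0) + rng.normal(0, 0.5, 500)}

for name, y in targets.items():
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    lm_mse = ((LinearRegression().fit(X_tr, y_tr).predict(X_te) - y_te) ** 2).mean()
    tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_tr, y_tr)
    tree_mse = ((tree.predict(X_te) - y_te) ** 2).mean()
    print(name, "linear model MSE:", round(lm_mse, 2), "tree MSE:", round(tree_mse, 2))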