Chapter 5 - Decision Trees Flashcards

1
Q

Terminal Nodes

A

Terminal Nodes/Leaves represent the partitions of the predictor space

2
Q

Internal Nodes

A

Internal Nodes are points along the tree where splits occur

3
Q

Branches

A

Branches are lines that connect any 2 nodes

4
Q

Stump

A

A decision tree with only one internal node is called a stump

5
Q

Decision Tree Making Algorithm

A
  1. Construct a large tree with g terminal nodes using recursive binary splitting.
  2. Obtain a sequence of best subtrees, as a function of lambda, using cost complexity pruning.
  3. Choose lambda by applying k-fold cross-validation. Select the lambda that results in the lowest cross-validation error.
  4. The best subtree is the subtree created in step 2 with the selected lambda value (see the sketch below).
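A minimal scikit-learn sketch of this procedure for a regression tree (X and y stand for training arrays that are assumed to exist; scikit-learn calls the penalty ccp_alpha rather than lambda):

  import numpy as np
  from sklearn.tree import DecisionTreeRegressor
  from sklearn.model_selection import cross_val_score

  # Step 1: grow a large tree and obtain the pruning path (the candidate lambdas,
  # which scikit-learn exposes as ccp_alphas).
  path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)

  # Steps 2-3: score the pruned subtree for each candidate lambda with 5-fold CV.
  cv_mse = [-cross_val_score(DecisionTreeRegressor(ccp_alpha=a, random_state=0),
                             X, y, cv=5, scoring="neg_mean_squared_error").mean()
            for a in path.ccp_alphas]

  # Step 4: refit with the lambda that gave the lowest cross-validation error.
  best_alpha = path.ccp_alphas[int(np.argmin(cv_mse))]
  best_tree = DecisionTreeRegressor(ccp_alpha=best_alpha, random_state=0).fit(X, y)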
6
Q

Recursive Binary Splitting

A

A top-down, greedy approach to growing decision trees: at each step the best split at that point is made, without looking ahead to splits further down the tree.
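As an illustration only, a rough sketch of one greedy step for a single numeric predictor x and response y (both assumed to be NumPy arrays): the cutpoint chosen is the one that minimizes the total RSS of the two resulting regions.

  import numpy as np

  def best_split(x, y):
      # Greedy search for the cutpoint s minimizing the RSS of {x < s} and {x >= s}.
      best_s, best_rss = None, np.inf
      for s in np.unique(x)[1:]:            # candidate cutpoints (both regions non-empty)
          left, right = y[x < s], y[x >= s]
          rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
          if rss < best_rss:
              best_s, best_rss = s, rss
      return best_s, best_rss

Recursive binary splitting repeats this search over every predictor at every node, keeps the single best split, and then recurses within each of the two new regions.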

7
Q

Classification Error Rate (Em)

A

The fraction of training observations in node m that do not belong to the most frequent category:
E_m = 1 - max_c(p_m,c), where p_m,c is the proportion of training observations in node m that belong to category c.
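A tiny worked example for a hypothetical node whose category proportions are 0.7, 0.2, and 0.1:

  p = [0.7, 0.2, 0.1]          # hypothetical proportions p_m,c for node m
  E_m = 1 - max(p)             # = 0.3, so 30% of the node's observations are misclassified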

8
Q

Gini Index (Gm)

A

G_m = the sum from c = 1 to w of p_m,c(1 - p_m,c), where w is the number of categories. A measure of node impurity that is small when the node is nearly pure.
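For the same hypothetical node (proportions 0.7, 0.2, 0.1):

  p = [0.7, 0.2, 0.1]
  G_m = sum(pc * (1 - pc) for pc in p)   # = 0.46; a pure node would give 0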

9
Q

Cross Entropy (Dm)

A

D_m = the negative sum from c = 1 to w of p_m,c * ln(p_m,c). Like the Gini index, it is small when the node is nearly pure.
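Again for the hypothetical node with proportions 0.7, 0.2, 0.1:

  import math

  p = [0.7, 0.2, 0.1]
  D_m = -sum(pc * math.log(pc) for pc in p)   # about 0.80; a pure node would give 0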

10
Q

Residual Mean Deviance

A

deviance / (n - g), where n is the number of training observations and g is the number of terminal nodes.
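A hypothetical numeric example, assuming a tree with g = 6 terminal nodes fit to n = 81 observations and a total deviance of 150:

  deviance, n, g = 150.0, 81, 6
  residual_mean_deviance = deviance / (n - g)   # = 2.0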

11
Q

Pure Node

A

A node is said to be pure if all the observations in that node are of the same category

12
Q

Classification Error Rate Sensitivity

A

Classification error rate is not as sensitive to node impurity as the Gini index and the cross entropy are.

13
Q

Error Rate Preferences

A

Gini index and cross entropy are preferred over classification error rate in tree-growing. Classification error rate is generally preferred in tree-pruning if prediction accuracy is the ultimate goal.

14
Q

Cost Complexity Pruning Lambda

A

Cost complexity pruning uses a lambda penalty in its minimization objective, similar to lasso regression. The higher the value of lambda, the greater the complexity penalty and the smaller the resulting tree (and its variance). When lambda = 0, the fitted tree is the largest possible tree. When lambda is sufficiently large, the fitted tree is the simplest possible tree, consisting only of the root node and making no splits.
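A small sketch of this effect in scikit-learn, which calls the penalty ccp_alpha (X and y are assumed training arrays):

  from sklearn.tree import DecisionTreeRegressor

  # Larger ccp_alpha (lambda) -> heavier complexity penalty -> smaller fitted tree.
  for alpha in [0.0, 0.01, 0.1, 1.0]:
      tree = DecisionTreeRegressor(ccp_alpha=alpha, random_state=0).fit(X, y)
      print(alpha, tree.get_n_leaves())

With alpha = 0 the largest tree is grown; once alpha is large enough the tree collapses to a single leaf, the root node.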

15
Q

Regression Trees vs Linear Models

A

If the relationship between the response and the predictors is well approximated by a linear model, then linear models will likely outperform trees. If that relationship is highly non-linear, or more complex than a linear model can handle, then trees will likely outperform linear models.

16
Q

Advantages of Trees

A

Easy to interpret and explain. Can be presented visually. Handle categorical variables without the need for dummy variables. Mimic human decision-making.

17
Q

Disadvantages of Trees

A

Not robust - A small change in the data can cause a large change in the final tree. Do not have the same degree of predictive accuracy as other statistical methods.

18
Q

Bagging Algorithm

A
  1. Create b bootstrap samples from the original training dataset.
  2. Construct a decision tree for each bootstrap sample using recursive binary splitting.
  3. Predict the response of a new observation by averaging the predictions (regression trees) or by using the most frequent category (classification trees) across all b trees (see the sketch below).
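A minimal scikit-learn sketch of this algorithm for a regression problem (X, y, and X_new are assumed arrays; the default base learner of BaggingRegressor is a decision tree):

  from sklearn.ensemble import BaggingRegressor

  # b = 500 bootstrap samples, one tree per sample; the b predictions are averaged.
  bag = BaggingRegressor(n_estimators=500, bootstrap=True, random_state=0).fit(X, y)
  y_hat = bag.predict(X_new)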
19
Q

Bagging Properties

A

A general-purpose procedure that works for many statistical methods, though it is not typically used for time series data. Bagging reduces variance, but the whole bagged model is difficult to interpret. Increasing b does not cause overfitting. Each bagged tree uses about 2/3 of the original observations.

20
Q

Out of Bag Observations (OOB)

A

The observations that are not used to train a particular bagged tree are called out-of-bag observations. The OOB error is a valid estimate of the test error.
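In scikit-learn the OOB estimate can be requested directly; a sketch, again assuming arrays X and y:

  from sklearn.ensemble import BaggingRegressor

  bag = BaggingRegressor(n_estimators=500, oob_score=True, random_state=0).fit(X, y)
  print(bag.oob_score_)   # score computed only from out-of-bag predictions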

21
Q

Random Forests Algorithm

A
  1. Create b bootstrap samples from the original training dataset.
  2. Construct a decision tree for each bootstrap sample using recursive binary splitting. At each split, only a random subset of k variables is considered.
  3. Predict the response of a new observation by averaging the predictions (regression trees) or by using the most frequent category (classification trees) across all b trees (see the sketch below).
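A scikit-learn sketch (X, y, and X_new are assumed arrays; max_features plays the role of k, set here to p/3 as an example):

  from sklearn.ensemble import RandomForestRegressor

  # b = 500 trees; at each split only a random subset of k = p/3 predictors is considered.
  rf = RandomForestRegressor(n_estimators=500, max_features=1/3, random_state=0).fit(X, y)
  y_hat = rf.predict(X_new)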
22
Q

Random Forests Properties

A

Bagging is a special case of random forests where k = p. Random forests reduce variance by averaging the results of decorrelated trees. Increasing b does not cause overfitting. Decreasing k reduces the correlation between the trees' predictions.

23
Q

Boosting Algorithm

A

Let z_1 be the actual response variable y.
  1. For k = 1, 2, …, b:
     - Use recursive binary splitting to fit a tree f_k with d splits to the data, using z_k as the response.
     - Update z_k by subtracting lambda * f_k(X), i.e. z_(k+1) = z_k - lambda * f_k(X).
  2. Calculate the boosted model prediction as f(X) = the sum from k = 1 to b of lambda * f_k(X) (see the sketch below).
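A rough sketch of these steps for a regression problem (X, y, and X_new are assumed NumPy arrays; lam, b, and d are hypothetical tuning choices):

  import numpy as np
  from sklearn.tree import DecisionTreeRegressor

  lam, b, d = 0.01, 1000, 1              # shrinkage, number of trees, splits per tree
  z = y.copy()                           # z_1 = y
  trees = []
  for _ in range(b):
      tree = DecisionTreeRegressor(max_leaf_nodes=d + 1).fit(X, z)   # d splits -> d + 1 leaves
      z = z - lam * tree.predict(X)      # update z_k by subtracting lambda * f_k(X)
      trees.append(tree)

  # Boosted model prediction: f(X) = sum over k of lambda * f_k(X)
  f_hat = lam * np.sum([t.predict(X_new) for t in trees], axis=0)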

24
Q

Boosting Properties

A

Boosting grows trees sequentially, using information from previously grown trees. Increasing b can cause overfitting, so b is usually selected by cross-validation. Boosting reduces bias. d, the interaction depth, controls the complexity of the boosted model: a tree with d splits has d + 1 terminal nodes, and if d = 1 the boosted model is an additive model. lambda, the shrinkage parameter, controls the rate at which boosting learns: boosting is an example of slow learning, and lambda is typically a small number between 0 and 1. A small lambda may require a large b to achieve a good model.