Decision Trees Flashcards
True or false: decision trees are a non-parametric alternative to regression
True
How do decision trees work?
They split the predictor space into regions, then predict the average response of a region in the regression setting and the most common class in the classification setting.
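A minimal sketch of both settings, assuming scikit-learn and synthetic data (neither is named on the cards):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))                # two predictors
y_reg = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)    # continuous response
y_clf = (X[:, 0] + X[:, 1] > 10).astype(int)         # binary response

# Regression tree: each leaf predicts the average response in its region.
reg = DecisionTreeRegressor(max_depth=3).fit(X, y_reg)

# Classification tree: each leaf predicts the most common class in its region.
clf = DecisionTreeClassifier(max_depth=3).fit(X, y_clf)

print(reg.predict(X[:3]), clf.predict(X[:3]))
```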
What algorithm is used to grow the trees and how does it work?
Recursive Binary Splitting
At each step it selects the predictor and cut-point whose binary split minimizes the MSE. The algorithm is greedy: it optimizes only the current split, not future ones. Splitting continues until every region contains fewer than a specified number of observations.
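A toy sketch of the greedy search for a single split, written from the description above; the function name best_split and the data layout are illustrative, not from the cards:

```python
import numpy as np

def best_split(X, y):
    """Greedy search for the single binary split that minimizes total squared error."""
    best = (None, None, np.inf)          # (predictor index, cut-point, SSE)
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] < s], y[X[:, j] >= s]
            if len(left) == 0 or len(right) == 0:
                continue
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (j, s, sse)
    return best

# Recursive growing would call best_split on each resulting region and stop
# once a region contains fewer than a specified minimum number of observations.
```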
Why do we prune the tree?
The tree produced by recursive binary splitting is usually too large: more splits mean more flexibility, lower bias and higher variance.
True or false: there is no optimal number of splits that minimizes MSE?
True
What are the two pruning methods?
1. Cost complexity pruning (see the sketch below)
2. Weakest link pruning
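A small sketch of cost complexity pruning, assuming scikit-learn (whose ccp_alpha parameter and cost_complexity_pruning_path method implement this idea) and synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=4, noise=10.0, random_state=0)

# Grow a large tree, then examine the sequence of subtrees produced by
# cost complexity (weakest link) pruning as alpha increases.
tree = DecisionTreeRegressor(random_state=0)
path = tree.cost_complexity_pruning_path(X, y)
print(path.ccp_alphas[:5])        # candidate alpha values

# Refit with a chosen alpha to obtain the corresponding pruned subtree.
pruned = DecisionTreeRegressor(ccp_alpha=path.ccp_alphas[len(path.ccp_alphas) // 2],
                               random_state=0).fit(X, y)
```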
What is the tuning parameter (alpha)?
The cost charged to the tree per terminal node: a larger alpha penalizes subtrees with more terminal nodes.
How is the tuning parameter selected?
Cross-validation
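A sketch of that cross-validation step, again assuming scikit-learn and synthetic data: each candidate alpha from the pruning path is scored by 5-fold cross-validated MSE and the best one is kept.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=4, noise=10.0, random_state=0)

# Candidate alphas come from the pruning path of a fully grown tree.
alphas = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y).ccp_alphas

# Pick the alpha whose pruned tree has the best cross-validated MSE.
scores = [cross_val_score(DecisionTreeRegressor(ccp_alpha=a, random_state=0),
                          X, y, cv=5, scoring="neg_mean_squared_error").mean()
          for a in alphas]
best_alpha = alphas[int(np.argmax(scores))]
print(best_alpha)
```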
In a classification tree, what measure is used instead of the MSE as the quantity to minimize?
The classification error rate
For tree growing, why can’t we use the classification error rate and what can we use instead?
The classification error rate is not sensitive enough to changes in node purity.
Gini index or Cross-entropy
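A small sketch of the two impurity measures, written from their standard definitions (the function names are illustrative):

```python
import numpy as np

def gini(p):
    """Gini index of a node: sum over classes of p_k * (1 - p_k)."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1.0 - p)))

def cross_entropy(p):
    """Cross-entropy (deviance) of a node: -sum over classes of p_k * log(p_k)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # avoid log(0)
    return float(-np.sum(p * np.log(p)))

# A pure node scores 0; both measures grow as the class mix becomes more even.
print(gini([1.0, 0.0]), gini([0.5, 0.5]))                     # 0.0, 0.5
print(cross_entropy([1.0, 0.0]), cross_entropy([0.5, 0.5]))   # 0.0, ~0.693
```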
True or false: the Gini index is the variance of observations?
True: the Gini index measures the total variance across the K classes in a node, so it is small when the node is nearly pure.
In a classification tree, what measures are used for pruning the tree and for splitting the tree?
- Pruning: the classification error rate
- Splitting: the Gini index or cross-entropy
What are the advantages of decision trees over linear regression?
- Easier to explain
- Closer to the way human decisions are made
- Tree can be graphed, making it easier to interpret
- Easier to handle categorical predictors (linear regression requires dummy variables)
What are the decision tree’s shortcomings?
1. Do not predict as well as linear regression
2. Not robust (a small change in the input data can have a big effect on the tree)
What methods can be used to address the decision tree’s shortcomings?
Bagging, random forest and boosting
What is the effect of bagging, random forest and boosting on the variance of decision trees?
They lower the variance
Explain the bagging method.
Bagging (bootstrap aggregation) is an application of bootstrapping (see the sketch after this list):
- Draw B bootstrap samples of size n from the original observations.
- Grow a tree on each of the B samples.
- Average the predictions of the B trees (or take a majority vote for classification).
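A hand-rolled sketch of the three steps above, assuming scikit-learn trees and synthetic data; scikit-learn's BaggingRegressor packages the same procedure.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=4, noise=10.0, random_state=0)
n, B = len(y), 100
rng = np.random.default_rng(0)

# Step 1 and 2: draw B bootstrap samples (size n, with replacement) and grow a tree on each.
trees = []
for _ in range(B):
    idx = rng.integers(0, n, size=n)            # bootstrap sample of size n
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# Step 3: average the B trees' predictions.
bagged_pred = np.mean([t.predict(X) for t in trees], axis=0)
```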
What is a bootstrap sample?
Simulating a bootstrap sample of size n means drawing n items from the initial sample with replacement.
What is the effect of bagging on a simple tree’s variance?
It divides the variance by B (the number of bootstrap samples), assuming the trees are independent; in practice bagged trees are correlated, so the reduction is smaller.
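Written out, this is the usual variance-of-an-average argument; the correlation caveat below is the standard refinement, not something stated on the cards.

```latex
% Assuming the B tree predictions are i.i.d. with variance \sigma^2:
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} \hat{f}_b(x)\right)
  = \frac{1}{B^2}\sum_{b=1}^{B}\operatorname{Var}\!\left(\hat{f}_b(x)\right)
  = \frac{\sigma^2}{B}.
% With pairwise correlation \rho between trees, the variance is instead
% \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2, which motivates random forests.
```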
Is there a danger of overfitting by making B too large in bagging?
No.
What is out-of-bag (OOB) validation?
For n sufficiently large, each bootstrap sample leaves out about one-third of the initial observations. For each tree, the test MSE can be estimated on the out-of-bag portion of the sample, which eliminates the need for cross-validation.
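A manual sketch of OOB validation under the same assumptions as the bagging sketch (scikit-learn trees, synthetic data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=4, noise=10.0, random_state=0)
n, B = len(y), 200
rng = np.random.default_rng(1)

oob_sum = np.zeros(n)      # running sum of OOB predictions per observation
oob_count = np.zeros(n)    # how many trees left each observation out of the bag

for _ in range(B):
    idx = rng.integers(0, n, size=n)                 # bootstrap sample
    oob = np.setdiff1d(np.arange(n), idx)            # observations left out (~1/3)
    tree = DecisionTreeRegressor().fit(X[idx], y[idx])
    oob_sum[oob] += tree.predict(X[oob])
    oob_count[oob] += 1

mask = oob_count > 0
oob_mse = np.mean((y[mask] - oob_sum[mask] / oob_count[mask]) ** 2)
print(oob_mse)   # test-error estimate without a separate validation set
```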
Explain the random forest method
1. Specify a positive integer m (typically smaller than the total number of predictors, k).
2. At each split, m predictors are selected at random and only those are considered for the split (see the sketch below).
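A minimal sketch assuming scikit-learn, where max_features plays the role of m:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# m = 3 of the k = 8 predictors are considered at each split.
rf = RandomForestRegressor(n_estimators=500, max_features=3, random_state=0).fit(X, y)

# Setting max_features equal to the total number of predictors reduces this to bagging.
bagging_equivalent = RandomForestRegressor(n_estimators=500, max_features=8,
                                           random_state=0).fit(X, y)
```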
Is there a danger of overfitting by making B too large in random forest?
No.
Why would we use random forest over bagging?
Bagged trees may be highly correlated, for example when one strong predictor dominates the top splits. Randomly selecting only m of the k predictors at each split has the effect of decorrelating the trees.
True or false: if m = k, random forest is reduced to bagging?
True.
Is there a danger of overfitting by making B too large in boosting?
Yes. Unlike bagging and random forests, boosting fits trees sequentially, each one to the residuals of the current model, so making B too large can overfit; B is usually chosen by cross-validation.
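A sketch of tuning B for boosting, assuming scikit-learn's GradientBoostingRegressor and a held-out validation set in place of full cross-validation:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# A boosted model can overfit as B (n_estimators) grows, so B is tuned on held-out data.
boost = GradientBoostingRegressor(n_estimators=2000, learning_rate=0.05,
                                  max_depth=2, random_state=0).fit(X_tr, y_tr)

# staged_predict yields predictions after each additional tree, so we can locate the best B.
val_mse = [np.mean((y_val - pred) ** 2) for pred in boost.staged_predict(X_val)]
best_B = int(np.argmin(val_mse)) + 1
print(best_B)
```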