Decision Trees Flashcards
True or false: decision trees are a non-parametric alternative to regression?
True
How do decision trees work?
They split the predictor space into regions, then predict the average response of the region in the regression setting and the most common class of the region in the classification setting.
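A minimal sketch of how a fitted tree turns regions into predictions. The function names, the single cutpoint at 5.0 and the sample responses are hypothetical, chosen only for illustration.

```python
from collections import Counter
from statistics import mean


def leaf_prediction(responses, task):
    """Prediction shared by every observation that falls in one region."""
    if task == "regression":
        # Regression setting: average response of the region
        return mean(responses)
    # Classification setting: most common class of the region
    return Counter(responses).most_common(1)[0][0]


def predict(x, left_responses, right_responses, task, threshold=5.0):
    """Hypothetical one-split tree: x < threshold goes left, otherwise right."""
    region = left_responses if x < threshold else right_responses
    return leaf_prediction(region, task)
```

For example, `predict(3, [1.0, 2.0, 3.0], [10.0, 12.0], "regression")` uses the left region and returns its mean.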
What algorithm is used to grow the trees and how does it work?
Recursive Binary Splitting
At each step it selects the binary split (one predictor and one cutpoint) that minimizes the MSE. The algorithm is greedy: it optimizes only the current split, without looking ahead to later splits. It continues until the number of observations in a region falls below a specified number.
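The greedy step can be sketched for a single numeric predictor as follows. This is an illustration under simplifying assumptions, not a full implementation: the names `sse`, `best_split`, `grow` and `min_size` are hypothetical, candidate cutpoints are taken as midpoints between sorted distinct values, and the sum of squared errors stands in for the MSE (for a fixed set of observations both pick the same split).

```python
def sse(values):
    """Sum of squared errors around the region mean (0 for an empty region)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Greedy step: return (cutpoint, total SSE) for the best single split."""
    best = None
    points = sorted(set(xs))
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2  # candidate cutpoint between two adjacent values
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (t, total)
    return best


def grow(xs, ys, min_size=4):
    """Split recursively; stop once a region holds fewer than min_size points."""
    if len(ys) < min_size or len(set(xs)) < 2:
        return {"predict": sum(ys) / len(ys)}
    t, _ = best_split(xs, ys)
    left = [(x, y) for x, y in zip(xs, ys) if x < t]
    right = [(x, y) for x, y in zip(xs, ys) if x >= t]
    return {
        "threshold": t,
        "left": grow([x for x, _ in left], [y for _, y in left], min_size),
        "right": grow([x for x, _ in right], [y for _, y in right], min_size),
    }
```

Each call to `best_split` looks only at the region it is handed, which is what makes the procedure greedy: no split is ever revisited in light of later ones.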
Why do we prune the tree?
The tree resulting from recursive binary splitting is probably too big: more splits mean more flexibility, lower bias and higher variance.
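True or false: there is no optimal number of splits that minimizes the test MSE?
False. Training MSE keeps decreasing as splits are added, but test MSE typically falls and then rises again, so some intermediate number of splits minimizes it.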
What are the two names of the pruning method?
1. Cost complexity pruning
2. Weakest link pruning
(Both names refer to the same method.)
What is the tuning parameter (alpha)?
The penalty (cost) charged per terminal node of the tree.
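For reference, the criterion that cost complexity pruning minimizes in the regression setting can be written as below. The notation is an assumption, since the cards define no symbols: $|T|$ is the number of terminal nodes of subtree $T$, $R_m$ is the region of the $m$-th terminal node, and $\hat{y}_{R_m}$ is the mean response in that region.

```latex
\min_{T \subseteq T_0} \;
\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2
\;+\; \alpha \, |T|
```

With $\alpha = 0$ the full tree $T_0$ is kept; as $\alpha$ grows, each extra terminal node costs more, so smaller subtrees are preferred.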
How is the tuning parameter selected?
Cross-validation
In a classification tree, what measure replaces the MSE as the quantity to minimize?
The classification error rate
For tree growing, why can’t we use the classification error rate and what can we use instead?
The classification error rate is not sensitive enough to changes in node purity.
Gini index or cross-entropy
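A small sketch of what "not sensitive enough" means in practice. The class counts are hypothetical: two candidate splits of a node with 400 observations in each of two classes. Both splits have the same weighted classification error rate, yet the Gini index and cross-entropy both favor the split that produces a pure node.

```python
from math import log


def node_measures(counts):
    """Return (error rate, Gini index, cross-entropy) for one node's class counts."""
    n = sum(counts)
    props = [c / n for c in counts]
    error = 1 - max(props)
    gini = sum(p * (1 - p) for p in props)
    entropy = -sum(p * log(p) for p in props if p > 0)
    return error, gini, entropy


def split_measures(children):
    """Average each measure over the child nodes, weighted by node size."""
    total = sum(sum(counts) for counts in children)
    weighted = [0.0, 0.0, 0.0]
    for counts in children:
        w = sum(counts) / total
        for i, m in enumerate(node_measures(counts)):
            weighted[i] += w * m
    return tuple(weighted)


# Hypothetical splits of a parent node with class counts (400, 400)
split_a = split_measures([(300, 100), (100, 300)])  # two mixed nodes
split_b = split_measures([(200, 400), (200, 0)])    # one pure node
```

Comparing `split_a` and `split_b` element by element shows equal error rates but lower Gini index and cross-entropy for `split_b`.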
True or false: the Gini index is a measure of the total variance across the classes?
True
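The variance interpretation can be read off the definition. The notation is an assumption, since the cards define no symbols: $\hat{p}_{mk}$ is the proportion of observations in region $m$ that belong to class $k$, out of $K$ classes.

```latex
G_m = \sum_{k=1}^{K} \hat{p}_{mk} \left( 1 - \hat{p}_{mk} \right)
```

Each term $\hat{p}_{mk}(1 - \hat{p}_{mk})$ is the variance of a 0/1 indicator for class $k$, so $G_m$ adds up these variances over the classes and is close to zero when the node is nearly pure.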
In a classification tree, what measures can be used for:
1. Pruning the tree
2. Splitting the tree
1. Classification error rate
2. Gini index or cross-entropy
What are the advantages of decision trees over linear regression?
- Easier to explain
- Closer to the way human decisions are made
- Tree can be graphed, making it easier to interpret
- Easier to handle categorical predictors (linear regression requires dummy variables)
What are the decision tree’s shortcomings?
1. Do not predict as well as linear regression
2. Not robust (a small change in the input data can have a big effect on the tree)
What methods can be used to address the decision tree’s shortcomings?
Bagging, random forest and boosting