7. Tree-based algorithms Flashcards
Purpose of tree-based algorithms
Prediction
(applicable to regression and classification)
Impurity measures
Used to determine the quality of a split (see the sketch after the list)
- Gini Index
- Entropy
- Re-substitution (misclassification) error
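A minimal sketch of the three measures, computed from a node's class proportions p_k (plain NumPy; the function names are just illustrative):

```python
import numpy as np

def gini(p):
    # Gini index: sum_k p_k * (1 - p_k)
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1.0 - p)))

def entropy(p):
    # Cross-entropy / deviance: -sum_k p_k * log(p_k), with 0 * log(0) treated as 0
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

def misclassification_error(p):
    # Re-substitution (classification) error of the node: 1 - max_k p_k
    return float(1.0 - np.max(np.asarray(p, dtype=float)))

# Example: class proportions in a node
p = [0.7, 0.2, 0.1]
print(gini(p), entropy(p), misclassification_error(p))
```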
Explain the decision tree algorithm
(0. Pre-processing, e.g. binarize the response for classification; otherwise use a regression tree instead of a classification tree)
1. Recursive binary splitting, choosing each split via an impurity measure
2. Improve with cost-complexity pruning –> grow a large tree and prune it back
(Select the tuning parameter via (k-fold) cross-validation; sketched below)
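A possible scikit-learn sketch of steps 1-2: grow a large tree, compute the cost-complexity pruning path, and choose the pruning parameter by k-fold cross-validation (the toy data set is only a placeholder):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Placeholder data set standing in for the real one
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Steps 1-2: grow a large tree and get the candidate cost-complexity parameters alpha
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Choose alpha by 5-fold cross-validation, then refit the pruned tree
scores = [
    cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0), X, y, cv=5).mean()
    for a in path.ccp_alphas
]
best_alpha = path.ccp_alphas[int(np.argmax(scores))]
pruned_tree = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)
```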
Decision Tree advantages
- Low pre-processing effort (normalization, scaling not required)
- Missing values have little effect
- Easy to explain and interpret (closely mimic human decision-making)
- Handle qualitative predictors without the need to create dummy variables (work with qualitative, quantitative, continuous, and discrete variables)
- Faster than RF
Decision Tree disadvantages
- Lower predictive accuracy than other regression/classification approaches
–> improve by aggregation, at the cost of interpretability and speed
- Overfitting risk (unlike Bagging/RF)
- Unstable: small changes in the data can change the tree substantially
- Can become quite complex (computationally expensive)
Effect of Bagging, Boosting, RF
+ Increase predictive accuracy (lower variance)
- Lower interpretability
- Lower speed (complexity)
RF: Adds random predictor selection at each split to bagging, producing decorrelated trees; less risk of overfitting (see the sketch below)
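A minimal scikit-learn sketch of the contrast; the random forest only adds the random predictor subset per split (max_features, here the usual sqrt(p) rule), and the data set is a placeholder:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Bagging: trees on bootstrap samples, all predictors available at every split
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=200, random_state=0)

# Random forest: same idea, but only a random subset of predictors (sqrt(p)) per split
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)

for name, model in [("bagging", bagging), ("random forest", forest)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```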
Explain Bagging
- Many weak learners are trained, each on its own bootstrap sample of the training data.
- Regression: take the mean of these estimates over the collection of bootstrap samples
–> Classification: the overall prediction is the most commonly occurring class among the predictions (= majority vote); see the sketch below
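A small from-scratch sketch of the idea, assuming a classification setting (bootstrap samples drawn with replacement, one tree per sample, majority vote; all names are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
rng = np.random.default_rng(0)

# Train each weak learner on its own bootstrap sample (drawn with replacement)
trees = []
for _ in range(100):
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Classification: majority vote over the ensemble's predictions
votes = np.stack([t.predict(X) for t in trees])   # shape (n_trees, n_samples)
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("training accuracy of the bagged vote:", (majority == y).mean())
```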
Explain Boosting
Ensemble method that trains weak learners sequentially: each new model concentrates on the observations that previous iterations modelled poorly (in gradient boosting, by fitting the next tree to the current residuals).
(Unlike bagging, the trees are fit sequentially to modified versions of the original data rather than to independent bootstrap samples)
+ Improve performance and reduce variance
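A possible scikit-learn sketch; the parameter values are placeholders, but the three usual tuning knobs (number of trees, shrinkage/learning rate, tree depth) appear explicitly:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Each small tree is fit to what the current ensemble still gets wrong (the residuals)
boosted = GradientBoostingClassifier(
    n_estimators=500,    # number of trees B
    learning_rate=0.05,  # shrinkage: how slowly the ensemble learns
    max_depth=2,         # depth of each weak learner (interaction depth)
    random_state=0,
)
print(cross_val_score(boosted, X, y, cv=5).mean())
```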
Boosting vs. RF vs. Bagging
Boosting:
— Selects from all predictive variables
— Trees are built sequentially; each depends on the errors of the previous iteration
— 3 tuning parameters (number of trees, shrinkage/learning rate, tree depth)
Benefit: learns from the errors of previous iterations
RF:
— Selects from a random subset of predictive variables at each split
— Trees are built independently at each iteration
— 2 tuning parameters (number of trees, number of predictors considered per split)
Bagging:
— Aggregation (mean for regression, majority vote for classification) of trees fit to bootstrap samples
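Side by side in scikit-learn (a sketch; the specific values are only placeholders):

```python
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.tree import DecisionTreeClassifier

# Boosting: 3 tuning parameters (number of trees, shrinkage, tree depth)
boosting = GradientBoostingClassifier(n_estimators=500, learning_rate=0.05, max_depth=2)

# RF: 2 tuning parameters (number of trees, predictors considered per split)
forest = RandomForestClassifier(n_estimators=500, max_features="sqrt")

# Bagging: bootstrap aggregation of full trees, all predictors available at every split
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500)
```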