Trees and ensembles Flashcards
What are the differences between classification trees and regression trees in terms of output and node splitting criteria?
Classification trees predict categorical outcomes and choose splits using impurity measures such as the Gini Index or Cross-Entropy, while regression trees predict continuous values and choose splits that minimize the sum of squared errors (RSS) within the resulting child nodes.
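A minimal scikit-learn sketch of the two tree types; the dataset choices and depth are only illustrative:

```python
from sklearn.datasets import load_iris, load_diabetes
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: categorical output, impurity-based splits (Gini or entropy).
X_cls, y_cls = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="gini", max_depth=3).fit(X_cls, y_cls)

# Regression tree: continuous output, splits chosen to minimize squared error.
X_reg, y_reg = load_diabetes(return_X_y=True)
reg = DecisionTreeRegressor(criterion="squared_error", max_depth=3).fit(X_reg, y_reg)
```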
How is the Gini Index used in classification trees, and why is it preferred over misclassification error for node splitting?
The Gini Index measures impurity by quantifying how mixed the classes are within a node. It is preferred over misclassification error because it is more sensitive to changes in the node's class probabilities: it can favour a split that produces one nearly pure child even when misclassification error rates the candidate splits as equally good, which leads to better splits during tree growth (see the sketch below).
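A small numerical illustration of this point, using a textbook-style two-class node with 400 samples per class; the candidate splits and counts are assumed for illustration:

```python
import numpy as np

def gini(counts):
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)

def misclass(counts):
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.max(p)

def weighted(impurity, nodes):
    n = sum(sum(c) for c in nodes)
    return sum(sum(c) / n * impurity(c) for c in nodes)

# Two candidate splits of a node containing 400 samples of each class.
split_a = [(300, 100), (100, 300)]
split_b = [(200, 400), (200, 0)]   # produces one perfectly pure child

print(weighted(misclass, split_a), weighted(misclass, split_b))  # 0.25 vs 0.25: tied
print(weighted(gini, split_a), weighted(gini, split_b))          # 0.375 vs ~0.333: Gini prefers split_b
```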
Explain the process of node splitting in regression trees. How is the optimal split chosen?
Node splitting in regression trees is based on minimizing the sum of squared errors within the child nodes: for each candidate feature and threshold, the squared deviations of each child's responses from that child's mean are summed, and the optimal split is the one giving the largest reduction in total SSE relative to the parent node.
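A rough sketch of the split search on a single feature; the synthetic data and exhaustive threshold scan are illustrative, not how library implementations are optimized:

```python
import numpy as np

def best_split(x, y):
    """Scan thresholds on one feature and return the split minimizing total SSE."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = (None, np.inf)
    for i in range(1, len(x)):
        left, right = y[:i], y[i:]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse < best[1]:
            best = ((x[i - 1] + x[i]) / 2, sse)   # midpoint threshold, total SSE
    return best

# Synthetic example: a step function at x = 4 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.where(x < 4, 1.0, 5.0) + rng.normal(0, 0.5, 200)
print(best_split(x, y))   # the chosen threshold should land near 4
```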
What is CART, and how does it help in preventing overfitting in decision trees?
CART (Classification and Regression Trees) controls overfitting mainly through cost-complexity pruning combined with cross-validation: a tree is first grown large and then pruned back using a cost-complexity measure that trades off the tree's training accuracy against its size, with the trade-off parameter chosen by cross-validation.
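As a practical sketch, scikit-learn exposes the cost-complexity pruning path and a ccp_alpha parameter (corresponding to the penalty on tree size); the dataset and cross-validation setup below are just for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Grow a full tree and compute the effective alphas of its cost-complexity pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Pick the alpha with the best cross-validated accuracy; larger alpha -> smaller tree.
scores = [
    (alpha,
     cross_val_score(DecisionTreeClassifier(ccp_alpha=alpha, random_state=0), X, y, cv=5).mean())
    for alpha in path.ccp_alphas
]
best_alpha = max(scores, key=lambda t: t[1])[0]
pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)
```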
How does bootstrap aggregating (bagging) reduce variance in decision tree models, and what are its limitations?
Bagging reduces variance by averaging the predictions of many decision trees, each grown on a different bootstrap sample of the data. Its main limitation is that the trees are correlated: because every tree can use all features, a few strong predictors tend to dominate the top splits of every tree, so the trees make similar errors and averaging removes less variance than it would for independent trees.
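A quick sketch comparing a single deep tree with a bagged ensemble in scikit-learn; the dataset and number of estimators are illustrative and the exact scores will vary:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# A single deep tree (high variance) vs. an average over 100 bootstrap-trained trees.
single = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)

print(cross_val_score(single, X, y, cv=5).mean())
print(cross_val_score(bagged, X, y, cv=5).mean())   # typically higher, thanks to variance reduction
```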
What is Random Forest, and how does it improve upon bagging?
Random Forest improves upon bagging by introducing feature sampling. In addition to using bootstrap samples, it selects a random subset of features at each node to reduce the correlation between trees, leading to better generalization.
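One way to see the difference is through max_features in scikit-learn's RandomForestClassifier, which controls the per-split feature subset; setting it to all features essentially recovers bagged trees. The dataset and settings below are only illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# "sqrt" is the usual classification choice; 1.0 (all features) reduces to bagged trees.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
bagged_like = RandomForestClassifier(n_estimators=200, max_features=1.0, random_state=0)

print(cross_val_score(rf, X, y, cv=5).mean())
print(cross_val_score(bagged_like, X, y, cv=5).mean())
```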
Explain the concept of feature sampling in Random Forests. How does it help in decorrelating trees?
In Random Forest, a random subset of features is chosen at each node for splitting, which helps reduce the correlation between trees. This decorrelation leads to lower variance in the ensemble’s predictions.
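A rough experiment one could run to see the decorrelation effect: compare the average pairwise correlation of individual tree predictions with and without per-split feature sampling. The synthetic data and forest sizes are assumptions, and the exact numbers will vary:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train = X[:400], X[400:], y[:400]

def mean_tree_correlation(max_features):
    forest = RandomForestRegressor(n_estimators=50, max_features=max_features, random_state=0)
    forest.fit(X_train, y_train)
    preds = np.array([tree.predict(X_test) for tree in forest.estimators_])
    corr = np.corrcoef(preds)
    return corr[np.triu_indices_from(corr, k=1)].mean()   # average off-diagonal correlation

print(mean_tree_correlation(1.0))      # all features at each split: bagging-like, higher correlation
print(mean_tree_correlation("sqrt"))   # random subset at each split: typically lower correlation
```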
How does AdaBoost work, and how does it differ from bagging in terms of weighting samples and classifiers?
AdaBoost reweights the training data after every round, increasing the weights of misclassified samples so that subsequent weak learners focus on the difficult cases, and it weights each classifier in the final vote according to its accuracy. This contrasts with bagging, where every bootstrap sample is drawn with equal probabilities and every tree contributes equally to the final prediction.
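A minimal from-scratch sketch of discrete AdaBoost with decision stumps, assuming labels encoded as -1/+1; the function names and round count are made up for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Discrete AdaBoost with depth-1 trees (stumps); y must be encoded as -1/+1."""
    n = len(y)
    w = np.full(n, 1.0 / n)                               # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))   # classifier weight: larger when error is small
        w *= np.exp(-alpha * y * pred)                    # misclassified samples get up-weighted
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(scores)                                # weighted vote of all weak learners
```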
Explain the mathematical formulation of Gradient Boosting. How does it sequentially improve the model?
Gradient Boosting builds an additive model F_m(x) = F_{m-1}(x) + ν h_m(x), where each new weak learner h_m (typically a shallow tree) is fit to the negative gradient of the loss function evaluated at the current predictions; for squared-error loss these pseudo-residuals are just the ordinary residuals, so each model corrects the errors left by the models before it (ν is a small learning rate).
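A minimal sketch of gradient boosting with squared-error loss, where each tree is fit to the current residuals; function names and hyperparameters are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_rounds=100, learning_rate=0.1, max_depth=2):
    """Gradient boosting for squared-error loss: each tree fits the current residuals."""
    f0 = y.mean()                                    # initial constant prediction
    pred = np.full(len(y), f0, dtype=float)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred                         # negative gradient of 0.5*(y - F)^2 w.r.t. F
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred += learning_rate * tree.predict(X)      # shrink each correction by the learning rate
        trees.append(tree)
    return f0, trees

def gradient_boost_predict(X, f0, trees, learning_rate=0.1):
    # Use the same learning rate that was used during fitting.
    return f0 + learning_rate * sum(t.predict(X) for t in trees)
```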
Why is overfitting less of a concern in Random Forests compared to single decision trees?
Random Forests overfit less because they average the predictions of many decorrelated trees, which sharply reduces the ensemble's variance even though each individual tree may overfit; the randomness introduced by both bootstrap sampling and per-split feature sampling is what keeps the trees only weakly correlated.
How does pruning help in decision tree models, and what is the cost-complexity criterion used for pruning?
Pruning removes branches of the tree that provide little predictive power, which reduces overfitting. The cost-complexity criterion minimizes R_α(T) = R(T) + α|T|, where R(T) is the subtree's training error and |T| its number of terminal nodes; larger values of α penalize tree size more heavily and yield smaller trees, and α is typically chosen by cross-validation.
Describe how the splitting criterion in classification trees is based on impurity measures such as Gini Index and Cross-Entropy.
The Gini Index is the probability that a randomly chosen sample from the node would be misclassified if it were labelled at random according to the node's class proportions: G = Σ_k p_k(1 − p_k) = 1 − Σ_k p_k². Cross-Entropy (deviance) is H = −Σ_k p_k log p_k. Both are zero for a pure node and largest when the classes are evenly mixed, so minimizing them drives splits toward pure nodes.
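A small sketch computing both impurity measures for a two-class node at various class proportions; the helper names are made up for illustration:

```python
import numpy as np

def gini(p):
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def cross_entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return -np.sum(p * np.log(p))

for p1 in (0.01, 0.1, 0.25, 0.5):
    node = [p1, 1 - p1]
    print(f"p={node}: gini={gini(node):.3f}, entropy={cross_entropy(node):.3f}")
# Both measures drop toward 0 as the node becomes pure and peak at p = [0.5, 0.5].
```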
What is the role of ‘out-of-bag’ (OOB) samples in Random Forests, and how are they used for model evaluation?
‘Out-of-bag’ (OOB) samples are the observations not included in a bootstrap sample used to train a particular tree in Random Forest. They are used to estimate the model’s error and tune hyperparameters without the need for a separate validation set.
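In scikit-learn this is exposed through the oob_score option; a brief sketch (the dataset choice is illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# With oob_score=True, each tree is evaluated on the samples left out of its bootstrap draw,
# giving a built-in generalization estimate without a separate validation split.
forest = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0).fit(X, y)
print(forest.oob_score_)   # accuracy estimated from out-of-bag predictions
```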
How does the concept of ‘weak learners’ in Boosting contribute to the overall model performance?
In Boosting, weak learners are simple models that perform slightly better than random guessing. Combining these weak learners sequentially improves the overall model performance, as each learner corrects the mistakes of the previous ones.
What is the difference between AdaBoost and Gradient Boosting in terms of updating weights and minimizing loss functions?
AdaBoost updates the weights of the training samples after every round to emphasize the ones that were misclassified, while Gradient Boosting minimizes a differentiable loss function (e.g., squared error) by fitting each new model to the negative gradient of the loss, which for squared error is simply the residuals of the current ensemble; AdaBoost can itself be viewed as Gradient Boosting with an exponential loss.