Wronged Questions: Decision Trees Flashcards
T/F: Small shrinkage parameter requires more iterations because it has a slower learning rate
True
T/F: Boosting can lead to overfitting if you have a high number of iterations
True
As K increases, flexibility (increases/decreases).
Decreases
Classification error rate, Gini index, and entropy are (inappropriate/appropriate) when pruning a tree.
Appropriate
T/F: Decision trees are easier to interpret than linear models.
True
T/F: Decision trees are more robust than linear models.
False. Decision trees are generally less robust than linear models; they can produce significantly different outcomes with small changes in the input data.
T/F: Decision trees handle qualitative predictors more easily than linear models.
True. Decision trees naturally handle qualitative (categorical) predictors without the need for preprocessing steps such as creating dummy variables, which are often required in linear models.
T/F: In boosting, the number of terminal nodes in each tree is independent of the number of splits.
False. In boosting, the number of terminal nodes in each tree is directly related to the number of splits. The number of terminal nodes (leaves) in a tree is one more than the number of splits.
T/F: Boosting does not allow for the adjustment of model complexity through the parameter d.
False. The parameter d (the interaction depth) is specifically used to adjust the complexity of the model in boosting.
T/F: Boosting considers only a random subset of predictors at each node in every tree.
False. In classical boosting algorithms, all available predictors are considered at each split, not a random subset.
T/F: A smaller value of d in boosting necessitates a larger number of trees to adequately model the data.
True. A smaller d implies simpler base learners (trees), which individually capture less of the data’s complexity. Therefore, more trees are needed to aggregate enough information to model the data effectively.
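To make the roles of d, the shrinkage parameter, and B concrete, here is a minimal sketch assuming scikit-learn’s GradientBoostingRegressor (n_estimators = B, learning_rate = shrinkage, max_depth = d); the data are synthetic and purely illustrative.

```python
# Minimal sketch: how the boosting tuning parameters map onto scikit-learn's
# GradientBoostingRegressor (assumed correspondence: n_estimators = B,
# learning_rate = shrinkage, max_depth = interaction depth d).
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=1.0, random_state=0)

# A small d (shallow trees) paired with a small shrinkage parameter
# typically needs a large B to fit the data adequately.
boost = GradientBoostingRegressor(
    n_estimators=2000,   # B: number of trees
    learning_rate=0.01,  # shrinkage parameter (lambda)
    max_depth=1,         # d: interaction depth (stumps -> additive model)
    random_state=0,
)
boost.fit(X, y)
```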
T/F: Like bagging, boosting is a general approach that can be applied to many statistical learning methods for regression or classification.
True
T/F: Each tree is fit on a modified version of the bootstrapped samples for boosting.
False. Boosting does not involve bootstrap sampling; instead each tree is fit on a modified version of the original data set.
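A minimal sketch of that idea, in the spirit of the standard boosting-for-regression algorithm (ISLR Algorithm 8.2): each small tree is fit to the current residuals on the original data set, with no bootstrap sampling. The function and parameter names below are illustrative, not from any particular library.

```python
# Minimal sketch of boosting for regression: each depth-d tree is fit to the
# residuals of the current ensemble on the ORIGINAL data -- no bootstrapping.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_regression(X, y, B=100, d=1, lam=0.1):
    """Fit B depth-d trees sequentially to residuals, scaled by shrinkage lam."""
    f_hat = np.zeros(len(y))      # current ensemble prediction, starts at 0
    residuals = y.copy()          # r_i = y_i initially
    trees = []
    for _ in range(B):
        tree = DecisionTreeRegressor(max_depth=d)
        tree.fit(X, residuals)                    # fit to residuals, not to y
        update = lam * tree.predict(X)
        f_hat += update                           # f(x) <- f(x) + lam * f_b(x)
        residuals -= update                       # r_i <- r_i - lam * f_b(x_i)
        trees.append(tree)
    return trees, f_hat
```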
T/F: Unlike fitting a single large decision tree to the data, which amounts to fitting the data hard and potentially overfitting, the boosting approach instead learns slowly.
True
T/F: In boosting, unlike in bagging, the construction of each tree depends strongly on the trees that have already been grown.
True
T/F: Like bagging, boosting involves combining a large number of decision trees.
True
T/F: Unlike bagging and random forests, boosting can overfit if B is too large.
True
T/F: Cross-validation is used to select B.
True
T/F: A very small value of the shrinkage parameter can require using a very large number of trees to achieve good performance.
True
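A minimal sketch of tuning B, assuming scikit-learn; a held-out validation set stands in here for full cross-validation, and staged_predict is used to score the ensemble after each added tree.

```python
# Minimal sketch: choosing B on held-out data. staged_predict yields
# predictions after 1, 2, ..., B trees, so validation error can be tracked
# as trees are added.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=5.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

boost = GradientBoostingRegressor(n_estimators=3000, learning_rate=0.01,
                                  max_depth=1, random_state=0).fit(X_tr, y_tr)

val_mse = [mean_squared_error(y_val, pred) for pred in boost.staged_predict(X_val)]
best_B = int(np.argmin(val_mse)) + 1  # number of trees minimizing validation error
print(best_B)
```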
T/F: An interaction depth of zero often works well and the boosted ensemble is fitting an additive model.
False. An interaction depth of one often works well and the boosted ensemble is fitting an additive model.
T/F: In boosting, because the growth of a particular tree takes into account the other trees that have already been grown, smaller trees are typically sufficient.
True
T/F: Individual trees in a random forest are left unpruned, contributing to the ensemble’s variance reduction despite their own overfitting.
True. In a random forest, individual trees are typically grown to their full depth without pruning, which might make them prone to overfitting. However, when these overfitted trees are aggregated, the ensemble model achieves a significant reduction in variance.
T/F: The combination of results from unpruned trees in a random forest leads to a reduction in the overall variance of the model.
True. This is the ensemble effect: aggregating multiple unpruned, overfitted trees produces a model with reduced overall variance, balancing out the overfitting of the individual trees.
T/F: Increasing m leads to a higher degree of decorrelation between the trees, where m is the number of predictors chosen as split candidates at each split.
False. A larger value of m tends to increase the correlation between the trees. The parameter m is the number of predictors chosen as split candidates at each split.
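For illustration, a minimal sketch assuming scikit-learn’s RandomForestRegressor, where max_features plays the role of m: a smaller m decorrelates the trees, while m = p recovers bagging.

```python
# Minimal sketch: max_features corresponds to m, the number of predictors
# considered as split candidates at each split.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=12, noise=1.0, random_state=0)
p = X.shape[1]

rf = RandomForestRegressor(n_estimators=500, max_features="sqrt",  # m ~ sqrt(p)
                           random_state=0).fit(X, y)
bagging = RandomForestRegressor(n_estimators=500, max_features=p,  # m = p -> bagging
                                random_state=0).fit(X, y)
```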
T/F: Random forests can effectively handle both regression and classification problems.
True. Random forests are versatile and can be applied to both regression and classification problems.
T/F: A variable importance plot is a useful tool for identifying which predictors are most influential in a random forest model.
True. Variable importance plots are indeed utilized to identify the most important predictors in a random forest, offering insights into how different variables contribute to the predictive power of the model.
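A minimal sketch of such a plot, assuming scikit-learn and matplotlib; the data and feature labels are synthetic and purely illustrative.

```python
# Minimal sketch: reading variable importance out of a fitted random forest
# and plotting it as a horizontal bar chart.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

importances = rf.feature_importances_          # mean decrease in impurity per predictor
order = np.argsort(importances)
plt.barh([f"X{i}" for i in order], importances[order])
plt.xlabel("Variable importance")
plt.tight_layout()
plt.show()
```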
T/F: Bagging and random forests make use of bootstrapped samples in their algorithms, but boosting does not.
True. Bagging and random forests employ bootstrapped samples to build multiple decision trees, enhancing model accuracy and robustness by aggregating their predictions.
Conversely, boosting sequentially constructs trees, each focusing on correcting errors from previous ones, without utilizing bootstrapped samples to improve performance.
T/F: Boosting can overfit if the number of iterations is set too high, unlike bagging or random forests.
True. Boosting’s performance can be sensitive to the number of iterations, leading to potential overfitting.
T/F: The optimal number of iterations in boosting is often determined through cross-validation.
True. Cross-validation is commonly used to select the optimal B in boosting to balance bias and variance.
T/F: For bagging and random forests, the choice of B is less critical to avoiding overfitting compared to boosting.
True. Bagging and random forests are generally robust against overfitting due to their aggregation methods, making the specific choice of B less critical.
T/F: Pruning is a common strategy in both bagging and boosting to prevent overfitting.
False. Pruning is not used in either bagging or boosting as a strategy to prevent overfitting.
T/F: Bagging significantly reduces variance by averaging multiple predictions.
True. One of the primary advantages of bagging is its ability to reduce the variance of complex models, like deep/large decision trees, by averaging the predictions of multiple bootstrapped models, which tends to make the ensemble prediction more robust than any single model.
T/F: Each bagged tree uses approximately one-third of observations from the original training set.
False. In bagging, each tree, on average, makes use of around two-thirds of the observations due to the nature of bootstrap sampling, where some observations are repeated, and others are left out.
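The two-thirds figure follows directly from how bootstrap sampling works:

```latex
P(\text{observation } i \text{ is not in the bootstrap sample})
  = \left(1 - \frac{1}{n}\right)^{n}
  \xrightarrow{\;n \to \infty\;} e^{-1} \approx 0.368
```

so each bagged tree sees about 1 − e⁻¹ ≈ 63.2% (roughly two-thirds) of the original observations; the remaining third are that tree’s out-of-bag observations.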
T/F: Bagging is exclusively effective for decision trees and cannot be applied to other statistical learning methods.
False. Bagging is a general-purpose procedure that can be applied to many types of statistical learning methods, not just decision trees, although it is particularly beneficial for models that exhibit high variance.
T/F: On average, (p-m)/p of the splits will not even consider the strong predictor.
True
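As a quick numerical check with the common choice m ≈ √p:

```latex
p = 100,\quad m = 10
\quad\Longrightarrow\quad
\frac{p - m}{p} = \frac{90}{100} = 0.9
```

so roughly 90% of the splits never even consider the strong predictor, which is what decorrelates the trees.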
T/F: The main difference between bagging and random forests is the choice of predictor subset size.
True
T/F: If a random forest is built using m = p^1/2, then this amounts to bagging.
False. If a random forest is built using m = p, then this amounts to bagging.
T/F: Using a small value of m in building a random forest will typically be helpful when we have a large number of correlated predictors.
True
T/F: Random forests will not overfit if we increase B, so in practice we use a value of B sufficiently large for the error rate to have settled down.
True
T/F: An alpha value of zero results in the largest, unpruned tree.
True. An alpha value of zero implies no penalty on the tree’s complexity, and thus the tree grows to its largest size without any pruning. This results in the most complex tree possible.
T/F: Increasing alpha leads to a decrease in the variance of the model.
True. As the tree becomes simpler with higher alpha values, its variance decreases due to less model flexibility.
T/F: Increasing alpha leads to a decrease in the squared bias of the fitted tree.
False. Increasing alpha leads to an increase in the squared bias of the fitted tree. Increasing alpha in cost-complexity pruning penalizes the addition of splits, resulting in a simpler tree (lower variance). A simpler tree is less flexible in fitting the data, which leads to an increase in the squared bias as the model becomes increasingly unable to capture the underlying patterns in the data.
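A minimal sketch of cost-complexity pruning in code, assuming scikit-learn’s DecisionTreeRegressor, where ccp_alpha plays the role of alpha; the alpha value of 50.0 is arbitrary and purely illustrative.

```python
# Minimal sketch: ccp_alpha = 0 gives the largest, unpruned tree; larger
# values prune more aggressively (lower variance, higher squared bias).
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=1.0, random_state=0)

unpruned = DecisionTreeRegressor(ccp_alpha=0.0, random_state=0).fit(X, y)
pruned = DecisionTreeRegressor(ccp_alpha=50.0, random_state=0).fit(X, y)

# Candidate alpha values for cross-validation can be read off the pruning
# path of the full tree.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
print(unpruned.get_n_leaves(), pruned.get_n_leaves(), len(path.ccp_alphas))
```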
Three non-parametric statistical learning methods
KNN, Decision Trees, Bagging/Random Forest/Boosting
T/F: To build each tree using random forests, a bootstrapped sample of n observations is used, and for each split within the tree, a new random selection of m predictors is made.
True. In random forests, each tree is built from a bootstrapped sample of the original dataset, containing n observations. At each split in the construction of a tree within a random forest, a random subset of m predictors is selected from all available predictors.
T/F: Out-of-bag estimation can be used to estimate the test error for random forests.
True. Out-of-bag estimation, which uses each observation’s predictions from trees where that observation was not in the bootstrap sample, provides an estimate of the test error without needing a separate test set.
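A minimal sketch of out-of-bag estimation, assuming scikit-learn’s RandomForestClassifier with oob_score=True; the data are synthetic and purely illustrative.

```python
# Minimal sketch: the out-of-bag score is a built-in test-error estimate,
# with no separate validation set required.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0).fit(X, y)
print("OOB accuracy:", rf.oob_score_)   # each observation is scored only by trees
                                        # whose bootstrap sample did not include it
```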
T/F: Random forests reduce bias through the averaging of multiple decorrelated trees.
False. The main benefit of random forests is variance reduction, not bias reduction. Random forests reduce variance by averaging the results of multiple decorrelated trees; while this ensemble method is effective at addressing overfitting, a reduction in bias is not guaranteed.