Wronged Questions: Decision Trees Flashcards
T/F: Small shrinkage parameter requires more iterations because it has a slower learning rate
True
T/F: Boosting can lead to overfitting if you have a high number of iterations
True
As K increases, flexibility (increases/decreases).
Decreases
Classification error rate, Gini index, and entropy are (inappropriate/appropriate) when pruning a tree.
Appropriate
T/F: Decision trees are easier to interpret than linear models.
True
T/F: Decision trees are more robust than linear models.
False. Decision trees are generally less robust than linear models; they can produce significantly different outcomes with small changes in the input data.
T/F: Decision trees handle qualitative predictors more easily than linear models.
True. Decision trees naturally handle qualitative (categorical) predictors without the need for preprocessing steps such as creating dummy variables, which are often required in linear models.
T/F: In boosting, the number of terminal nodes in each tree is independent of the number of splits.
False. In boosting, the number of terminal nodes in each tree is directly related to the number of splits. The number of terminal nodes (leaves) in a tree is one more than the number of splits.
T/F: Boosting does not allow for the adjustment of model complexity through the parameter d.
False. The parameter d (the interaction depth) is specifically used to adjust the complexity of the model in boosting.
T/F: Boosting considers only a random subset of predictors at each node in every tree.
False. In classical boosting algorithms, all available predictors are considered at each split, not a random subset.
T/F: A smaller value of d in boosting necessitates a larger number of trees to adequately model the data.
True. A smaller d implies simpler base learners (trees), which individually capture less of the data’s complexity. Therefore, more trees are needed to aggregate enough information to model the data effectively.
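To make the roles of d, the shrinkage parameter, and B concrete, here is a minimal sketch assuming scikit-learn’s GradientBoostingRegressor (n_estimators = B, learning_rate = shrinkage, max_depth = d); the data are synthetic and purely illustrative.

```python
# Minimal sketch: how the boosting tuning parameters map onto scikit-learn's
# GradientBoostingRegressor (assumed correspondence: n_estimators = B,
# learning_rate = shrinkage, max_depth = interaction depth d).
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=1.0, random_state=0)

# A small d (shallow trees) paired with a small shrinkage parameter
# typically needs a large B to fit the data adequately.
boost = GradientBoostingRegressor(
    n_estimators=2000,   # B: number of trees
    learning_rate=0.01,  # shrinkage parameter (lambda)
    max_depth=1,         # d: interaction depth (stumps -> additive model)
    random_state=0,
)
boost.fit(X, y)
```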
T/F: Like bagging, boosting is a general approach that can be applied to many statistical learning methods for regression or classification.
True
T/F: Each tree is fit on a modified version of the bootstrapped samples for boosting.
False. Boosting does not involve bootstrap sampling; instead each tree is fit on a modified version of the original data set.
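A minimal sketch of that idea, in the spirit of the standard boosting-for-regression algorithm (ISLR Algorithm 8.2): each small tree is fit to the current residuals on the original data set, with no bootstrap sampling. The function and parameter names below are illustrative, not from any particular library.

```python
# Minimal sketch of boosting for regression: each depth-d tree is fit to the
# residuals of the current ensemble on the ORIGINAL data -- no bootstrapping.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_regression(X, y, B=100, d=1, lam=0.1):
    """Fit B depth-d trees sequentially to residuals, scaled by shrinkage lam."""
    f_hat = np.zeros(len(y))      # current ensemble prediction, starts at 0
    residuals = y.copy()          # r_i = y_i initially
    trees = []
    for _ in range(B):
        tree = DecisionTreeRegressor(max_depth=d)
        tree.fit(X, residuals)                    # fit to residuals, not to y
        update = lam * tree.predict(X)
        f_hat += update                           # f(x) <- f(x) + lam * f_b(x)
        residuals -= update                       # r_i <- r_i - lam * f_b(x_i)
        trees.append(tree)
    return trees, f_hat
```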
T/F: Unlike fitting a single large decision tree to the data, which amounts to fitting the data hard and potentially overfitting, the boosting approach instead learns slowly.
True
T/F: In boosting, unlike in bagging, the construction of each tree depends strongly on the trees that have already been grown.
True
T/F: Like bagging, boosting involves combining a large number of decision trees.
True
T/F: Unlike bagging and random forests, boosting can overfit if B is too large.
True
T/F: Cross-validation is used to select B.
True
T/F: A very small value of the shrinkage parameter can require using a very large number of trees to achieve good performance.
True
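A minimal sketch of tuning B, assuming scikit-learn; a held-out validation set stands in here for full cross-validation, and staged_predict is used to score the ensemble after each added tree.

```python
# Minimal sketch: choosing B on held-out data. staged_predict yields
# predictions after 1, 2, ..., B trees, so validation error can be tracked
# as trees are added.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=5.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

boost = GradientBoostingRegressor(n_estimators=3000, learning_rate=0.01,
                                  max_depth=1, random_state=0).fit(X_tr, y_tr)

val_mse = [mean_squared_error(y_val, pred) for pred in boost.staged_predict(X_val)]
best_B = int(np.argmin(val_mse)) + 1  # number of trees minimizing validation error
print(best_B)
```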
T/F: An interaction depth of zero often works well and the boosted ensemble is fitting an additive model.
False. An interaction depth of one often works well and the boosted ensemble is fitting an additive model.
T/F: In boosting, because the growth of a particular tree takes into account the other trees that have already been grown, smaller trees are typically sufficient.
True
T/F: Individual trees in a random forest are left unpruned, contributing to the ensemble’s variance reduction despite their own overfitting.
True. In a random forest, individual trees are typically grown to their full depth without pruning, which might make them prone to overfitting. However, when these overfitted trees are aggregated, the ensemble model achieves a significant reduction in variance.
T/F: The combination of results from unpruned trees in a random forest leads to a reduction in the overall variance of the model.
True. This is the ensemble effect: aggregating multiple unpruned, overfitted trees produces a model with reduced overall variance, balancing out the overfitting of the individual trees.
T/F: Increasing m leads to a higher degree of decorrelation between the trees, where m is the number of predictors chosen as split candidates at each split.
False. A larger value of m tends to increase the correlation between the trees. The parameter m is the number of predictors chosen as split candidates at each split.
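For illustration, a minimal sketch assuming scikit-learn’s RandomForestRegressor, where max_features plays the role of m: a smaller m decorrelates the trees, while m = p recovers bagging.

```python
# Minimal sketch: max_features corresponds to m, the number of predictors
# considered as split candidates at each split.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=12, noise=1.0, random_state=0)
p = X.shape[1]

rf = RandomForestRegressor(n_estimators=500, max_features="sqrt",  # m ~ sqrt(p)
                           random_state=0).fit(X, y)
bagging = RandomForestRegressor(n_estimators=500, max_features=p,  # m = p -> bagging
                                random_state=0).fit(X, y)
```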
T/F: Random forests can effectively handle both regression and classification problems.
True. Random forests are versatile and can be applied to both regression and classification problems.
T/F: A variable importance plot is a useful tool for identifying which predictors are most influential in a random forest model.
True. Variable importance plots are indeed utilized to identify the most important predictors in a random forest, offering insights into how different variables contribute to the predictive power of the model.
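A minimal sketch of such a plot, assuming scikit-learn and matplotlib; the data and feature labels are synthetic and purely illustrative.

```python
# Minimal sketch: reading variable importance out of a fitted random forest
# and plotting it as a horizontal bar chart.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

importances = rf.feature_importances_          # mean decrease in impurity per predictor
order = np.argsort(importances)
plt.barh([f"X{i}" for i in order], importances[order])
plt.xlabel("Variable importance")
plt.tight_layout()
plt.show()
```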
T/F: Bagging and random forests make use of bootstrapped samples in their algorithms, but boosting does not.
True. Bagging and random forests employ bootstrapped samples to build multiple decision trees, enhancing model accuracy and robustness by aggregating their predictions.
Conversely, boosting sequentially constructs trees, each focusing on correcting errors from previous ones, without utilizing bootstrapped samples to improve performance.
T/F: Boosting can overfit if the number of iterations is set too high, unlike bagging or random forests.
True. Boosting’s performance can be sensitive to the number of iterations, leading to potential overfitting.
T/F: The optimal number of iterations in boosting is often determined through cross-validation.
True. Cross-validation is commonly used to select the optimal B in boosting to balance bias and variance.
T/F: For bagging and random forests, the choice of B is less critical to avoiding overfitting compared to boosting.
True. Bagging and random forests are generally robust against overfitting due to their aggregation methods, making the specific choice of B less critical.
T/F: Pruning is a common strategy in both bagging and boosting to prevent overfitting.
False. Pruning is not used in either bagging or boosting as a strategy to prevent overfitting.
T/F: Bagging significantly reduces variance by averaging multiple predictions.
True. One of the primary advantages of bagging is its ability to reduce the variance of complex models, like deep/large decision trees, by averaging the predictions of multiple bootstrapped models, which tends to make the ensemble prediction more robust than any single model.
T/F: Each bagged tree uses approximately one-third of observations from the original training set.
False. In bagging, each tree, on average, makes use of around two-thirds of the observations due to the nature of bootstrap sampling, where some observations are repeated, and others are left out.
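The two-thirds figure follows directly from how bootstrap sampling works:

```latex
P(\text{observation } i \text{ is not in the bootstrap sample})
  = \left(1 - \frac{1}{n}\right)^{n}
  \xrightarrow{\;n \to \infty\;} e^{-1} \approx 0.368
```

so each bagged tree sees about 1 − e⁻¹ ≈ 63.2% (roughly two-thirds) of the original observations; the remaining third are that tree’s out-of-bag observations.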
T/F: Bagging is exclusively effective for decision trees and cannot be applied to other statistical learning methods.
False. Bagging is a general-purpose procedure that can be applied to many types of statistical learning methods, not just decision trees, although it is particularly beneficial for models that exhibit high variance.
T/F: On average, (p-m)/p of the splits will not even consider the strong predictor.
True
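As a quick numerical check with the common choice m ≈ √p:

```latex
p = 100,\quad m = 10
\quad\Longrightarrow\quad
\frac{p - m}{p} = \frac{90}{100} = 0.9
```

so roughly 90% of the splits never even consider the strong predictor, which is what decorrelates the trees.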
T/F: The main difference between bagging and random forests is the choice of predictor subset size.
True
T/F: If a random forest is built using m = p^1/2, then this amounts to bagging.
False. If a random forest is built using m = p, then this amounts to bagging.
T/F: Using a small value of m in building a random forest will typically be helpful when we have a large number of correlated predictors.
True
T/F: Random forests will not overfit if we increase B, so in practice we use a value of B sufficiently large for the error rate to have settled down.
True
T/F: An alpha value of zero results in the largest, unpruned tree.
True. An alpha value of zero implies no penalty on the tree’s complexity, and thus the tree grows to its largest size without any pruning. This results in the most complex tree possible.
T/F: Increasing alpha leads to a decrease in the variance of the model.
True. As the tree becomes simpler with higher alpha values, its variance decreases due to less model flexibility.
T/F: Increasing alpha leads to a decrease in the squared bias of the fitted tree.
False. Increasing alpha leads to an increase in the squared bias of the fitted tree. Increasing alpha in cost-complexity pruning penalizes the addition of splits, resulting in a simpler tree (lower variance). A simpler tree is less flexible in fitting the data, which leads to an increase in the squared bias as the model becomes increasingly unable to capture the underlying patterns in the data.
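A minimal sketch of cost-complexity pruning in code, assuming scikit-learn’s DecisionTreeRegressor, where ccp_alpha plays the role of alpha; the alpha value of 50.0 is arbitrary and purely illustrative.

```python
# Minimal sketch: ccp_alpha = 0 gives the largest, unpruned tree; larger
# values prune more aggressively (lower variance, higher squared bias).
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=1.0, random_state=0)

unpruned = DecisionTreeRegressor(ccp_alpha=0.0, random_state=0).fit(X, y)
pruned = DecisionTreeRegressor(ccp_alpha=50.0, random_state=0).fit(X, y)

# Candidate alpha values for cross-validation can be read off the pruning
# path of the full tree.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
print(unpruned.get_n_leaves(), pruned.get_n_leaves(), len(path.ccp_alphas))
```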
Three non-parametric statistical learning methods
KNN, Decision Trees, Bagging/Random Forest/Boosting
T/F: To build each tree using random forests, a bootstrapped sample of n observations is used, and for each split within the tree, a new random selection of m predictors is made.
True. In random forests, each tree is built from a bootstrapped sample of the original dataset, containing n observations. At each split in the construction of a tree within a random forest, a random subset of m predictors is selected from all available predictors.
T/F: Out-of-bag estimation can be used to estimate the test error for random forests.
True. Out-of-bag estimation, which uses each observation’s predictions from trees where that observation was not in the bootstrap sample, provides an estimate of the test error without needing a separate test set.
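A minimal sketch of out-of-bag estimation, assuming scikit-learn’s RandomForestClassifier with oob_score=True; the data are synthetic and purely illustrative.

```python
# Minimal sketch: the out-of-bag score is a built-in test-error estimate,
# with no separate validation set required.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0).fit(X, y)
print("OOB accuracy:", rf.oob_score_)   # each observation is scored only by trees
                                        # whose bootstrap sample did not include it
```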
T/F: Random forests reduce bias through the averaging of multiple decorrelated trees.
False. The main benefit of random forests is variance reduction, not bias reduction. Random forests reduce variance by averaging the results of multiple decorrelated trees; while this ensemble method is effective at addressing overfitting, a reduction in bias is not guaranteed.