8 Trees Flashcards
Define a decision tree in one sentence.
A non‑parametric model that recursively partitions the feature space into axis‑aligned regions and fits a constant prediction in each region.
What two stopping criteria are most common when growing a tree?
Minimum number of observations in a node and maximum tree depth (or minimum impurity decrease).
Why are decision trees considered high‑variance models?
Small changes in the data can drastically alter top‑level splits, and those errors propagate down the tree.
Bagging: list the three core steps.
1) Draw B bootstrap samples, 2) train one full tree per sample, 3) average (regression) or vote (classification) predictions.
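A minimal sketch of those three steps, assuming scikit-learn and a synthetic dataset (the sample size and B = 50 trees are illustrative choices, not from the card):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)

B, trees = 50, []
for _ in range(B):                                   # 1) draw B bootstrap samples
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))  # 2) one full tree per sample

votes = np.stack([t.predict(X) for t in trees])      # 3) majority vote (classification)
y_hat = (votes.mean(axis=0) > 0.5).astype(int)
```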
What fraction of the original observations is expected to be left out of any bootstrap sample (≃ the OOB set)?
About 37 %, since P(an observation is never selected in n draws) = (1 − 1/n)^n → e^{−1} ≈ 0.368.
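A quick numerical check of that limit using one simulated bootstrap sample (n = 10 000 is an arbitrary illustration size):

```python
import numpy as np

n = 10_000
rng = np.random.default_rng(0)
sample = rng.integers(0, n, size=n)            # one bootstrap sample of size n
left_out = 1 - len(np.unique(sample)) / n      # fraction of observations never drawn
print(left_out, np.exp(-1))                    # both ≈ 0.368
```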
State the main purpose of Out‑of‑Bag (OOB) error.
Provides an internal, cross‑validation–like estimate of test error without a separate validation set.
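A sketch of that internal estimate with scikit-learn's RandomForestClassifier, which exposes it as oob_score_ when oob_score=True (the data and tree count are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Each observation is scored only by the trees whose bootstrap sample did not contain it
rf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0).fit(X, y)
print(rf.oob_score_)   # cross-validation-like accuracy estimate, no held-out set needed
```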
Random Forest: key extra randomisation beyond bagging?
At each split, consider only a random subset of features to choose the best split.
Effect of reducing the mtry (feature subset size) in Random Forest?
Lower correlation between trees → greater variance reduction but slightly higher bias.
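In scikit-learn the subset size is the max_features parameter; a sketch of how one might compare settings by cross-validation (dataset shape and candidate values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=40, n_informative=10, random_state=0)

for m in ("sqrt", 0.5, 1.0):   # smaller subset → less correlated trees, slightly more bias
    rf = RandomForestClassifier(n_estimators=200, max_features=m, random_state=0)
    print(m, cross_val_score(rf, X, y, cv=5).mean())
```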
Why does boosting fit each new tree to residuals/gradients?
To correct the mistakes of the current ensemble, moving predictions in the direction of steepest loss decrease.
Give the additive model form produced by gradient boosting.
ŷ(x)=∑_{m=1}^M α_m f_m(x), where each f_m is a weak tree and α_m a shrinkage weight.
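A hand-rolled sketch of that additive form for squared-error loss, where each f_m is fit to the current residuals and a single shrinkage weight α is reused for every tree (M, α and the tree depth are illustrative choices):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, noise=10.0, random_state=0)

M, alpha = 200, 0.1                   # number of weak trees and shrinkage weight
pred = np.zeros_like(y, dtype=float)
trees = []
for _ in range(M):
    residual = y - pred               # negative gradient of squared-error loss (up to a factor)
    f_m = DecisionTreeRegressor(max_depth=3).fit(X, residual)
    trees.append(f_m)
    pred += alpha * f_m.predict(X)    # ŷ(x) = Σ_m α_m f_m(x)
```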
What is the role of the ‘learning rate’ (shrinkage) in boosting?
Scales each tree’s contribution; smaller rates require more trees but improve generalisation by reducing over‑fitting.
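The same trade-off in scikit-learn terms, pairing a smaller learning_rate with more estimators (the specific pairs are illustrative, not tuned values):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, noise=10.0, random_state=0)

# A smaller learning_rate needs more trees to fit equally well,
# but typically generalises better.
for lr, n in ((1.0, 100), (0.1, 1000)):
    gb = GradientBoostingRegressor(learning_rate=lr, n_estimators=n, random_state=0)
    print(lr, n, cross_val_score(gb, X, y, cv=5).mean())
```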
Contrast AdaBoost vs XGBoost in one sentence.
AdaBoost re‑weights observations to minimise exponential loss, whereas XGBoost fits trees to first/second‑order gradients of a chosen loss with regularisation and subsampling.
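The two APIs side by side, assuming scikit-learn plus the xgboost package (hyper-parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from xgboost import XGBClassifier            # assumes the xgboost package is installed

X, y = make_classification(n_samples=500, random_state=0)

# AdaBoost: re-weights observations under exponential loss (stump base learner by default)
ada = AdaBoostClassifier(n_estimators=200).fit(X, y)

# XGBoost: fits trees to first/second-order gradients, with regularisation and subsampling
xgb = XGBClassifier(n_estimators=200, max_depth=6, subsample=0.8, reg_lambda=1.0).fit(X, y)
```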
True or False: XGBoost always uses depth‑1 stumps as weak learners.
False – it typically uses deeper trees (depth 3–8 is common; max_depth is tuneable, with a library default of 6).
List two advantages of Random Forests over a single deep tree.
Lower variance and built‑in OOB error estimate (also handles many features robustly).
When would you prefer boosting over Random Forests?
When the dataset is small-to‑medium, complex patterns exist, and you can afford careful hyper‑parameter tuning for maximal accuracy.
What global explainability tool is built into Random Forests?
Feature‑importance scores based on total impurity reduction.
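A sketch of reading those scores from scikit-learn (synthetic data; feature counts are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=3, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
# Impurity-based importances: total impurity decrease attributable to each feature,
# averaged over trees and normalised to sum to 1
print(rf.feature_importances_)
```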
What does TreeSHAP guarantee about the sum of feature contributions for an instance?
They add up exactly to the difference between the instance’s prediction and the model’s overall expected prediction (local additivity).
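A numerical check of that local-additivity property, assuming the shap package and a random-forest regressor as the tree model:

```python
import numpy as np
import shap                                    # assumes the shap package is installed
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=8, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
phi = explainer.shap_values(X[:1])             # contributions for a single instance
base = np.ravel(explainer.expected_value)[0]   # model's expected prediction

# base value + sum of contributions ≈ the prediction for that instance
print(base + phi.sum(), model.predict(X[:1])[0])
```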
Give one disadvantage of AdaBoost on noisy datasets.
The exponential loss heavily up‑weights mislabeled/noisy observations, leading to overfitting.
Name two hyper‑parameters that regularise XGBoost.
Learning rate (η) and the L1/L2 penalties on leaf weights (α and λ; reg_alpha and reg_lambda in the API).
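How those knobs appear in the xgboost Python API (values below are illustrative, not recommended defaults):

```python
from xgboost import XGBRegressor   # assumes the xgboost package is installed

model = XGBRegressor(
    learning_rate=0.05,   # eta: shrinks each tree's contribution
    reg_alpha=0.1,        # L1 penalty on leaf weights
    reg_lambda=1.0,       # L2 penalty on leaf weights
    n_estimators=500,
)
```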
Quick rule: how many features (mtry) are considered per split for classification RF by default?
⌊√p⌋ (roughly √p), where p is the total number of features.