Section 6: Tree-Based Methods Flashcards
Explain the root node
The root node is the topmost decision node in a tree: it represents the first decision and corresponds to the best predictor.
What's a decision node vs a leaf node?
A decision node has two or more branches; these correspond to regions of an input variable.
A leaf node (also called a terminal node) represents a classification statement: observations are classified according to the majority class.
Explain how classification works for classification trees - what criteria are used
The classification of a data point within a region (usually a terminal node) is given by the majority class (the MAP rule). We want the regions to be as pure as possible.
Different metrics measure the purity of a region: entropy and the Gini index.
Explain entropy
Entropy quantifies the uncertainty of a probability distribution: H = -Σ_k p_k log₂ p_k, where p_k is the proportion of class k in the region. An entropy of 0 means there is no uncertainty in the probability distribution of the region.
Explain gini index
The Gini index quantifies the heterogeneity (impurity) of a probability distribution: G = Σ_k p_k(1 - p_k) = 1 - Σ_k p_k². Larger values mean less homogeneity; 0 means all probability mass is assigned to one class.
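A minimal sketch of both purity measures in Python, assuming the class proportions of a region are supplied as a list (function names are illustrative):

```python
import math

def entropy(proportions):
    # Entropy of a region: -sum over classes of p_k * log2(p_k); zero terms skipped.
    return -sum(p * math.log2(p) for p in proportions if p > 0)

def gini(proportions):
    # Gini index of a region: 1 - sum over classes of p_k^2.
    return 1 - sum(p ** 2 for p in proportions)

print(entropy([0.5, 0.5]), gini([0.5, 0.5]))  # 1.0 0.5  (maximally impure)
print(entropy([0.9, 0.1]), gini([0.9, 0.1]))  # ~0.469 ~0.18  (nearly pure)
```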
What is the optimisation procedure for a classification tree?
Growing a tree corresponds to identifying the optimal tree structure, which involves minimising an impurity criterion so as to maximise the homogeneity of the input-space regions. A greedy algorithm is used to solve this minimisation problem.
Explain a greedy algorithm
Greedy search approaches to the tree-building problem work by fixing the splits node by node, from the root down, rather than attempting to estimate the whole tree structure simultaneously: at each node the locally best split is chosen and never revisited.
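A minimal sketch of one greedy step, assuming a single numeric feature: it tries every threshold and keeps the one minimising the weighted Gini impurity (all names are illustrative; a full tree would apply this recursively to each child region):

```python
def best_split(xs, ys):
    # Gini impurity of a list of class labels.
    def gini_of(labels):
        n = len(labels)
        return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

    best = None
    for t in sorted(set(xs)):  # candidate thresholds: observed feature values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        # Impurity of the two child regions, weighted by their sizes.
        score = (len(left) * gini_of(left) + len(right) * gini_of(right)) / len(ys)
        if best is None or score < best[0]:
            best = (score, t)
    return best  # (weighted Gini, threshold)

print(best_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))
# (0.0, 3): splitting at x <= 3 separates the two classes perfectly
```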
How can overfitting theoretically occur in a classification tree
In theory one could grow a tree until every leaf contains a single observation. However, this would lead to overfitting. So we want to control the size of the tree.
How can we control the size of a classification tree - what levers are available
Restricting the minimum size of a terminal node (leaf).
Restricting the minimum number of observations a node must contain before it can be split.
Increasing the minimum criterion gain to be obtained when splitting a region.
Restricting the maximal depth of the tree. (Each lever is illustrated in the sketch below.)
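As an illustration, these four levers map onto hyperparameters of scikit-learn's DecisionTreeClassifier (a sketch assuming that library; the values are arbitrary):

```python
from sklearn.tree import DecisionTreeClassifier

# Each argument corresponds to one size-control lever above (values are arbitrary).
tree = DecisionTreeClassifier(
    min_samples_leaf=5,          # minimum size of a terminal node
    min_samples_split=10,        # minimum observations needed to split a node
    min_impurity_decrease=0.01,  # minimum criterion gain required for a split
    max_depth=4,                 # maximal depth of the tree
)
```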
Advantages of classification trees
Trees are easily explained, easy to interpret and display graphically.
Classification trees are very flexible and do not require particular pre-processing.
Can be easily used in conjunction with ensemble methods.
Downsides of classification trees
Trees suffer from high variance: a small change in the input data can result in a very different series of splits and have a large impact on classification performance.
Due to the hierarchical nature of the splitting process, errors propagate multiplicatively from the root node down to the leaves.
Due to their great flexibility, classification trees are prone to overfitting: they classify the training observations well during fitting but perform poorly when classifying new, unseen observations.
What is bagging
Bagging is a general-purpose procedure for reducing the variance of a statistical learning method.
The main idea is that averaging multiple predictions reduces variance, thus increasing accuracy.
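The reasoning step behind this, as a formula: for B independent, identically distributed predictions each with variance σ², the average has variance σ²/B (bootstrap samples are not truly independent, so the actual reduction is smaller, but the principle holds):

```latex
\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^{B}\hat{f}_b(x)\right) = \frac{\sigma^2}{B},
\qquad \operatorname{Var}(\hat{f}_b(x)) = \sigma^2 \ \text{i.i.d.}
```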
Why is bagging used in tree based methods of supervised learning
In general, classification trees suffer from high variance. So if we fit a classification tree to different random splits of the data we could obtain quite different results.
To increase the prediction accuracy of a learning method, take many training sets from the population (in practice, bootstrap resamples of the single training set), fit a separate prediction model on each training set, and average the resulting predictions. This procedure is bagging.
When bagging, how much of the bagged data is usually unique?
On average, about 63.2% of the observations in a bootstrap sample are unique; the rest are duplicates.
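The 63.2% figure follows from the bootstrap sampling probability: each of the n observations is drawn with replacement n times, so

```latex
P(\text{observation } i \text{ appears in the bootstrap sample})
= 1 - \left(1 - \frac{1}{n}\right)^{n}
\longrightarrow 1 - e^{-1} \approx 0.632 \quad (n \to \infty).
```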
How is predictive accuracy assessed using bagging
The out-of-bag (OOB) error is calculated:
For each observation in the out-of-bag sample of a fitted classifier (the observations not drawn into its bootstrap sample), use that fitted classifier to produce a predicted class.
The majority vote is used again to produce the final classification of each out-of-bag data point.
The average out-of-bag error is an estimate of the generalisation error.
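As an illustration with scikit-learn (a sketch assuming that library; the toy dataset is a placeholder for real data), `oob_score_` reports out-of-bag accuracy, so the OOB error estimate is one minus it:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy dataset standing in for real training data.
X, y = make_classification(n_samples=500, random_state=0)

bag = BaggingClassifier(
    DecisionTreeClassifier(),  # high-variance base learner
    n_estimators=100,          # number of bootstrap samples / trees
    oob_score=True,            # score each observation on the trees that never saw it
    random_state=0,
).fit(X, y)

# oob_score_ is OOB accuracy; 1 - accuracy estimates the generalisation error.
print("OOB error estimate:", 1 - bag.oob_score_)
```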