Section 6: Tree-Based Methods Flashcards

1
Q

Explain the root node

A

The root node is the topmost decision node in a tree: it represents the first split and corresponds to the best predictor.

2
Q

What's a decision node vs. a leaf node?

A

A decision node has two or more branches, each corresponding to a region of an input variable.
A leaf node (also called a terminal node) represents a classification: observations falling into it are classified according to its majority class.

3
Q

Explain how classification works for classification trees - what criteria are used

A

The classification of a data point within a region (a terminal node) is given by the majority class of that region (the MAP rule; see the rule written out below). We want the regions to be as pure as possible.
Different metrics measure the purity of a region: entropy and the Gini index.
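A compact way to write the majority-class rule (the notation R_m and p̂_mk is assumed here, not from the card):

```latex
% \hat{p}_{mk} = proportion of training observations of class k in region R_m
\hat{y}(x) = \arg\max_{k}\, \hat{p}_{mk} \qquad \text{for } x \in R_m .
```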

4
Q

Explain entropy

A

Entropy quantifies the uncertainty (impurity) of a probability distribution. An entropy of 0 means there is no uncertainty in the probability distribution of the region: all its observations belong to a single class.
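For reference, with p̂_mk the proportion of class k in region R_m (notation assumed as above), the entropy of the region is:

```latex
% Entropy of region R_m; equals 0 when all probability mass lies on one class
H(R_m) = -\sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk} .
```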

5
Q

Explain the Gini index

A

The Gini index quantifies the (lack of) homogeneity of a probability distribution: larger values mean a less homogeneous region, and 0 means all probability mass is assigned to a single class.
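In the same (assumed) notation, the Gini index of a region is:

```latex
% Gini index of region R_m; equals 0 when all probability mass lies on one class
G(R_m) = \sum_{k=1}^{K} \hat{p}_{mk}\,(1 - \hat{p}_{mk}) = 1 - \sum_{k=1}^{K} \hat{p}_{mk}^{2} .
```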

6
Q

What is the optimisation procedure for a classification tree?

A

Growing a tree corresponds to identifying the optimal tree structure, which involves minimising an impurity criterion (e.g. entropy or the Gini index) so as to maximise the homogeneity of the input-space regions. A greedy algorithm is used to solve this minimisation problem.

7
Q

Explain a greedy algorithm

A

Greedy search approaches to the tree-building problem work by fixing the tree node by node from the root down, choosing the best split at each step, rather than attempting to estimate the whole tree structure simultaneously.
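A minimal Python sketch of one greedy step (the Gini helper, variable names and toy data are illustrative assumptions, not from the card):

```python
import numpy as np

def gini(y):
    """Gini index of a set of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Greedily choose the (feature, threshold) pair that most reduces impurity."""
    n, v = X.shape
    best = (None, None, gini(y))  # (feature index, threshold, weighted impurity)
    for j in range(v):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            # Weighted impurity of the two candidate child regions
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best

# Tiny toy example: feature 0 separates the classes perfectly
X = np.array([[1.0, 5.0], [2.0, 4.0], [8.0, 5.0], [9.0, 4.0]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))  # expected: feature 0, threshold 2.0, impurity 0.0
```

Growing a full tree would simply repeat this search recursively inside each child region until a stopping rule is hit.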

8
Q

How can overfitting theoretically occur in a classification tree

A

In theory one could grow a tree until every leaf contains a single observation. However, this would lead to overfitting. So we want to control the size of the tree.

9
Q

How can we control the size of a classification tree - what levers are available

A

Setting a minimum size for the terminal nodes (the smallest leaf allowed).
Setting a minimum number of observations a node must contain before it can be split.
Increasing the minimum criterion gain (impurity decrease) that must be obtained when splitting a region.
Restricting the maximal depth of the tree (these levers are sketched below).
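These levers map directly onto hyperparameters of, for example, scikit-learn's DecisionTreeClassifier; the dataset and parameter values below are arbitrary illustrations, not recommendations:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(
    min_samples_leaf=5,          # minimum size of a terminal node
    min_samples_split=10,        # minimum observations in a node before it can be split
    min_impurity_decrease=0.01,  # minimum criterion gain required to make a split
    max_depth=4,                 # maximal depth of the tree
    random_state=0,
).fit(X, y)

print(tree.get_depth(), tree.get_n_leaves())
```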

10
Q

Advantages of classification trees

A

Trees are easily explained, easy to interpret and can be displayed graphically.
Classification trees are very flexible and do not require particular pre-processing.
They can easily be used in conjunction with ensemble methods.

11
Q

Downsides of classification trees

A

Trees suffer from high variance: a small change in the input data can result in a very different series of splits and have a large impact on classification performance.
Due to the hierarchical nature of the splitting process, errors propagate multiplicatively from the root node down to the leaves.
Due to their great flexibility, classification trees are prone to overfitting: they classify observations well during the fitting process but may perform poorly when classifying new, unseen observations.

12
Q

What is bagging

A

Bagging is a general-purpose procedure for reducing the variance of a statistical learning method.
The main idea is that averaging multiple predictions reduces variance, thus increasing accuracy.
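The standard supporting calculation (it assumes the B predictions are independent with equal variance, which bagged trees only approximate):

```latex
% Variance of an average of B independent predictions, each with variance sigma^2
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B}\hat{f}_b(x)\right) = \frac{\sigma^{2}}{B} .
```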

13
Q

Why is bagging used in tree based methods of supervised learning

A

In general, classification trees suffer from high variance: if we fit a classification tree to different random splits of the data, we can obtain quite different results.
To increase the prediction accuracy of a learning method, we take many training sets (in practice, bootstrap samples of the data), fit a separate prediction model on each training set, and aggregate the resulting predictions; this is bagging.

14
Q

When bagging, how much of the bagged data is usually unique?

A

On average, about 63.2% of the original observations appear (as unique observations) in a bootstrap sample; the remaining ≈36.8% are left out of bag.
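Where the 63.2% comes from (standard bootstrap calculation):

```latex
% Probability a given observation is never drawn in N draws with replacement
\Pr(\text{observation not selected}) = \left(1 - \tfrac{1}{N}\right)^{N} \xrightarrow[N \to \infty]{} e^{-1} \approx 0.368,
\qquad 1 - e^{-1} \approx 0.632 .
```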

15
Q

How is predictive accuracy assessed using bagging

A

The out-of-bag (OOB) error is calculated.
For each bootstrap replication, the classifier fitted to the bagged data is used to produce a predicted class for every observation in the corresponding out-of-bag sample.
The majority vote is then used again to produce the final classification of each out-of-bag data point.
The average out-of-bag error is an estimate of the generalisation error (see the illustration below).
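A hedged scikit-learn illustration (the dataset and settings are assumptions); BaggingClassifier computes this out-of-bag score when oob_score=True:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier

X, y = load_breast_cancer(return_X_y=True)

# Default base learner is a decision tree; oob_score=True scores each observation
# using only the trees for which it was left out of the bootstrap sample.
bag = BaggingClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)

print("OOB accuracy:", bag.oob_score_)   # estimate of generalisation accuracy
print("OOB error:", 1 - bag.oob_score_)  # corresponding error estimate
```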

16
Q

Explain random forests

A

Random forests provide an improvement over bagging by means of a small tweak that reduces the dependence among the trees.
The main idea is to use only a random subset of the predictor variables at each split of the classification tree fitting step.

17
Q

What is the procedure for building random forests

A

Set a large number B of bootstrap replications.
For each replication b = 1, …, B, sample N observations with replacement to form the bagged data.
Fit a classification tree to this sample, but at each split consider only a random subset of m input variables.
Use the fitted tree to produce a prediction for each observation.
The majority vote across the B trees produces the final classification of each data point (see the sketch below).
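A minimal sketch of this procedure (an illustration, not a reference implementation); scikit-learn's DecisionTreeClassifier stands in for the per-replication tree, with max_features playing the role of m:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
N, V = X.shape
B, m = 100, int(np.sqrt(V))            # number of replications, variables per split
rng = np.random.default_rng(0)

all_preds = np.empty((B, N), dtype=int)
for b in range(B):
    idx = rng.integers(0, N, size=N)   # sample N observations with replacement
    tree = DecisionTreeClassifier(max_features=m, random_state=b)
    tree.fit(X[idx], y[idx])           # only m randomly chosen variables per split
    all_preds[b] = tree.predict(X)     # this tree's prediction for every observation

# Majority vote across the B trees gives the final classification
final = np.array([np.bincount(all_preds[:, i]).argmax() for i in range(N)])
print("ensemble training accuracy:", (final == y).mean())
```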

18
Q

What is the procedure for assessing the predictive accuracy of a random forest?

A

For each bootstrap replication, take the out-of-bag observations and, using the classification tree fitted to the bagged data (with a random subset of variables at each split), obtain predicted classes for those out-of-bag observations; each observation's out-of-bag prediction is the majority vote over the trees for which it was out of bag.
The average out-of-bag error is an estimate of the generalisation error.
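In scikit-learn this is available directly (dataset and settings below are illustrative assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

rf = RandomForestClassifier(
    n_estimators=500,
    max_features="sqrt",  # random subset of about sqrt(V) variables at each split
    oob_score=True,       # score each observation with the trees where it was out of bag
    random_state=0,
).fit(X, y)

print("OOB accuracy:", rf.oob_score_)  # 1 - OOB error, an estimate of generalisation accuracy
```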

19
Q

What’s decorrelating the trees

A

Random forests overcome the problem posed by having one very strong predictor among the input variables by forcing each split to consider only a random subset of the input variables. This means that not all of the bagged trees will use that predictor as the top split, so the trees are decorrelated (they look more different from one another).
This makes the average of the resulting trees less variable and hence more reliable.

20
Q

Evaluate random forests

A

If we build classifiers on subsets of the variables, then they will behave more independently than if we build them on all of the data.
This increases diversity and averaging results across independent classifiers will be more stable than averaging results on dependent ones.
Random forests overcome the problems posed by having one very strong predictor by decorrelating the trees, making the average of the resulting trees less variable and hence more reliable.

21
Q

What is the main hyperparameter to tune in random forests

A

The number of variables m considered for a split has an effect on predictive performance.
The larger the number of variables considered for a split, the more complex the ensemble will be (and the more similar it becomes to bagging!).
A rule of thumb is to set m ≈ √V, where V is the number of input variables, and adjust from there; the optimal value can be selected by cross-validation (see the sketch below).
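A hedged sketch of tuning m (max_features in scikit-learn) by cross-validation; the candidate grid and dataset are assumptions for illustration:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)
V = X.shape[1]
rule_of_thumb = int(np.sqrt(V))

# Candidate values of m around sqrt(V)
grid = {"max_features": [max(1, rule_of_thumb - 2), rule_of_thumb, rule_of_thumb + 2, V // 2]}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=200, random_state=0),
    param_grid=grid,
    cv=5,  # 5-fold cross-validation
).fit(X, y)

print("best m:", search.best_params_["max_features"])
print("CV accuracy:", search.best_score_)
```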

22
Q

What other hyperparameters can be tuned in random forests?

A

The number of trees B controls the size of the ensemble. The error converges to a minimum as the number of trees increases, so B is generally made as large as the computational budget allows.
The hyperparameters of the individual classification trees are key too. The size of the trees is controlled by the minimum size of terminal nodes and the maximum number of terminal nodes.

23
Q

Will random forest always outperform a classification tree both trained on the same data splits?

A

In general, yes: by construction a random forest is an ensemble of B classification trees, and aggregating multiple trees reduces variance. The ensemble will therefore typically perform at least as well as a single classification tree trained on the same data split, although the size of the improvement is not guaranteed for every dataset.