L8: Tree Based Methods Flashcards
- Differentiate between tree-based methods and other methods
- Understand the advantages and disadvantages of trees
- Generate more powerful prediction models with bagging, random forests and boosting
What are the two types of basic decision trees?
Regression and classification decision trees
Given a new house, can we predict its house price?
Use the variables and follow the decision tree to find the house price average
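As a toy illustration of following a tree to a leaf (the features, thresholds and leaf prices below are invented for the sketch, not taken from the cards):

```python
def predict_price(house):
    """Follow a toy regression tree to a leaf and return that leaf's
    average training price (all splits and prices are assumed)."""
    if house["sqft"] < 1500:
        if house["age"] > 30:
            return 180_000  # average price of small, older houses in this leaf
        return 220_000      # small, newer houses
    if house["bedrooms"] >= 4:
        return 420_000      # large houses with many bedrooms
    return 320_000          # remaining large houses

predict_price({"sqft": 1200, "age": 40})  # lands in the 180_000 leaf
```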
In classical regression, we train a model such that the predictions result in the minimum RSS.
What is the method in tree-based models?
In tree-based models, the recursive binary splitting that cuts the data is done:
Select the predictor Xj and the cutpoint s such that splitting the predictor space into the regions {X | Xj < s} and {X | Xj >= s} leads to the greatest reduction in RSS
Repeat this process, looking for the best predictor and best cutpoint in order to split the data further so that the RSS can be minimised
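The greedy search for the best predictor and cutpoint can be sketched as follows (a minimal sketch; the function and variable names are mine):

```python
def rss(ys):
    # Residual sum of squares around the region mean.
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(X, y):
    """Try every predictor j and every observed cutpoint s; return the
    (total RSS, j, s) giving the greatest reduction in RSS."""
    best = None
    for j in range(len(X[0])):
        for s in sorted({row[j] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[j] < s]
            right = [y[i] for i, row in enumerate(X) if row[j] >= s]
            if not left or not right:
                continue  # a split must leave both regions non-empty
            total = rss(left) + rss(right)
            if best is None or total < best[0]:
                best = (total, j, s)
    return best
```

Applying `best_split` recursively to each resulting region gives the full recursive binary splitting procedure.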
Problem: if we continued to split the data until every end node was a single datapoint, we’d have a highly complex tree model.
What might arise from this?
A model that is over-fitted
A model with a poor test error
A model that does not generalise well
How can we ensure that tree models are not too complex?
We can use tree pruning to reduce the # of terminal nodes of a tree model.
The nonnegative tuning parameter alpha is used to penalise the model for having too many terminal nodes.
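The penalised criterion, RSS plus alpha times the number of terminal nodes |T|, can be sketched directly (assuming we already know the RSS and leaf count of each candidate pruned subtree):

```python
def cost_complexity(subtree_rss, n_leaves, alpha):
    # Penalised objective: RSS of the subtree plus alpha * |T|,
    # where |T| is the number of terminal nodes.
    return subtree_rss + alpha * n_leaves

def best_subtree(candidates, alpha):
    # candidates: list of (rss, n_terminal_nodes) pairs for pruned subtrees.
    return min(candidates, key=lambda c: cost_complexity(c[0], c[1], alpha))
```

With alpha = 0 the full tree (lowest RSS) is chosen; increasing alpha makes extra terminal nodes more costly, so smaller subtrees win.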
So, what are the two methods in which a tree model can be tuned?
- Bias variance trade-off
- Cost complexity pruning
If we have an observation with “worst radius” = 15.4, “worst concave points” = 0.265, “worst texture” = 17.33, is the tumour malignant or benign?
The tumour is likely to be benign.
Sketch out the classification tree
Answer in my book
What is the Gini index? What does it allow?
The Gini index is a measure of total variance across the K classes: G = sum over k of p̂mk(1 − p̂mk), where p̂mk is the proportion of observations in node m from class k. It is small when each p̂mk is close to 0 or 1.
It is a measure of node purity - a small value indicates that a node contains predominantly observations from a single class.
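The Gini index of a node can be computed directly from its class counts (a small sketch of G = sum of p̂(1 − p̂)):

```python
def gini(class_counts):
    # class_counts: number of observations of each class in the node.
    n = sum(class_counts)
    proportions = [c / n for c in class_counts]
    # G = sum over classes of p_hat * (1 - p_hat); 0 for a pure node.
    return sum(p * (1 - p) for p in proportions)

gini([50, 0])   # pure node -> 0.0
gini([25, 25])  # maximally mixed two-class node -> 0.5
```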
What is bagging?
Bagging is bootstrap aggregation.
In decision trees, multiple bootstrap samples of the dataset are created, and a decision tree is fit to each sample. The resulting trees are then combined (averaging the predictions for regression, taking a majority vote for classification) to create a much stronger and more robust model.
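The two ingredients can be sketched independently of the base classifier (a minimal sketch; any tree-fitting routine could be plugged in between them):

```python
import random

def bootstrap_sample(X, y, rng):
    # Draw n observations with replacement from the training set.
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def majority_vote(predictions):
    # Combine the B classifiers' predictions for one observation.
    return max(set(predictions), key=predictions.count)
```

Each tree is fit to its own bootstrap sample, and `majority_vote` aggregates the B per-tree predictions into the bagged prediction.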
What is boosting?
Gradient boosting is the creation of sequential decision trees, where each new tree is fit to the residual errors of the trees built so far.
In this way, each tree learns from the mistakes of the previous trees, gradually shrinking the errors; the number of trees is a tuning parameter, since fitting too many can overfit.
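The residual-fitting loop can be shown with the simplest possible weak learner, a single constant fit to the mean residual (a toy sketch; real gradient boosting fits a small tree at each round, but the loop is the same):

```python
def boost(y, n_rounds, learning_rate):
    # Start from a zero prediction and repeatedly fit a weak learner
    # to the residuals of the current ensemble.
    pred = [0.0] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        step = sum(residuals) / len(residuals)  # weak learner: mean residual
        pred = [pi + learning_rate * step for pi in pred]
    return pred
```

The learning rate shrinks each round's contribution so the model learns slowly; the residuals shrink towards zero but are never driven exactly to zero.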