L8: Tree Based Methods Flashcards
- Differentiate between tree-based methods and other methods
- Understand the advantages and disadvantages of trees
- Generate more powerful prediction models with bagging, random forests and boosting
What are the two types of basic decision trees?
Regression and classification decision trees
Given a new house, can we predict its house price?
Use the variables and follow the decision tree to find the house price average
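As a toy illustration of following a tree to a leaf (the features, thresholds and leaf prices below are invented for the sketch, not taken from the cards):

```python
def predict_price(house):
    """Follow a toy regression tree to a leaf and return that leaf's
    average training price (all splits and prices are assumed)."""
    if house["sqft"] < 1500:
        if house["age"] > 30:
            return 180_000  # average price of small, older houses in this leaf
        return 220_000      # small, newer houses
    if house["bedrooms"] >= 4:
        return 420_000      # large houses with many bedrooms
    return 320_000          # remaining large houses

predict_price({"sqft": 1200, "age": 40})  # lands in the 180_000 leaf
```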
In classical regression, we train a model such that the predictions result in the minimum RSS.
What is the method in tree-based models?
In tree-based models, the recursive binary splitting that cuts the data is done:
Select the predictor Xj and the cutpoint s such that splitting the predictor space into the regions {X | Xj < s} and {X | Xj >= s} leads to the greatest reduction in RSS
Repeat this process, looking for the best predictor and best cutpoint in order to split the data further so that the RSS can be minimised
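The greedy search for the best predictor and cutpoint can be sketched as follows (a minimal sketch; the function and variable names are mine):

```python
def rss(ys):
    # Residual sum of squares around the region mean.
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(X, y):
    """Try every predictor j and every observed cutpoint s; return the
    (total RSS, j, s) giving the greatest reduction in RSS."""
    best = None
    for j in range(len(X[0])):
        for s in sorted({row[j] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[j] < s]
            right = [y[i] for i, row in enumerate(X) if row[j] >= s]
            if not left or not right:
                continue  # a split must leave both regions non-empty
            total = rss(left) + rss(right)
            if best is None or total < best[0]:
                best = (total, j, s)
    return best
```

Applying `best_split` recursively to each resulting region gives the full recursive binary splitting procedure.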
Problem: if we continued to split the data until every end node was a single datapoint, we’d have a highly complex tree model.
What might arise from this?
A model that is over-fitted
A model with a poor test error
A model that does not generalise well
How can we ensure that tree models are not too complex?
We can use tree pruning to reduce the # of terminal nodes of a tree model.
The nonnegative tuning parameter alpha is used to penalise the model for having too many terminal nodes.
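The penalised criterion, RSS plus alpha times the number of terminal nodes |T|, can be sketched directly (assuming we already know the RSS and leaf count of each candidate pruned subtree):

```python
def cost_complexity(subtree_rss, n_leaves, alpha):
    # Penalised objective: RSS of the subtree plus alpha * |T|,
    # where |T| is the number of terminal nodes.
    return subtree_rss + alpha * n_leaves

def best_subtree(candidates, alpha):
    # candidates: list of (rss, n_terminal_nodes) pairs for pruned subtrees.
    return min(candidates, key=lambda c: cost_complexity(c[0], c[1], alpha))
```

With alpha = 0 the full tree (lowest RSS) is chosen; increasing alpha makes extra terminal nodes more costly, so smaller subtrees win.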
So, what are the two methods in which a tree model can be tuned?
- Bias variance trade-off
- Cost complexity pruning
If we have an observation with “worst radius” = 15.4, “worst concave points” = 0.265, “worst texture” = 17.33, is the tumour malignant or benign?
The tumour is likely to be benign.
Sketch out the classification tree
Answer in my book
What is the Gini index? What does it allow?
The Gini index is a measure of total variance across the K classes: G = sum over k of p̂mk(1 − p̂mk), where p̂mk is the proportion of observations in node m from class k. It is small when each p̂mk is close to 0 or 1.
It is a measure of node purity - a small value indicates that a node contains predominantly observations from a single class.
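The Gini index of a node can be computed directly from its class counts (a small sketch of G = sum of p̂(1 − p̂)):

```python
def gini(class_counts):
    # class_counts: number of observations of each class in the node.
    n = sum(class_counts)
    proportions = [c / n for c in class_counts]
    # G = sum over classes of p_hat * (1 - p_hat); 0 for a pure node.
    return sum(p * (1 - p) for p in proportions)

gini([50, 0])   # pure node -> 0.0
gini([25, 25])  # maximally mixed two-class node -> 0.5
```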
What is bagging?
Bagging is bootstrap aggregation.
In decision trees, multiple bootstrap samples of the dataset are created, and a decision tree is fit to each sample. The resulting trees are then combined (averaging the predictions for regression, taking a majority vote for classification) to create a much stronger and more robust model.
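The two ingredients can be sketched independently of the base classifier (a minimal sketch; any tree-fitting routine could be plugged in between them):

```python
import random

def bootstrap_sample(X, y, rng):
    # Draw n observations with replacement from the training set.
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def majority_vote(predictions):
    # Combine the B classifiers' predictions for one observation.
    return max(set(predictions), key=predictions.count)
```

Each tree is fit to its own bootstrap sample, and `majority_vote` aggregates the B per-tree predictions into the bagged prediction.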
What is boosting?
Gradient boosting is the creation of sequential decision trees, where each new tree is fit to the residual errors of the trees built so far.
In this way, each tree learns from the mistakes of the previous trees, gradually shrinking the errors; the number of trees is a tuning parameter, since fitting too many can overfit.
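The residual-fitting loop can be shown with the simplest possible weak learner, a single constant fit to the mean residual (a toy sketch; real gradient boosting fits a small tree at each round, but the loop is the same):

```python
def boost(y, n_rounds, learning_rate):
    # Start from a zero prediction and repeatedly fit a weak learner
    # to the residuals of the current ensemble.
    pred = [0.0] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        step = sum(residuals) / len(residuals)  # weak learner: mean residual
        pred = [pi + learning_rate * step for pi in pred]
    return pred
```

The learning rate shrinks each round's contribution so the model learns slowly; the residuals shrink towards zero but are never driven exactly to zero.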