Decision Tree Regression Flashcards
What does CART stand for?
Classification And Regression Trees
How is a leaf formed in a regression tree?
Data points (x) plotted within a graph are split away from one another into groups
… the algorithm decides each split using an information criterion (for regression, typically how much the split reduces the variance / squared error of the target values, rather than the classification entropy used for classification trees)
[Diagram: a scatter plot of data points, with a dashed split line dividing them into two groups]
- = Split line
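The split-picking idea above can be sketched in a few lines of Python. This is a minimal sketch with made-up data, using the squared-error (variance) criterion typical for regression trees: try each candidate threshold on x and keep the one that leaves each side's y-values most internally similar.

```python
# Sketch of a regression split search (data is made up for illustration):
# pick the x threshold that most reduces the total squared error of y.
xs = [1, 2, 3, 10, 11, 12]
ys = [5, 6, 5, 20, 21, 19]

def sse(vals):
    # Sum of squared deviations from the group mean (0 for tiny groups).
    if len(vals) < 2:
        return 0.0
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def split_cost(t):
    # Total squared error after splitting at threshold t.
    left = [v for x, v in zip(xs, ys) if x <= t]
    right = [v for x, v in zip(xs, ys) if x > t]
    return sse(left) + sse(right)

best_t = min(xs[:-1], key=split_cost)
print(best_t)  # -> 3: the split that best separates the low and high y groups
```

Here the points with x ≤ 3 all have small y-values and the rest have large ones, so splitting at 3 gives the biggest drop in squared error.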
Give some insight into what this splitting criterion looks for:
The algorithm looks for splits that add information to the graph, i.e. groupings whose points are more similar to each other than before the split
Once a further split would no longer add information by separating the data, it stops splitting
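That stopping point can be sketched as a simple rule. The hyperparameter names and values below are assumptions for illustration, not something the flashcard fixes:

```python
# Hypothetical stopping rule: only accept a split if it adds information
# (reduces squared error noticeably) and both child leaves stay big enough.
MIN_LEAF = 2       # assumed minimum number of points per leaf
MIN_GAIN = 1e-3    # assumed minimum error reduction to accept a split

def should_split(parent_sse, left_sse, right_sse, n_left, n_right):
    # Information gained = error before the split minus error after it.
    gain = parent_sse - (left_sse + right_sse)
    return gain > MIN_GAIN and n_left >= MIN_LEAF and n_right >= MIN_LEAF

print(should_split(100.0, 10.0, 12.0, 3, 3))    # -> True: big error reduction
print(should_split(100.0, 99.9995, 0.0, 3, 3))  # -> False: negligible gain
```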
Where does the decision tree part come in?
Once the algorithm has split the plotted data, it builds a decision tree describing where each split was made:
E.g. start from where the algorithm first split the data, e.g. at 10 on the x-axis
Then from there add the other splits as further branches
1st split: x-axis < 10?
  Yes → leaf (points with x < 10)
  No  → 2nd split: y-axis < 3?
    Yes → 3rd split: x-axis < 25?
      Yes → leaf
      No  → leaf
    No  → leaf
- = algorithm split
[Diagram: the scatter plot with the three split lines drawn in (1st split at x = 10, 2nd split at y = 3, 3rd split at x = 25), dividing the points into leaves; x-axis ticks at 0, 10, 20, 30]
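One way to picture the answer above in code: the tree is just nested yes/no questions. A minimal sketch using the flashcard's thresholds; which branch each later split hangs off, and the leaf labels, are assumptions:

```python
# Nested-dict encoding of the example tree (branch order assumed).
tree = {"q": "x < 10",
        "yes": "leaf",
        "no": {"q": "y < 3",
               "yes": {"q": "x < 25", "yes": "leaf", "no": "leaf"},
               "no": "leaf"}}

def show(node, depth=0):
    # Print each split question, then its yes/no subtrees, indented.
    pad = "  " * depth
    if isinstance(node, str):
        print(pad + node)
        return
    print(pad + node["q"] + "?")
    for branch in ("yes", "no"):
        print(pad + branch + ":")
        show(node[branch], depth + 1)

show(tree)
```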
How is a regression tree leaf used to predict values for new data?
After the splits, the average of the data points' values in each leaf is calculated
- = algorithm split
[Diagram: the same split scatter plot, now labelled with each leaf's average value, e.g. 4 for one leaf and 7 (a made-up average) for another]
If a new data point falls into one of the leaves (e.g. a point at (9, 2)), its predicted value is that leaf's group average (e.g. the leaf average is 4, so the prediction is 4)
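The prediction step can be sketched as a walk down the tree to a leaf average. The tree shape and the stored averages below are illustrative, like the flashcard's made-up 7:

```python
# Hypothetical fitted regression tree: internal nodes hold a feature and a
# threshold, leaves hold the group average computed at training time.
tree = {"feature": "x", "threshold": 10,
        "yes": {"value": 7.0},                    # average for points with x < 10
        "no": {"feature": "y", "threshold": 3,
               "yes": {"value": 4.0},             # x >= 10 and y < 3
               "no": {"value": 5.5}}}             # x >= 10 and y >= 3

def predict(node, point):
    # Descend until a leaf, then return its stored average.
    while "value" not in node:
        node = node["yes"] if point[node["feature"]] < node["threshold"] else node["no"]
    return node["value"]

print(predict(tree, {"x": 9, "y": 2}))    # falls in the x < 10 leaf -> 7.0
print(predict(tree, {"x": 12, "y": 2}))   # x >= 10 and y < 3 leaf -> 4.0
```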