Decision Tree Regression Flashcards

1
Q

What does CART stand for?

A

Classification And Regression Trees

2
Q

How is a leaf formed in a regression tree?

A

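The data points (x) plotted on the graph are split into groups by lines drawn at chosen values on the axes; each final group is a leaf.

… the algorithm decides where to split based on information entropy, i.e. how much a split makes each group more uniform (strictly, entropy is the classification-tree measure; CART regression trees usually measure the reduction in variance, i.e. mean squared error, of the target values).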

    |   x x x  :  x x x  :  x x x
    |   x  x   :   x x   :   x x
    |  x x x   :  x   x  :  x x x
    +------------------------------
      ( : = split line )
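A minimal sketch of how a CART-style regression tree might choose one split on a single feature, assuming the common criterion of minimising the combined squared error (variance) of the target values in the two resulting groups. The function names `sse` and `best_split` and the sample data are hypothetical, for illustration only.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the split 'x < threshold' that
    minimises the combined squared error of the two resulting groups."""
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    # Candidate thresholds: midpoints between consecutive distinct x values.
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue
        threshold = (lo + hi) / 2
        left = [y for x, y in pairs if x < threshold]
        right = [y for x, y in pairs if x >= threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical data: the target jumps from about 4 to about 7 around x = 10.
xs = [2, 4, 6, 8, 12, 14, 16, 18]
ys = [4.1, 3.9, 4.0, 4.2, 7.0, 6.8, 7.1, 7.2]
print(best_split(xs, ys))  # best threshold is 10.0
```

A real tree repeats this search over every feature and keeps the single best split, then recurses into each group.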
3
Q

Give some insight into what this information entropy looks for:

A

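The algorithm looks for the split that adds the most information, i.e. the one that leaves the points in each new group as similar to one another as possible.

Once no split adds enough new information (or a group becomes too small to split further), it stops splitting.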
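A minimal sketch of such a stopping check, assuming the "information" a split adds is measured as the drop in squared error (variance reduction), as CART regression trees commonly do. The names `split_gain` and `should_split` and the thresholds `min_gain` and `min_leaf_size` are hypothetical defaults, not values from any particular library.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_gain(parent, left, right):
    """Reduction in squared error from splitting parent into left and right."""
    return sse(parent) - (sse(left) + sse(right))


def should_split(parent, left, right, min_gain=1e-3, min_leaf_size=2):
    """Hypothetical stopping rule: split only if both children are big
    enough and the split removes a meaningful amount of error."""
    if len(left) < min_leaf_size or len(right) < min_leaf_size:
        return False
    return split_gain(parent, left, right) > min_gain


parent = [4.0, 4.2, 7.0, 7.2]
print(should_split(parent, [4.0, 4.2], [7.0, 7.2]))                # True
print(should_split([4.0, 4.0, 4.0, 4.0], [4.0, 4.0], [4.0, 4.0]))  # False
```

The first split separates two clearly different groups, so it removes most of the error; the second splits identical values, so there is no information to gain and the node stays a leaf.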

4
Q

Where does the decision tree part come in?

A

Once the algorithm has split the plotted data, it builds a decision tree that records where it split the data:

Start from the algorithm's first split (e.g. at 10 on the x-axis).
Then add each later split as a branch below it.

    1st split:          x < 10 ?
                       /        \
                    Yes          No
                   leaf     2nd split: y < 3 ?
                                 /        \
                              Yes          No
                             leaf    3rd split: x < 25 ?
                                        /        \
                                      Yes          No
                                     leaf         leaf

      y
       |  x x     :  x x         :  x x
       | x   x    :    x x       :  x
     3 |  x       :....................
       |x x x     :  x x x    x x
       | x x      :   x x       x x
       +--------------------------------> x
        0         10             25

    ( : and .... = algorithm split lines: 1st split at x = 10,
      2nd split at y = 3 (right of x = 10), 3rd split at x = 25 (above y = 3) )
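The tree on this card can be written directly as nested if/else tests, one per split. A minimal sketch assuming the made-up thresholds from the card (x < 10, then y < 3, then x < 25); the function name `find_leaf` and the leaf labels A to D are hypothetical.

```python
def find_leaf(x, y):
    """Walk the example tree from the root and return the leaf a point lands in."""
    if x < 10:                 # 1st split
        return "leaf A: x < 10"
    if y < 3:                  # 2nd split, only reached when x >= 10
        return "leaf B: x >= 10, y < 3"
    if x < 25:                 # 3rd split, only reached when x >= 10 and y >= 3
        return "leaf C: 10 <= x < 25, y >= 3"
    return "leaf D: x >= 25, y >= 3"


for point in [(9, 2), (15, 1), (15, 5), (28, 5)]:
    print(point, "->", find_leaf(*point))
```

Each test only runs if the earlier ones sent the point down that branch, which is why the 2nd and 3rd splits only cut part of the graph.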
5
Q

How is a regression tree leaf used to predict the value for new data?

A

After splitting, the average target value of the training points in each leaf is calculated; that average is the leaf's prediction.

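      y
       |  x x     :  x x         :  x x    <- top-right leaf: avg 4
       | x   x    :    x x       :  x
     3 |  x       :....................
       |x x x     :  x x x    x x          <- bottom-right leaf: avg 7
       | x x      :   x x       x x
       +--------------------------------> x
        0         10             25
        ^ left leaf (x < 10): avg 4

    ( : and .... = algorithm split lines; the averages are made up )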

If a new data point falls into one of the leaves (e.g. the point x = 9, y = 2 lands in the x < 10 leaf), its predicted value is that leaf's group average (e.g. 4).
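A minimal sketch of fitting and using the leaf averages, assuming the example tree from the previous card (x < 10, y < 3, x < 25). The training points and target values are hypothetical, chosen so the leaf averages match the made-up ones on the card; `fit_leaf_means` and `predict` are illustrative names.

```python
from collections import defaultdict


def find_leaf(x, y):
    """Example tree from the previous card: returns a leaf label."""
    if x < 10:
        return "A"
    if y < 3:
        return "B"
    if x < 25:
        return "C"
    return "D"


def fit_leaf_means(points, targets):
    """Average the training target values that land in each leaf."""
    groups = defaultdict(list)
    for (x, y), t in zip(points, targets):
        groups[find_leaf(x, y)].append(t)
    return {leaf: sum(ts) / len(ts) for leaf, ts in groups.items()}


def predict(leaf_means, x, y):
    """A new point's prediction is simply the average of its leaf."""
    return leaf_means[find_leaf(x, y)]


# Hypothetical training data: (x, y) positions and one target value per point.
points = [(2, 1), (5, 4), (8, 6), (12, 1), (20, 2), (15, 5), (18, 8), (27, 4), (29, 7)]
targets = [3.0, 5.0, 4.0, 6.0, 8.0, 2.0, 3.0, 4.5, 3.5]

means = fit_leaf_means(points, targets)
print(predict(means, 9, 2))  # (9, 2) lands in leaf A, whose average is 4.0
```

Note that x and y here are the two inputs that place a point on the graph; the prediction is a separate target value, which is why every point in the same leaf gets the same answer.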
