L19 - Classification Trees Flashcards

1
Q

What is a decision tree?

A

A supervised classification (and sometimes regression) algorithm.

2
Q

What are the 2 common types of decision trees?

A

Regression Tree
Classification Tree

3
Q

What is the goal of a classification tree? What rules does it follow to achieve this?

A

Goal: make classifications based on simple decision rules learned from historical data.

4
Q

What is a classification tree and how does it work?

A
  • A set of conditional statements. New data follows a flow of branches based on its responses to the conditionals in the tree.
  • A classification is made when the new data reaches a leaf. The leaf is the class that will be assigned.
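
A classification tree written out as plain conditional statements might look like this minimal sketch (the features and classes are hypothetical, loosely modelled on the iris dataset):

```python
def classify(petal_length, petal_width):
    """Walk a tiny hand-written classification tree."""
    if petal_length < 2.5:        # root conditional
        return "setosa"           # leaf: class assigned
    else:
        if petal_width < 1.8:     # second conditional on the false branch
            return "versicolor"   # leaf
        else:
            return "virginica"    # leaf
```

New data simply flows through the `if`/`else` branches until it hits a `return`, which is the leaf.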
5
Q

Can a tree hold a mixture of data types?

A

Yes.

6
Q

What are the steps for interpreting a classification tree?

A
  1. Compare the new data against the condition of the root node.
  2. Follow either the true or false branch.
  3. Repeat until a leaf is reached.
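
The three steps above can be sketched as a loop over a tree stored as nested dictionaries; the conditions and feature names (`age`, `cholesterol`) are hypothetical:

```python
# Internal nodes hold a condition plus "true"/"false" branches;
# a leaf is just the class label string.
tree = {
    "condition": lambda row: row["age"] > 50,              # root node
    "true": {
        "condition": lambda row: row["cholesterol"] > 240,
        "true": "high risk",
        "false": "low risk",
    },
    "false": "low risk",
}

def classify(tree, row):
    node = tree
    while isinstance(node, dict):  # steps 1-2: test the condition, follow a branch
        node = node["true"] if node["condition"](row) else node["false"]
    return node                    # step 3: a leaf is reached; return its class
```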
7
Q

When building a classification tree, what is the process followed for choosing the root statement?

A

Using historical data, establish the relationships between the features and the dependent variable.

For each feature, create a small tree in which that feature's statement is the root and the true and false branches lead to decision leaves.

The statement with the most pure leaves should be chosen. However, if an impure leaf is encountered, its impurity must be measured.

8
Q

What are the 2 methods in measuring leaf impurity?

A

Percentage: calculate the proportion of true samples in the leaf via (number of true samples / total samples in the leaf).

Gini impurity: calculate the Gini impurity of each candidate root’s leaves.

9
Q

What are the 5 steps for choosing a root using Gini impurity?

A
  1. For each feature being tested as root, find the Gini Impurity of the leaves of the feature-root test trees.
  2. Calculate the total GI for the feature as the sample-weighted average of the leaf impurities: Total GI = (n_left / n) × leftGI + (n_right / n) × rightGI
  3. Repeat for all features.
  4. Feature with lowest GI is set as the root of the classification tree.
  5. Repeat throughout to establish tree.
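
The steps above can be sketched as follows; the feature names and labels are hypothetical, and the features are assumed boolean for simplicity:

```python
def gini(labels):
    """Gini impurity of one leaf: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_yes = labels.count(True) / n
    return 1 - (p_yes**2 + (1 - p_yes)**2)

def total_gini(feature_values, labels):
    """Split on a boolean feature, then take the sample-weighted
    average of the two leaf impurities."""
    left = [y for x, y in zip(feature_values, labels) if x]
    right = [y for x, y in zip(feature_values, labels) if not x]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical training data: two candidate root features and the target.
features = {
    "chest_pain": [True, True, False, False],
    "blocked_arteries": [True, False, True, False],
}
labels = [True, True, False, False]  # e.g. "has heart disease"

# The feature with the lowest total Gini impurity becomes the root.
root = min(features, key=lambda f: total_gini(features[f], labels))
```

Here `chest_pain` separates the labels perfectly (total GI of 0), so it would be chosen as the root.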
10
Q

What is the equation of Gini’s impurity?

A

1 - ( (prob. of yes)^2 + (prob. of no)^2 )
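
As a one-line function (note that the squared probabilities are summed, not subtracted):

```python
def gini_impurity(p_yes, p_no):
    """Gini impurity of a leaf, given the two class probabilities."""
    return 1 - (p_yes**2 + p_no**2)

# A pure leaf has impurity 0; a 50/50 leaf has the maximum of 0.5.
```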

11
Q

What is pruning and why is it needed?

A

Pruning is a technique that reduces the size of a classification tree by removing its non-critical branches.

It is needed to prevent overfitting.

12
Q

What can cause a classification tree to be overfit?

A

Continuing to split until every leaf is pure can cause overfitting, because the tree ends up memorising noise in the training data.

13
Q

What is post-pruning?

A

Pruning occurs after the classification tree has been built.

Pruning is done to the point at which the cross-validated error is at a minimum.

14
Q

What is pre-pruning?

A

Stop the tree-building process before it produces leaves with very small samples, thus preventing overfitting.

At each candidate split, the cross-validation error is measured. If the split does not substantially reduce the error, tree building is stopped.
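
A minimal sketch of such a pre-pruning check; the threshold values are hypothetical defaults, not from the source:

```python
def should_split(n_samples, error_before, error_after,
                 min_samples=20, min_improvement=0.01):
    """Pre-pruning: refuse a split when the node is too small or the
    split does not reduce the cross-validated error enough."""
    if n_samples < min_samples:          # leaf would be built on very few samples
        return False
    return (error_before - error_after) >= min_improvement
```

Tree construction would call this before each split and stop growing a branch as soon as it returns `False`.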
