Chapter 13: Intro to Classification and Regression Trees (CART) Flashcards

1
Q

Decision trees are a popular ____ ____ ____ ____ with a wide range of applications.

A

Supervised data mining technique

(The resulting model can be read as a set of if-then rules.)

2
Q

What is a pure subset?

A

A subset in which every case has the same value of the target variable, so it needs no further splitting. A leaf node that holds such a subset is pure.

3
Q

How is a decision tree usually built?

Be specific

A

Using partitioned data sets:

Training (to grow the tree), validation (to prune it), and test (to assess the final tree on new data)
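
A minimal sketch of partitioning a data set into the three parts, using only the Python standard library. The function name `partition`, the 60/20/20 proportions, and the fixed seed are assumptions chosen for illustration; the card does not specify them.

```python
import random

def partition(cases, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle the cases and split them into training, validation, and test sets."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)      # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]        # whatever remains goes to test
    return train, valid, test

train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))
```

Every case lands in exactly one of the three sets, so the tree is never pruned or assessed on the cases used to grow it.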

4
Q

A subset with the highest degree of impurity is defined as:

A

For a binary target, a subset in which half the cases belong to one class and the other half belong to the other class.
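
One way to see why the 50/50 subset is the most impure, assuming impurity is measured with the Gini index (this card does not name a measure) and writing p for the proportion of cases in the first class:

```latex
G(p) = 1 - p^2 - (1-p)^2 = 2p(1-p), \qquad
\frac{dG}{dp} = 2 - 4p = 0 \;\Rightarrow\; p = \tfrac{1}{2}, \qquad
G\!\left(\tfrac{1}{2}\right) = 0.5
```

At the other extreme, G(0) = G(1) = 0, which corresponds to a pure subset.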

5
Q

What type of decision tree is generated when the target variable is binary?

A

A classification tree

6
Q

What does the Gini impurity index measure?

What is “m” in the formula?

A

The degree of impurity of a set of cases in a multiclass classification context

“m” is the number of classes of the target variable
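
The card refers to a formula without showing it. The usual definition is one minus the sum, over the m classes, of the squared proportion of cases in each class. A minimal sketch under that assumption (the function name `gini_impurity` is chosen here for illustration):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)  # one entry per class present (at most m entries)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini_impurity(["yes", "yes", "yes", "yes"]))  # pure subset -> 0.0
print(gini_impurity(["yes", "yes", "no", "no"]))    # evenly split binary subset -> 0.5
```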

7
Q

A large data set with many predictor variables will likely generate a very complex tree with many levels of decision nodes. As the number of partitions increases, the misclassification rate on the training data set will decrease and eventually reach ____.

A

0
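
A toy illustration of why the training error can reach 0: if splitting continues until every leaf is pure, every training case is classified correctly. This sketch is not the CART split criterion; it assumes a single numeric predictor, splits at a midpoint near the median, and uses invented data, all for illustration only.

```python
from collections import Counter

def grow(cases):
    """Recursively split (x, label) cases on x until every leaf is pure."""
    labels = [label for _, label in cases]
    if len(set(labels)) == 1:
        return labels[0]                              # pure leaf: stop splitting
    xs = sorted({x for x, _ in cases})
    if len(xs) == 1:
        return Counter(labels).most_common(1)[0][0]   # identical x values: cannot split
    mid = len(xs) // 2
    split = (xs[mid - 1] + xs[mid]) / 2               # midpoint between two middle values
    left = [c for c in cases if c[0] <= split]
    right = [c for c in cases if c[0] > split]
    return (split, grow(left), grow(right))

def predict(tree, x):
    """Follow the splits down to a leaf label."""
    while isinstance(tree, tuple):
        split, left, right = tree
        tree = left if x <= split else right
    return tree

data = [(1, "A"), (2, "B"), (3, "A"), (4, "B"), (5, "A"), (6, "B")]
tree = grow(data)
training_error = sum(predict(tree, x) != y for x, y in data) / len(data)
print(training_error)
```

The fully grown tree here ends up with one leaf per case, which is exactly the kind of overly complex tree that fits the training data perfectly but may not generalize.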

8
Q

Which of the following are correct descriptions of the elements of a decision tree?
- The top node of the decision tree is called the branch node
- The root node is the first variable to which a split value is applied
- The bottom nodes of the decision tree are called root nodes
- Branches often lead to interior nodes where more decision rules are applied

A

- The root node is the first variable to which a split value is applied
- Branches often lead to interior nodes where more decision rules are applied
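
The elements can be pictured as nested if-then rules. The sketch below is hypothetical: the variables `income` and `credit_score`, the split values, and the class labels are invented for illustration and do not come from the chapter.

```python
def classify(income, credit_score):
    """Hypothetical two-level tree written as if-then rules."""
    if income <= 50_000:            # root node: the first variable and split value
        if credit_score <= 650:     # interior node reached through the left branch
            return "deny"           # leaf node
        return "approve"            # leaf node
    return "approve"                # leaf node reached through the right branch

print(classify(40_000, 600))
print(classify(80_000, 500))
```
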
9
Q

In CART, which data set is used to optimize the complexity of the tree by “pruning” the full tree to a simpler tree that generalizes better to new data?

A

validation

10
Q

In classification trees, the target variable assumes a categorical value. In regression trees, the target variable assumes a ____ value.

A

numerical

11
Q

A simple approach to pruning a classification tree is to reduce the misclassification rate in the validation data set by replacing a branch of the tree with a ____ node.

A

leaf
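
A minimal sketch of this idea, working bottom-up and replacing a branch with a leaf whenever the validation misclassification rate does not get worse. The dictionary-based tree layout, the `majority` field (the most common training class at a node), the tie rule, and the data are all assumptions made for illustration, not the chapter's procedure.

```python
def predict(node, x):
    """Follow splits on a single numeric predictor down to a leaf label."""
    while "label" not in node:
        node = node["left"] if x <= node["split"] else node["right"]
    return node["label"]

def error_rate(node, cases):
    """Misclassification rate of the (sub)tree on (x, label) cases."""
    if not cases:
        return 0.0
    return sum(predict(node, x) != y for x, y in cases) / len(cases)

def prune(node, valid):
    """Replace a branch with a leaf when validation error does not increase."""
    if "label" in node:
        return node
    left_cases = [c for c in valid if c[0] <= node["split"]]
    right_cases = [c for c in valid if c[0] > node["split"]]
    node = dict(node,
                left=prune(node["left"], left_cases),
                right=prune(node["right"], right_cases))
    leaf = {"label": node["majority"]}
    if error_rate(leaf, valid) <= error_rate(node, valid):
        return leaf          # the simpler leaf does at least as well
    return node

full_tree = {
    "split": 5, "majority": "A",
    "left": {"split": 2, "majority": "A",
             "left": {"label": "B"}, "right": {"label": "A"}},
    "right": {"label": "B"},
}
valid = [(1, "A"), (3, "A"), (4, "A"), (7, "B"), (8, "B")]
pruned = prune(full_tree, valid)
print(error_rate(full_tree, valid), error_rate(pruned, valid))
```

In this example the lower-left branch misclassifies one validation case, so it collapses into a single leaf, while the root split is kept because removing it would raise the validation error.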
