Chapter 13: Intro to Classification and Regression Trees (CART) Flashcards

Question 1

Q

Decision trees are a popular ____ ____ ____ ____ with a wide range of applications.

Answer

A

supervised data mining technique

(based on if-then rules)

Question 2

Q

What is a pure subset?

Answer

A

a leaf node in which all cases have the same value of the target variable, so there is no need to split further

Question 3

Q

How is a decision tree usually built?

Be specific

Answer

A

using partitioned data sets:

training, validation, and test
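
The partitioning step can be sketched as follows. This is a minimal illustration, not the chapter's procedure: the 60/20/20 proportions, the seed, and the function name `partition` are all assumptions.

```python
import random

def partition(cases, train_frac=0.6, valid_frac=0.2, seed=1):
    """Randomly split cases into training, validation, and test sets.

    The fractions and seed are illustrative assumptions.
    """
    shuffled = list(cases)                 # copy so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)  # seeded shuffle, reproducible
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # whatever remains
    return train, valid, test

train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))
```

In this arrangement the training set grows the tree, the validation set guides pruning, and the test set is held back to assess the final tree on unseen cases.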

Question 4

Q

A subset with the highest degree of impurity is defined as:

Answer

A

one in which half the cases belong to one class and the other half belong to the other class (for a binary target variable)

Question 5

Q

What type of decision tree is generated when the target variable is binary?

Answer

A

classification trees

Question 6

Q

What does the Gini impurity index measure?

What is “m” in the formula?

Answer

A

degree of impurity of a set of cases in a multiclass classification context

“m” is the number of classes of the target variable
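
A small sketch of the calculation, assuming the common form of the index (1 minus the sum of squared class proportions over the m classes). The function name `gini_index` and the sample labels are illustrative, not from the chapter.

```python
from collections import Counter

def gini_index(labels):
    """Gini impurity: 1 - sum of squared class proportions over the classes present."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)               # one entry per class present in the set
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

pure = ["yes"] * 10                        # every case in the same class
mixed = ["yes"] * 5 + ["no"] * 5           # half the cases in each class
print(gini_index(pure))   # 0.0
print(gini_index(mixed))  # 0.5
```

A pure subset scores 0, and a binary subset split half and half scores 0.5, the highest value possible with two classes.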

Question 7

Q

A large data set with many predictor variables will likely generate a very complex tree with many levels of decision nodes. As the number of partitions increases, the misclassification rate from the training data set will decrease and eventually reach ‘________’.

Question 8

Q

Which of the following are correct descriptions of the elements of a decision tree?
- The top node of the decision tree is called the branch node
- The root node is the first variable to which a split value is applied
- The bottom nodes of the decision tree are called root nodes
- Branches often lead to interior nodes where more decision rules are applied

Answer

A

- The root node is the first variable to which a split value is applied
- Branches often lead to interior nodes where more decision rules are applied
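
The node types can be seen in a tree written out by hand as if-then rules. The sketch below is purely illustrative: the variables `income` and `age`, the split values, and the class labels are hypothetical.

```python
def classify(case):
    """A hand-written two-level tree; variables and split values are made up."""
    # Root node: the first variable to which a split value is applied.
    if case["income"] <= 50_000:
        return "no"                        # leaf node: no further rules apply
    # This branch leads to an interior node, where another decision rule is applied.
    if case["age"] <= 30:
        return "no"                        # leaf node
    return "yes"                           # leaf node

print(classify({"income": 80_000, "age": 45}))
```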

Question 9

Q

In CART, which data set is used to optimize the complexity of the tree by “pruning” the full tree to a simpler tree that generalizes better to new data?

Answer

A

validation

Question 10

Q

In classification trees, the target variable assumes a categorical value. In regression trees, the target variable assumes a ____ value.

Answer

A

numerical
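
One practical consequence shows up in how a leaf node produces its prediction. The sketch below assumes the usual convention (majority class for a classification tree, mean of the target for a regression tree); the function name `leaf_prediction` and the sample values are hypothetical.

```python
from collections import Counter
from statistics import mean

def leaf_prediction(targets, tree_type):
    """Predict from the target values of the training cases that fall in one leaf."""
    if tree_type == "classification":
        # Categorical target: predict the most frequent class in the leaf.
        return Counter(targets).most_common(1)[0][0]
    if tree_type == "regression":
        # Numerical target: predict the average value in the leaf.
        return mean(targets)
    raise ValueError("tree_type must be 'classification' or 'regression'")

print(leaf_prediction(["yes", "yes", "no"], "classification"))
print(leaf_prediction([10, 20, 30], "regression"))
```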

Question 11

Q

A simple approach to prune a classification tree is to reduce the misclassification rate in the validation data set by replacing a branch of the tree with a ____ node.

(11 cards)