Chapter 13: Intro to Classification and Regression Trees (CART) Flashcards
Decision trees are a popular ____ ____ ____ ____ with a wide range of applications
supervised data mining technique
(If-Then)
What is a pure subset?
a subset (leaf node) in which all cases have the same value of the target variable, so no further split is needed
How is a decision tree usually built?
Be specific
using partitioned data sets
Training, Test, Validation
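A minimal sketch of the partitioning step, assuming a simple random split. The function name `partition`, the 50/30/20 fractions, and the seed are illustrative assumptions, not values from the chapter. The training set is used to grow the tree, the validation set to prune it, and the test set to assess the final tree on unseen cases.

```python
import random

def partition(cases, train_frac=0.5, valid_frac=0.3, seed=0):
    """Shuffle the cases and split them into training, validation, and test sets.

    The fractions and the seed are hypothetical defaults chosen for illustration.
    """
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains
    return train, valid, test

train, valid, test = partition(range(100))
```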
A subset with the highest degree of impurity is defined as:
one in which half the cases belong to one class and the other half belong to the other class (binary target)
What type of decision tree is generated when the target variable is binary?
a classification tree
What does the Gini impurity index measure?
What is “m” in the formula?
degree of impurity of a set of cases in a multiclass classification context
“m” is the number of classes of the target variable
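The card refers to a formula without showing it. A commonly used form of the Gini index is 1 minus the sum, over the m classes, of the squared proportion of cases in each class. The sketch below assumes that form; the function name `gini_impurity` is chosen for illustration.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini index: 1 - sum of p_k**2 over the m classes present in labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

pure = gini_impurity(["yes"] * 6)                 # all cases in one class
mixed = gini_impurity(["yes"] * 3 + ["no"] * 3)   # half and half
```

Under this form, a pure subset scores 0 and a half-and-half binary subset scores 0.5, the maximum for two classes.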
A large data set with many predictor variables will likely generate a very complex tree with many levels of decision nodes. As the number of partitions increases, the misclassification rate from the training data set will decrease and eventually reach ‘________’.
0
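A toy illustration of why the training misclassification rate falls to 0 as partitions are added. The data are hypothetical, there is a single predictor, and each cut is chosen greedily by training misclassification rather than by Gini impurity, purely to keep the sketch short.

```python
from collections import Counter

def misclassified(labels):
    """Cases that disagree with the majority class of one partition."""
    return len(labels) - max(Counter(labels).values()) if labels else 0

def error_rate(data, cuts):
    """Training misclassification rate when x is partitioned at the given cuts."""
    edges = [float("-inf")] + sorted(cuts) + [float("inf")]
    wrong = 0
    for lo, hi in zip(edges, edges[1:]):
        wrong += misclassified([y for x, y in data if lo < x <= hi])
    return wrong / len(data)

# Hypothetical toy data: (predictor value, class label)
data = [(1, "a"), (2, "a"), (3, "b"), (4, "a"),
        (5, "b"), (6, "b"), (7, "a"), (8, "b")]
candidates = [x + 0.5 for x, _ in data[:-1]]  # midpoints between neighbours

cuts, rates = [], [error_rate(data, [])]
while rates[-1] > 0:
    # Greedily add the cut that leaves the lowest training error.
    best = min((c for c in candidates if c not in cuts),
               key=lambda c: error_rate(data, cuts + [c]))
    cuts.append(best)
    rates.append(error_rate(data, cuts))
```

`rates` never increases from one split to the next and ends at 0, because in the limit every partition holds a single case. That is the overfitting that pruning is meant to counter.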
Which of the following are correct descriptions of the elements of a decision tree?
- The top node of the decision tree is called the branch node
- The root node is the first variable to which a split value is applied
- The bottom nodes of the decision tree are called root nodes
- Branches often lead to interior nodes where more decision rules are applied
- the root node is the first variable to which a split value is applied
- branches often lead to interior nodes where more decision rules are applied
In CART, which data set is used to optimize the complexity of the tree by “pruning” the full tree to a simpler tree that generalizes better to new data?
validation
In classification trees, the target variable assumes a categorical value. In regression trees, the target variable assumes a ____ value.
numerical
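One practical consequence, sketched under the common convention that a classification leaf predicts the majority class of its cases and a regression leaf predicts their mean. Both function names are illustrative.

```python
from collections import Counter
from statistics import mean

def classification_leaf(targets):
    """Predict the most frequent class among the cases in the leaf."""
    return Counter(targets).most_common(1)[0][0]

def regression_leaf(targets):
    """Predict the average target value of the cases in the leaf."""
    return mean(targets)

label = classification_leaf(["yes", "no", "yes"])
value = regression_leaf([10.0, 20.0, 30.0])
```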
A simple approach to prune a classification tree is to reduce the misclassification rate in the validation data set by replacing a branch of the tree with a ____ node.
leaf
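A sketch of this simple pruning idea, often called reduced-error pruning. The nested-dict tree layout, the `majority` field (the majority training class at a node), and the toy data are assumptions made for illustration. Software implementations of CART commonly use cost-complexity pruning instead.

```python
def predict(node, x):
    """Follow the if-then rules from a node down to a leaf."""
    while "label" not in node:
        side = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[side]
    return node["label"]

def errors(node, cases):
    """Number of validation cases the (sub)tree misclassifies."""
    return sum(predict(node, x) != y for x, y in cases)

def prune(node, cases):
    """Bottom-up: replace a branch with a leaf when validation errors do not rise."""
    if "label" in node:
        return node
    f, t = node["feature"], node["threshold"]
    left_cases = [(x, y) for x, y in cases if x[f] <= t]
    right_cases = [(x, y) for x, y in cases if x[f] > t]
    node = dict(node,
                left=prune(node["left"], left_cases),
                right=prune(node["right"], right_cases))
    leaf = {"label": node["majority"]}
    return leaf if errors(leaf, cases) <= errors(node, cases) else node

# Hypothetical full tree: the split at 8 was fitted to noise in the training data.
full = {"feature": 0, "threshold": 5, "majority": "a",
        "left": {"label": "a"},
        "right": {"feature": 0, "threshold": 8, "majority": "b",
                  "left": {"label": "b"}, "right": {"label": "a"}}}
validation = [((2,), "a"), ((4,), "a"), ((6,), "b"), ((7,), "b"), ((9,), "b")]

pruned = prune(full, validation)
```

Here the branch that splits at 8 is replaced by a single leaf because doing so removes a validation error, while the root split is kept because collapsing it would add errors.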