07.a Decision Trees Flashcards
What is another name for a Decision Tree?
Prediction Tree
Is a Decision Tree a supervised or unsupervised machine learning method?
Supervised
What is the primary task performed by classifiers?
The primary task performed by classifiers is to assign class labels to new observations. The set of labels for a classifier is predetermined. Clustering, by contrast, discovers structure without a training set and optionally lets the data scientist create and assign labels to the clusters.
What are the two fundamental classification methods?
Decision Trees and Naive Bayes
Must the input variables to the decision tree be continuous or categorical?
The input variables to a decision tree can be either continuous or categorical.
What is the name of the shortest decision tree, one that has only a root node and leaf nodes?
A Decision Stump
What are the names of the nodes beyond the root node in a decision tree?
Internal (decision) nodes and leaf nodes. Leaf nodes (also known as terminal nodes) sit at the end of the last branches of the tree. They represent class labels, the outcome of all the prior decisions. The path from the root to a leaf node passes through a series of decisions made at the internal (decision) nodes.
What are the two types of decision tree?
Classification trees
Used for discrete output variables, such as binomial (two-class) outcomes.
Regression trees
Used for continuous output, such as predicted prices and probabilities.
What is meant by the depth of a node in a decision tree?
The depth of a node is the number of steps required to reach the node from the root.
What does a Classification Tree do?
A Classification Tree determines a set of logical if-then conditions that classify observations, for example discriminating between three types of flowers based on certain features.
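To make the "if-then conditions" idea concrete, here is a minimal sketch of a hand-written classification tree for three flower types. The function name, the feature names, the thresholds, and the labels "species A/B/C" are all hypothetical and chosen only for illustration; a real tree would learn its tests from training data.

```python
def classify_flower(petal_length, petal_width):
    """Classify a flower with nested if-then rules.

    All thresholds and labels are made up for illustration.
    """
    if petal_length < 2.5:      # test at the root node
        return "species A"      # leaf node
    if petal_width < 1.7:       # test at an internal (decision) node
        return "species B"      # leaf node
    return "species C"          # leaf node


print(classify_flower(1.4, 0.2))
print(classify_flower(4.5, 1.3))
print(classify_flower(6.0, 2.2))
```

Each `return` corresponds to a leaf node, and each `if` corresponds to a decision made on the path from the root to that leaf.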
What does a Regression Tree do?
The Regression Tree is used when the target variable is numerical or continuous. At each node, each of the independent variables is evaluated as a candidate for splitting the records, and the split chosen is the one that gives the lowest sum of squared errors.
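The split criterion for a regression tree can be sketched as follows. This is a simplified illustration, assuming a single numeric input variable and binary splits of the form `x <= threshold`. The helper names `sse` and `best_split` and the sample data are hypothetical.

```python
def sse(values):
    """Sum of squared errors of `values` around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the split `x <= threshold`
    whose two sides have the lowest combined sum of squared errors."""
    best = None
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical data: house size (x) against price (y).
sizes = [1, 2, 3, 10, 11, 12]
prices = [100, 110, 105, 300, 310, 305]
threshold, total_sse = best_split(sizes, prices)
print(threshold, total_sse)
```

With this data the chosen threshold separates the small, cheap houses from the large, expensive ones, because that split leaves the least variation in price on each side.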
What does purity refer to in Decision Trees?
The purity of a node is the probability of its corresponding class, i.e. the proportion of the node's records that belong to that class. A pure node is one in which 100% of the records belong to a single class, for example a node in which 100% of the records are female.
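As a small illustration, purity can be computed from the class labels of the records that reach a node. This sketch assumes that the node's "corresponding class" is its majority class. The function name `purity` is hypothetical.

```python
from collections import Counter


def purity(labels):
    """Fraction of a node's records that belong to its majority class
    (assumption: the node is labelled with its most common class)."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


print(purity(["female", "female", "female", "female"]))  # a pure node
print(purity(["female", "female", "female", "male"]))    # a mixed node
```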
How is a Decision Tree trained?
At each node, the algorithm looks for the split of the records that reached that node that is the most “informative”.
When does the Decision Tree algorithm stop?
The algorithm constructs subtrees until one of the following criteria is met:
- All the leaf nodes in the tree satisfy the minimum purity threshold.
- The tree cannot be further split with the pre-set minimum purity threshold.
- Any other stopping criterion is satisfied (such as the maximum depth of the tree).
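The stopping criteria above might be checked per node roughly as follows. This is a simplified sketch, not the behaviour of any particular library. The function name `should_stop` and the default values for `min_purity` and `max_depth` are assumptions, and the case where no further split is possible is reduced to a simple flag.

```python
from collections import Counter


def should_stop(labels, depth, can_split=True, min_purity=0.9, max_depth=3):
    """Return True if a node should become a leaf.

    labels:    class labels of the records that reached the node
    depth:     number of steps from the root to this node
    can_split: False when no candidate split remains (hypothetical flag)
    """
    node_purity = max(Counter(labels).values()) / len(labels)
    if node_purity >= min_purity:   # minimum purity threshold satisfied
        return True
    if not can_split:               # the node cannot be split further
        return True
    return depth >= max_depth       # another criterion: maximum depth


print(should_stop(["a"] * 10, depth=0))
print(should_stop(["a", "b"], depth=0))
print(should_stop(["a", "b"], depth=3))
```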
What are entropy methods in relation to Decision Trees?
The entropy methods select the most informative attribute based on two basic measures:
- Entropy, which measures the impurity of an attribute
- Information gain, which measures the purity of an attribute as the reduction in entropy obtained by splitting on it
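Both measures can be sketched in a few lines. The code below uses the standard Shannon entropy in bits and defines information gain as the entropy of the labels minus the weighted entropy after grouping the records by an attribute's values. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Entropy of `labels` minus the weighted entropy of the groups
    formed by each distinct value of the attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


labels = ["yes", "yes", "no", "no"]
informative = ["x", "x", "y", "y"]    # separates the classes perfectly
uninformative = ["x", "y", "x", "y"]  # tells us nothing about the class

print(entropy(labels))
print(information_gain(informative, labels))
print(information_gain(uninformative, labels))
```

An attribute that separates the classes perfectly recovers all of the entropy, while one whose values are unrelated to the class has a gain of zero, so the tree would split on the former.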