B08 Decision Trees Flashcards
Decision trees are part of a group of learners that use all
the data available to them on a first-come, first-served
basis. As a result, they are known as __________ learners.
greedy
Decision Trees:
-Begin at the ____ node.
-Pass through _____ nodes.
-Data is split across ______ (outcomes).
-End at a ______ (final decision).
Root Node
Decision Nodes
Branches
Leaf/Terminal Node
Decision trees are built using a _______________
approach, which splits data into subsets and then
recursively splits the subsets into even smaller subsets
until one or more stopping criteria are met. This
approach is also known as ___________.
recursive partitioning
divide and conquer
Some of the criteria that trigger a stop to the recursive
partitioning process include when:
-All data in a leaf node are of the ______.
-All _______ have been exhausted.
-A specified _________ has been met.
same class
features
tree size limit
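As a rough illustration, the divide-and-conquer loop with these three stopping criteria can be sketched in pure Python. The names (`grow_tree`, `max_depth`) are hypothetical, and a real learner would pick the best split feature by an impurity measure rather than taking features in order:

```python
def grow_tree(rows, labels, features, depth=0, max_depth=3):
    """Recursively partition (rows, labels) until a stopping criterion fires."""
    # Stop 1: all data in the node are of the same class.
    if len(set(labels)) == 1:
        return {"leaf": labels[0]}
    # Stop 2: all features have been exhausted.
    # Stop 3: a specified tree-size (here, depth) limit has been met.
    if not features or depth >= max_depth:
        majority = max(set(labels), key=labels.count)
        return {"leaf": majority}
    feature = features[0]  # simplification: a real learner chooses the best feature
    tree = {"split_on": feature, "branches": {}}
    for value in set(row[feature] for row in rows):
        subset = [(r, l) for r, l in zip(rows, labels) if r[feature] == value]
        tree["branches"][value] = grow_tree(
            [r for r, _ in subset], [l for _, l in subset],
            features[1:], depth + 1, max_depth)
    return tree

# Toy data: play outside unless it is rainy.
rows = [{"outlook": "sunny"}, {"outlook": "sunny"}, {"outlook": "rainy"}]
labels = ["yes", "yes", "no"]
print(grow_tree(rows, labels, ["outlook"]))
```

Each recursive call sees only the subset of data that reached its branch, which is the sense in which the learner is greedy: every split is final once made.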
For most decision tree algorithms, the decision about
which feature to split upon is usually based on a
measure of impurity known as _______.
entropy
For decision trees, entropy is a quantification of the
_______________ within a set of class
values.
level of randomness or disorder
Entropy is highest when the split is _____. As one class dominates the other, entropy reduces to ______.
50-50
zero
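The two extremes above can be checked with a short Python sketch (the helper name `entropy` is ours, not from any particular library):

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return sum(-c / total * log2(c / total) for c in counts if c > 0)

# A 50-50 split is maximally impure for two classes:
print(entropy([5, 5]))   # 1.0
# As one class dominates completely, entropy falls to zero:
print(entropy([10, 0]))  # 0.0
```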
To determine the optimal feature to split upon, the
decision tree algorithm calculates the change in
entropy that would result from a split on each possible
feature. This measure is known as _____________.
Information Gain
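Concretely, information gain is the parent node's entropy minus the size-weighted entropy of the child nodes a split would create. A minimal sketch (helper names are ours):

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return sum(-c / total * log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Separating 10 positives and 10 negatives into two pure children
# removes all uncertainty, gaining the full 1 bit:
print(information_gain([10, 10], [[10, 0], [0, 10]]))  # 1.0
# A split that leaves the class mix unchanged gains nothing:
print(information_gain([10, 10], [[5, 5], [5, 5]]))    # 0.0
```

The algorithm evaluates this quantity for every candidate feature and splits on the one with the largest gain.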
________ is a modification of information gain that
reduces its bias toward highly branching features by taking
into account the number and size of branches when
choosing a feature. It does this by normalizing information gain by the ____________ of a split.
Gain Ratio
Intrinsic information
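A sketch of the normalization, assuming intrinsic information is computed as the entropy of the branch sizes themselves (helper names are ours):

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return sum(-c / total * log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    n = sum(parent)
    return entropy(parent) - sum(sum(c) / n * entropy(c) for c in children)

def gain_ratio(parent, children):
    """Information gain normalized by the intrinsic information of the
    split, i.e. the entropy of the branch sizes themselves."""
    intrinsic = entropy([sum(child) for child in children])
    return information_gain(parent, children) / intrinsic

# A clean 2-way split and a 4-way split both gain 1 bit, but the 4-way
# split's extra branching is penalized by its larger intrinsic information:
print(gain_ratio([10, 10], [[10, 0], [0, 10]]))                # 1.0
print(gain_ratio([10, 10], [[5, 0], [5, 0], [0, 5], [0, 5]]))  # 0.5
```

Without this correction, a feature with many distinct values (e.g. an ID column) would look artificially attractive to plain information gain.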
Instead of entropy, some decision tree algorithms use the __________ to determine the optimal feature to split upon. _______ is a measure of statistical dispersion.
Gini impurity measure
Gini
Gini impurity ranges from ___ to ____ (the latter approached
for an infinite number of even partitions).
0 to 1
A split occurs at the _______ value of the Gini impurity.
lowest
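The bounds and the two-class worst case can be verified with a short sketch (the helper name `gini` is ours):

```python
def gini(counts):
    """Gini impurity: the chance that two samples drawn at random (with
    replacement) from the node carry different class labels."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(gini([10, 0]))  # 0.0 -- a pure node
print(gini([5, 5]))   # 0.5 -- the worst case for two classes
# With k even partitions the impurity is 1 - 1/k, so it approaches
# (but never reaches) 1 as k grows:
print(round(gini([1] * 100), 6))  # 0.99
```

Because lower impurity is better, the algorithm prefers the candidate split whose (size-weighted) Gini impurity is lowest, mirroring the way information gain prefers the largest entropy reduction.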
In R, the __________ uses entropy as a measure of
impurity, while the __________ uses Gini.
C5.0 algorithm (C50 package)
CART algorithm (rpart package)
Decision trees have a
tendency to _____ the
training data.
overfit
To mitigate this, the size of
the tree is reduced so that
it generalizes better. This is known as ________.
Pruning
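As a toy illustration only (not any library's actual procedure; real pruners use validation error or a complexity penalty), one of the simplest reductions collapses a split whose branches all predict the same class:

```python
def prune(tree):
    """Collapse subtrees whose leaves all agree into a single leaf.
    Trees are nested dicts: {"leaf": label} or {"split_on": ..., "branches": {...}}."""
    if "leaf" in tree:
        return tree
    branches = {v: prune(sub) for v, sub in tree["branches"].items()}
    leaves = [b["leaf"] for b in branches.values() if "leaf" in b]
    if len(leaves) == len(branches) and len(set(leaves)) == 1:
        return {"leaf": leaves[0]}  # the split adds size but no information
    return {"split_on": tree["split_on"], "branches": branches}

# A split whose branches all predict "yes" can be replaced by one leaf:
redundant = {"split_on": "wind", "branches": {
    "weak": {"leaf": "yes"}, "strong": {"leaf": "yes"}}}
print(prune(redundant))  # {'leaf': 'yes'}
```

The smaller tree makes identical predictions on every input while being less tied to quirks of the training sample.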