B08 Decision Trees Flashcards

1
Q

Decision trees are part of a group of learners that use all
the data available to them on a first-come, first-served
basis. As a result, they are known as __________

A

greedy learners.

2
Q

Decision trees:

-Begin at the ____ node.
-Pass through _____ nodes.
-Data is split across ______ (outcomes).
-End at a ______ (final decision).

A

root node
decision nodes
branches
leaf/terminal node

3
Q

Decision trees are built using a _______________
approach which splits data into subsets and then
recursively splits the subsets into even smaller subsets
until one or more stopping criteria are met. This
approach is also known as ___________.

A

recursive partitioning

divide and conquer

4
Q

Some of the criteria that trigger a stop to the recursive
partitioning process include when:
-All data in a leaf node are of the ______.
-All _______ have been exhausted.
-A specified _________ has been met.

A

same class
features
tree size limit

5
Q

For most decision tree algorithms, the decision about
which feature to split upon is usually made based on a
measure of impurity known as _______

A

entropy.

6
Q

For decision trees, entropy is a quantification of the
_______________ within a set of class
values.

A

level of randomness or disorder
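
For reference, the standard formula (not stated on the card) for a set S with c classes, where p_i is the proportion of values in class i:

Entropy(S) = -\sum_{i=1}^{c} p_i \log_2(p_i)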

7
Q
Entropy is highest when the split is _____. As one class
dominates the other, entropy reduces to ______.
A

50-50

zero
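
A quick worked example for a two-class node (numbers are illustrative, not from the card):

Entropy(0.5, 0.5) = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1 bit
Entropy(0.9, 0.1) = -(0.9 \log_2 0.9 + 0.1 \log_2 0.1) ≈ 0.47 bits
Entropy(1.0, 0.0) = 0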

8
Q

To determine the optimal feature to split upon, the
decision tree algorithm calculates the change in
entropy that would result from a split on each possible
feature. This measure is known as _____________.

A

information gain, InfoGain(F)
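
The usual formulation, assuming set S is split on feature F into partitions S_1, ..., S_k:

InfoGain(F) = Entropy(S) - \sum_{j=1}^{k} (|S_j|/|S|) \, Entropy(S_j)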

9
Q

________ is a modification of information gain that
reduces its bias toward highly branching features by taking
into account the number and size of branches when
choosing a feature. It does this by normalizing information gain by the ____________ of a split.

A

Gain Ratio

Intrinsic information
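
A common C4.5-style formulation, where the intrinsic (split) information is the entropy of the partition sizes themselves:

IntrinsicInfo(F) = -\sum_{j=1}^{k} (|S_j|/|S|) \log_2(|S_j|/|S|)

GainRatio(F) = InfoGain(F) / IntrinsicInfo(F)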

10
Q

Instead of entropy, some decision tree algorithms use the __________ to determine the optimal feature to split upon. _______ is a measure of statistical dispersion.

A

Gini impurity measure

Gini
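
The standard definition for a node with c classes, where p_i is the proportion of class i:

Gini = 1 - \sum_{i=1}^{c} p_i^2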

11
Q

Gini impurity ranges from ___ to ____ (the upper bound is reached only in the limit of an infinite number of even partitions).

A

0 to 1
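
Sketch of the arithmetic behind this (not stated on the card): with k equally likely classes, Gini = 1 - k(1/k)^2 = 1 - 1/k, so a two-class node tops out at 0.5 and the value only approaches 1 as the number of even partitions grows without bound.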

12
Q

A split occurs at the _______ value of the Gini impurity.

A

lowest

13
Q

In R, the __________ uses entropy as a measure of impurity, while the __________ uses Gini.

A

C5.0 algorithm (C50 package)

CART algorithm (rpart package)
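
A minimal sketch in R (assumes the C50 and rpart packages are installed; the built-in iris data is used purely for illustration):

library(C50)    # C5.0: splits chosen using entropy / information gain
library(rpart)  # rpart: CART implementation, Gini impurity by default

c50_fit  <- C5.0(x = iris[, -5], y = iris$Species)             # predictors + factor outcome
cart_fit <- rpart(Species ~ ., data = iris, method = "class")  # classification tree

summary(c50_fit)  # tree/rule structure and training accuracy
print(cart_fit)   # CART splits and node summaries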

14
Q

Decision trees have a tendency to _____ the training data.

A

overfit

15
Q

To remedy this, the size of the tree is reduced so that it generalizes better. This is known as ____.

A

Pruning

16
Q

The __________________ is used to control the
size of the decision tree and to select the optimal tree
size.

A

complexity parameter (cp)

17
Q

When used for pre-pruning, if the cost of adding
another variable to the decision tree from the current
node is _______ the value of the complexity parameter, then tree building does
not continue.

A

above

18
Q
For post-pruning, the cp value that corresponds to the _________ is used as the threshold for pruning the tree.
A

lowest cross-validation error
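
A minimal sketch of cp-based pruning with rpart (assumes the rpart package; iris is again just a stand-in dataset):

library(rpart)

# Grow a deliberately large tree; here cp acts as the pre-pruning threshold:
# a split that does not improve the fit by at least cp is not attempted.
fit <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(cp = 0.001))

printcp(fit)  # cp table with cross-validated error (xerror) per tree size

# Post-pruning: pick the cp with the lowest cross-validation error, then prune.
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)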

19
Q

Strengths of Decision Trees?

A

-Does well on most problems.
-Handles numeric and nominal
features well.
-Does well with missing data.
-Ignores unimportant features.
-Useful for both large and small
datasets.
-Output is easy to understand.
-Efficient, low-cost model.

20
Q

Weaknesses of Decision Trees?

A
-Splits biased towards features
with a large number of levels.
-Easy to overfit or underfit.
-Reliance on axis-parallel splits
is limiting.
-Small changes in data result in
large changes to decision logic.
-Large trees can be difficult to
interpret or understand.