5: Decision Trees Flashcards

1
Q

What are decision trees?

A

Decision trees represent a group of classification techniques based on the construction of a tree-like structure. This structure is a series of steps, where each step uses the given features one by one to help classify the input object.

2
Q

Where are decision trees used?

A

Image processing and character recognition, medicine, financial analysis, astronomy, manufacturing, production, and molecular biology.

3
Q

Are decision trees SL or UL?

A

SL (supervised learning), since they use labeled training instances to construct a classifier.

4
Q

How does a DT work?

A

TOP-DOWN PROCESS:
1. Select the highest-ranked feature and create a decision node for it.
2. From this node, create a branch for each distinct value (or value range):
− If all instances with this feature value (range) are of the same class, the child node of this branch is a leaf node.
− Else, repeat steps 1 and 2 on that subset.
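A minimal sketch of this top-down loop (not from the card), assuming the caller supplies some hypothetical feature-ranking function `rank_features`; any measure such as information gain could be plugged in:

```python
from collections import Counter

def build_tree(instances, features, rank_features):
    """instances: list of (feature_dict, class_label) pairs; features: feature names."""
    labels = [label for _, label in instances]

    # Leaf node: all instances share one class, or no features are left to split on.
    if len(set(labels)) == 1 or not features:
        return {"leaf": Counter(labels).most_common(1)[0][0]}

    # Step 1: select the highest-ranked feature and make it the decision node.
    best = rank_features(instances, features)
    node = {"feature": best, "branches": {}}

    # Step 2: create one branch per distinct value of that feature and recurse.
    for value in {x[best] for x, _ in instances}:
        subset = [(x, y) for x, y in instances if x[best] == value]
        remaining = [f for f in features if f != best]
        node["branches"][value] = build_tree(subset, remaining, rank_features)
    return node
```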

5
Q

What is the structure of a DT?

A

Nodes (decisions based on features), Branches (IF conditional statements), Leaves (class labels)
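One way to mirror this structure in code (hypothetical names, for illustration only): an internal node stores the splitting feature and its branches, while a leaf stores a class label.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class TreeNode:
    feature: Optional[str] = None     # internal node: decision based on this feature
    branches: Dict[str, "TreeNode"] = field(default_factory=dict)  # IF feature == value -> child
    label: Optional[str] = None       # leaf node: predicted class
```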

6
Q

How does a DT handle a dataset that contains more than one feature?

A

For a dataset that contains more than one feature, the decision tree classifier uses a ranking technique to determine each feature's degree of importance for the given classification problem. Accordingly, the classifier selects the most salient feature for the root node and then assigns the remaining features, in decreasing order of importance, to the rest of the tree nodes.

7
Q

How does the complexity of decision rules affect the interpretability and size of a decision tree?

A

A decision tree uses a tree structure to represent decision rules, which makes it easy for experts to understand the reasons behind classifications. However, as the tree adds more rules, it needs more training data, and if there are many features, the rules become more complex. This added complexity can make the tree harder to interpret, reducing its value as a visual tool.

8
Q

What is underfitting?

A

If the classification model is not trained enough, the induced decision tree will be too simple to classify instances accurately.

9
Q

When is a DT model successful?

A

When it is able to generalize.

10
Q

What are some challenges with DTs?

A

The tree may include branches that represent outliers or noise in the input dataset.

11
Q

What are the benefits of DTs?

A

− Easy to interpret, thanks to the natural tree representation (SVMs and neural networks are black-box classifiers whose decision logic is unknown)
− Independent of the statistical distribution of the input data
− The relationship between the features and the class labels can be nonlinear

12
Q

What is pruning?

A
Pruning handles overfitting by decreasing the size of the tree to make it less complex.
− Method: removing sub-trees in the decision tree that have low classification power
13
Q

What are the two types of pruning?

A

Pre-pruning: avoids building low-discriminating sub-trees while the decision tree is being constructed, replacing them with leaf nodes.
Post-pruning: removes spurious sub-trees from the fully constructed decision tree, replacing them with leaf nodes.

14
Q

What are the most popular DT methods?

A

ID3, C4.5, and CART (they differ in how features are selected and how the pruning mechanism is used).

15
Q

What feature-selection techniques do these methods use?

A

ID3 → information gain, C4.5 → gain ratio, CART → Gini index

16
Q

Explain how ID3 and Information Gain work.

A

Iterative Dichotomiser 3 (ID3) uses information gain (IG) to select the best splitting features. To do so, it measures the degree of homogeneity of the classes induced by a decision node.

IG is based on entropy, which measures the randomness (disorder) of the classes before and after splitting on a feature. If a split makes the resulting groups more homogeneous (less random), then IG is high, meaning it is a good split.

So, the lower the entropy after the split, the higher the IG. That is why IG is said to be “inversely proportional” to entropy: as entropy goes down, IG goes up, making the feature a better choice for a split.
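As formulas (the standard definitions): for a dataset $D$ with $c$ classes in proportions $p_i$, and a feature $f$ that splits $D$ into subsets $D_v$,

$$\mathrm{Entropy}(D) = -\sum_{i=1}^{c} p_i \log_2 p_i, \qquad \mathrm{IG}(D,f) = \mathrm{Entropy}(D) - \sum_{v \in \mathrm{values}(f)} \frac{|D_v|}{|D|}\,\mathrm{Entropy}(D_v)$$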

17
Q

What types of features can ID3 deal with?

A

Discrete features only. However, ID3 can be applied to regression problems simply by using standard deviation reduction (SDR) instead of IG.
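A common formulation of standard deviation reduction (assumed here, not spelled out on the card): the split that most reduces the standard deviation $\sigma$ of the target values is preferred,

$$\mathrm{SDR}(D,f) = \sigma(D) - \sum_{v \in \mathrm{values}(f)} \frac{|D_v|}{|D|}\,\sigma(D_v)$$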

18
Q

What is the standard deviation?

A

A measure of the degree of variation in a set of numerical values. A feature vector of similar values is considered homogeneous. The standard deviation of a completely homogeneous feature vector is zero.
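For reference, for values $x_1,\dots,x_N$ with mean $\bar{x}$:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2}$$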

19
Q

Explain how C4.5 and Gain Ratio work.

A

C4.5 handles the generalization problem that arises when IG is applied to datasets with very high homogeneity (e.g., ID-like features; see the next card).
− It can deal with both continuous and discrete features.
− Method: normalizing the information gain (the gain ratio, shown below).
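The standard C4.5 formulation of this normalization is the gain ratio:

$$\mathrm{GainRatio}(D,f) = \frac{\mathrm{IG}(D,f)}{\mathrm{SplitInfo}(D,f)}, \qquad \mathrm{SplitInfo}(D,f) = -\sum_{v \in \mathrm{values}(f)} \frac{|D_v|}{|D|}\log_2\frac{|D_v|}{|D|}$$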

20
Q

Explain the generalization problem with IG and why the gain ratio might be better.

A

Information Gain (IG) favors features with distinct values, as they often create purer groups. For example, if we use an identifier feature (like a unique ID), each ID is completely distinct, so each split will perfectly separate the data, making entropy zero. This would give a high IG score, but using an ID to split data isn’t useful for generalizing because it only separates based on unique labels without learning patterns.

The C4.5 algorithm fixes this by adjusting IG, normalizing it to prevent features like IDs from dominating splits. This adjusted ranking helps the decision tree focus on features that improve generalization rather than just creating pure splits.

21
Q

What are some advantages of the C4.5 algorithm in decision tree classification?

A

C4.5 can deal with both continuous and discrete features. It also handles missing values and applies tree pruning after tree induction.

22
Q

How do CART and the Gini index work?

A

CART uses the Gini index Gini(D) to measure the impurity of a dataset D. The feature f that maximizes the impurity reduction ΔGini(f) is selected as the splitting feature.
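The standard definitions (with CART's binary splits, partitioning $D$ into $D_1$ and $D_2$):

$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{c} p_i^2, \qquad \Delta\mathrm{Gini}(f) = \mathrm{Gini}(D) - \frac{|D_1|}{|D|}\mathrm{Gini}(D_1) - \frac{|D_2|}{|D|}\mathrm{Gini}(D_2)$$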

23
Q

How does pre-pruning work?

A

Pre-pruning stops a decision tree from growing too complex by avoiding branches that don’t add much value. When a certain condition is met, the tree-building process stops adding new decision points and instead creates a “leaf” with the most common class label for that branch. The specific condition for stopping depends on a ranking measure, like information gain, gain ratio, or Gini index. If this measure is too low, meaning the split won’t be useful enough, then no further splits are made.
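An illustrative sketch (not from the card): in scikit-learn, pre-pruning corresponds to growth limits passed to the classifier before fitting; the threshold values below are arbitrary examples.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Growth limits act as pre-pruning: splits that violate them are never made.
clf = DecisionTreeClassifier(
    max_depth=3,                 # stop growing beyond this depth
    min_samples_leaf=5,          # every leaf must cover at least 5 instances
    min_impurity_decrease=0.01,  # skip splits whose impurity reduction is too small
)
clf.fit(X, y)
```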

24
Q

How does post-pruning work?

A

Post-pruning simplifies a fully built decision tree by cutting out unnecessary branches and replacing them with a single leaf showing the most common class. The CART method does this by calculating “cost complexity” for each branch, based on how many leaves it has and its error rate. If replacing a branch with a single leaf reduces complexity without hurting accuracy, that branch is removed.
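An illustrative sketch (not from the card): scikit-learn exposes CART-style cost-complexity post-pruning through the `ccp_alpha` parameter; the choice of alpha below is an arbitrary example.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Compute the cost-complexity pruning path of a fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Pick one alpha from the path (here simply the median) and refit a pruned tree;
# larger alphas remove more sub-trees.
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X, y)
```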

25
Q

How are decision trees used in ensemble methods?

A

In ensemble methods, decision trees are combined to improve accuracy and robustness. Techniques like bagging (e.g., Random Forest) build multiple trees on different data samples and average their results to reduce variance and overfitting. Another method, boosting (e.g., AdaBoost), builds trees sequentially, with each new tree focusing on the errors of the previous ones, which reduces bias. Together, these ensemble approaches make the final model more accurate and generalizable than a single decision tree.
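An illustrative sketch (not from the card) of both ideas in scikit-learn: RandomForestClassifier bags many trees trained on bootstrap samples, while AdaBoostClassifier boosts shallow trees sequentially; the dataset and parameters are arbitrary examples.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Bagging: many trees on bootstrap samples, predictions averaged (reduces variance).
bagged = RandomForestClassifier(n_estimators=100, random_state=0)

# Boosting: shallow trees fitted sequentially, each focusing on previous errors (reduces bias).
boosted = AdaBoostClassifier(n_estimators=50, random_state=0)

print("random forest:", cross_val_score(bagged, X, y, cv=5).mean())
print("adaboost:     ", cross_val_score(boosted, X, y, cv=5).mean())
```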