Topic 3: Decision Trees Flashcards
What is the key metric used by the ID3 algorithm to select attributes for splitting?
Information gain: at each node, ID3 splits on the attribute with the highest gain.
What is entropy in the context of decision trees?
A measure of uncertainty or impurity in the dataset.
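As a concrete sketch (stdlib Python; the function name is illustrative), Shannon entropy H(S) = -Σ pᵢ log₂ pᵢ over the class proportions pᵢ:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S) = -sum(p_i * log2(p_i)), in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

# A 50/50 split is maximally impure (1 bit); a pure node has entropy 0.
print(entropy(["yes", "yes", "no", "no"]))   # 1.0
print(abs(entropy(["yes", "yes", "yes"])))   # 0.0
```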
How does pruning help in decision tree learning?
By removing branches that fit noise rather than signal, which reduces overfitting and improves generalization to unseen data.
What is a decision tree?
A flowchart-like structure where internal nodes represent tests on attributes, branches represent outcomes, and leaf nodes represent class labels or decisions.
How does a decision tree classify an instance?
By traversing from the root to a leaf node, following the path defined by the instance’s feature values.
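The traversal can be sketched with a tiny hand-built tree as nested dicts (the feature names and values here are hypothetical, not from the cards):

```python
# Internal node = {"attr": ..., "branches": {value: subtree}}; leaf = class label.
tree = {
    "attr": "outlook",
    "branches": {
        "sunny": {"attr": "humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": "yes",
    },
}

def classify(node, instance):
    """Follow the branch matching each tested feature value until a leaf."""
    while isinstance(node, dict):
        node = node["branches"][instance[node["attr"]]]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "high"}))  # no
```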
What type of problems can decision trees handle?
Both classification (discrete outcomes) and regression (continuous outcomes).
What is information gain?
A measure of how much uncertainty in the dataset is reduced after splitting on a particular attribute.
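In symbols, IG(S, A) = H(S) - Σᵥ (|Sᵥ|/|S|) · H(Sᵥ), where Sᵥ is the subset of S taking value v for attribute A. A minimal sketch (stdlib Python, illustrative names; the toy data is made up):

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(values, labels):
    """IG = H(S) - sum over values v of |S_v|/|S| * H(S_v)."""
    total = len(labels)
    subsets = {}
    for v, label in zip(values, labels):
        subsets.setdefault(v, []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# An attribute that separates the classes perfectly recovers the
# full 1 bit of entropy; an uninformative attribute gains nothing.
outlook = ["sunny", "sunny", "rain", "rain"]
play = ["no", "no", "yes", "yes"]
print(information_gain(outlook, play))  # 1.0
```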
What are the advantages of decision trees?
Easy to interpret and visualize.
Handle both numerical and categorical data.
Require little data preprocessing (e.g., no feature scaling).
What are the disadvantages of decision trees?
Prone to overfitting without pruning.
Can be unstable: small changes in the training data can produce a very different tree.
Less effective on noisy data or problems with many classes, since greedy splitting can miss globally better trees.
What is pruning in decision trees?
A process of reducing the size of the tree by removing sections that provide little value to avoid overfitting.
What is the difference between pre-pruning and post-pruning?
Pre-pruning stops the tree’s growth early, based on a condition (e.g., max depth).
Post-pruning removes branches after the tree is fully grown.
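The two styles can be illustrated with scikit-learn (an assumption; the cards name no library). `max_depth` is a pre-pruning cap, while `ccp_alpha` triggers post-growth cost-complexity pruning:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unconstrained tree: grows until leaves are pure (overfits easily).
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pre-pruning: cap growth up front with max_depth (or min_samples_leaf, etc.).
pre = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Post-pruning: grow fully, then cut subtrees whose impurity reduction
# is not worth their complexity (larger ccp_alpha prunes more).
post = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)

print(full.get_depth(), pre.get_depth(), post.get_depth())
```

Since pruning only removes branches, the post-pruned depth never exceeds the full tree's depth.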
How can cross-validation be used in pruning?
To determine whether removing a branch improves performance on unseen data.
What causes overfitting in decision trees?
The tree grows too complex and fits the training data perfectly, capturing noise and irrelevant patterns.
How are decision trees used in medical diagnosis?
To classify diseases based on symptoms or test results.