Decision Tree Flashcards

1
Q

What is a Decision Tree?

A

A Decision Tree is a supervised learning algorithm used for classification and regression tasks. It creates a model by splitting the data into subsets based on the feature that best separates the data at each decision point.
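A minimal sketch of what this looks like in practice, assuming scikit-learn is available (the iris dataset is only an illustrative choice):

```python
# Minimal sketch: fit a decision tree classifier with scikit-learn.
# The iris dataset is only an example; any tabular classification data works.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = DecisionTreeClassifier(random_state=42)  # default split criterion is Gini impurity
clf.fit(X_train, y_train)                      # learns a tree of feature-threshold splits
print("Test accuracy:", clf.score(X_test, y_test))
```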

2
Q

Tree Structure of a Decision Tree

A

A Decision Tree is structured like a flowchart:

Root Node: The first node that splits the data based on a feature.

Decision Nodes: Internal nodes where the data is split again based on a feature.

Leaf Nodes: Terminal nodes where the classification or predicted value is made.
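One way to see this flowchart structure is to print a fitted tree as text; a small sketch assuming scikit-learn (export_text), with a shallow tree for readability:

```python
# Sketch: print a fitted tree so root, decision, and leaf nodes are visible.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# The first split printed is the root node, nested splits are decision nodes,
# and lines ending in "class: ..." are leaf nodes.
print(export_text(tree, feature_names=data.feature_names))
```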

3
Q

Big Idea Behind Decision Trees

A

Start with all data in a single node (root).

Choose the best feature to split the data into two groups.

Repeat the process for each new group until a stopping condition is met (e.g., pure nodes, max depth).

The final nodes (leaf nodes) give the predictions.
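A toy sketch of this recursive idea in plain Python/NumPy (illustrative only, not how scikit-learn actually implements it; the function names and the Gini-based split search are made up for this example):

```python
# Toy sketch of the recursive procedure (illustrative only, not scikit-learn's code).
import numpy as np

def gini(y):
    """Gini impurity of the labels in one node."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Try every feature/threshold pair; keep the split with the lowest weighted impurity."""
    best_f, best_t, best_imp = None, None, np.inf
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            left = X[:, f] <= t
            if left.all() or not left.any():      # skip splits that leave one side empty
                continue
            w = left.mean()
            imp = w * gini(y[left]) + (1 - w) * gini(y[~left])
            if imp < best_imp:
                best_f, best_t, best_imp = f, t, imp
    return best_f, best_t

def build_tree(X, y, depth=0, max_depth=3):
    # Stop on a pure node, at max depth, or when no valid split exists;
    # a leaf predicts the majority class of its samples.
    f, t = best_split(X, y) if gini(y) > 0.0 and depth < max_depth else (None, None)
    if f is None:
        values, counts = np.unique(y, return_counts=True)
        return {"predict": values[np.argmax(counts)]}
    mask = X[:, f] <= t
    return {"feature": f, "threshold": t,
            "left":  build_tree(X[mask],  y[mask],  depth + 1, max_depth),
            "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth)}

X = np.array([[2.0], [3.0], [10.0], [11.0]])
y = np.array([0, 0, 1, 1])
print(build_tree(X, y))   # splits on feature 0 at threshold 3.0, then two pure leaves
```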

4
Q

How does a Decision Tree select the best feature?

A

Step-by-Step Process to Find Best Feature:

Calculate impurity (Gini, Entropy, or MSE) of the parent node.

Try splitting on each feature at different thresholds.

Calculate impurity for child nodes after each split.

Compute the impurity reduction (parent impurity minus the weighted impurity of the children).

Choose the feature (and threshold) with the highest impurity reduction.
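A small numeric sketch of one such impurity-reduction calculation for a single candidate split (the class labels below are made up for illustration):

```python
# Sketch: impurity reduction for one candidate split (the labels are made up).
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

y_parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # parent node: 4 samples of each class
y_left   = np.array([0, 0, 0, 1])               # left child after a candidate split
y_right  = np.array([0, 1, 1, 1])               # right child after the same split

n, n_l, n_r = len(y_parent), len(y_left), len(y_right)
weighted_children = (n_l / n) * gini(y_left) + (n_r / n) * gini(y_right)
reduction = gini(y_parent) - weighted_children   # higher reduction = better split

print(f"parent={gini(y_parent):.3f} children={weighted_children:.3f} reduction={reduction:.3f}")
# The tree repeats this for every feature and threshold and keeps the best one.
```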

5
Q

Stopping Conditions (When Does a Tree Stop Splitting?)

A

A Decision Tree stops growing when:

Maximum depth is reached.

A node has pure data (all samples belong to one class).

The number of samples in a node is below a certain threshold.
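In scikit-learn these stopping conditions map onto constructor arguments; a sketch with arbitrary example values:

```python
# Sketch: each stopping condition corresponds to a scikit-learn hyperparameter
# (the values below are arbitrary examples, not recommendations).
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    max_depth=5,           # stop once the tree is 5 levels deep
    min_samples_split=20,  # do not split a node holding fewer than 20 samples
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
)
# Pure nodes stop splitting automatically, regardless of these settings.
```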

6
Q

Explain Overfitting & Pruning in Decision Trees

A

Overfitting & Pruning
Problem:
A deep tree can perfectly fit the training data but fail on new data (overfitting).

Solutions:
Pre-Pruning (Early Stopping)
Limit tree depth and set a minimum number of samples per node while the tree is growing.

Post-Pruning (Prune After Training)
Grow the tree fully, then remove weak splits based on performance.
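A sketch of both options in scikit-learn; cost-complexity pruning (ccp_alpha) is one common post-pruning approach, and the dataset and values here are only illustrative:

```python
# Sketch: pre-pruning vs. post-pruning in scikit-learn (dataset and values are illustrative).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning (early stopping): cap depth and leaf size while the tree grows.
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10, random_state=0)
pre.fit(X_train, y_train)

# Post-pruning: grow fully, then cut back weak subtrees via cost-complexity pruning.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]   # in practice, pick alpha by cross-validation
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)

print("pre-pruned accuracy :", pre.score(X_test, y_test))
print("post-pruned accuracy:", post.score(X_test, y_test))
```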

7
Q

Advantages & Disadvantages of Decision Trees

A

Advantages:
Easy to interpret and visualize.
Handles both numerical & categorical data.
No need for feature scaling.
Works well with large datasets.

Disadvantages:
Prone to overfitting if too deep.
Sensitive to noisy data.
Greedy algorithm (locally optimal, not always globally optimal).

8
Q

Decision Tree - Quick Summary

A

1️⃣ Start with all data in the root node.
2️⃣ Select the best feature by checking impurity reduction (Gini, Entropy, or MSE).
3️⃣ Split the data into two child nodes based on the feature value.
4️⃣ Repeat the process recursively for each node.
5️⃣ Stop splitting when a stopping condition is met (max depth, pure nodes, min samples).
6️⃣ Overfitting can happen if the tree is too deep → Use pruning (pre/post).
7️⃣ Works for both classification & regression (Gini/Entropy for classification, MSE for regression).
8️⃣ No need for feature scaling (unlike SVM or Logistic Regression).
9️⃣ Easy to interpret but sensitive to noisy data.

9
Q

When to use Gini impurity vs. entropy?

A

When to Use Gini vs. Entropy (Simple Guide)
✅ Use Gini when:
You have a large dataset (many features & samples).
You need faster computation (Gini avoids the log function).
You don’t mind a slightly unbalanced tree.

✅ Use Entropy when:
You have a smaller dataset (fewer features).
You want more balanced splits (better generalization).
You don’t care about extra computation time.

Rule of Thumb:
More features → Gini (Faster, works well)
Fewer features → Entropy (Better balance, but slower)
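In scikit-learn this choice is just the criterion argument; a small sketch that also computes both impurity measures for one set of class probabilities:

```python
# Sketch: the Gini-vs-entropy choice is the `criterion` argument in scikit-learn.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

gini_tree    = DecisionTreeClassifier(criterion="gini")     # default; no log, slightly faster
entropy_tree = DecisionTreeClassifier(criterion="entropy")  # log-based; often gives very similar trees

# The two impurity measures for one node with class probabilities p:
p = np.array([0.7, 0.3])
gini_value    = 1.0 - np.sum(p ** 2)        # 0.42
entropy_value = -np.sum(p * np.log2(p))     # ~0.881
print(gini_value, entropy_value)
```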
