Decision Tree Flashcards

1
Q

What is a Decision Tree?

A

A Decision Tree is a supervised learning algorithm used for classification and regression tasks. It creates a model by splitting the data into subsets based on the feature that best separates the data at each decision point.
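A minimal sketch of what this looks like in practice, assuming scikit-learn is available (the iris dataset is only an illustrative choice):

```python
# Minimal sketch: fit a decision tree classifier with scikit-learn.
# The iris dataset is only an example; any tabular classification data works.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = DecisionTreeClassifier(random_state=42)  # default split criterion is Gini impurity
clf.fit(X_train, y_train)                      # learns a tree of feature-threshold splits
print("Test accuracy:", clf.score(X_test, y_test))
```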

2
Q

Tree Structure of a Decision Tree

A

A Decision Tree is structured like a flowchart:

Root Node: The first node that splits the data based on a feature.

Decision Nodes: Internal nodes where the data is split again based on a feature.

Leaf Nodes: Terminal nodes where the classification or predicted value is made.
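One way to see this flowchart structure is to print a fitted tree as text; a small sketch assuming scikit-learn (export_text), with a shallow tree for readability:

```python
# Sketch: print a fitted tree so root, decision, and leaf nodes are visible.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# The first split printed is the root node, nested splits are decision nodes,
# and lines ending in "class: ..." are leaf nodes.
print(export_text(tree, feature_names=data.feature_names))
```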

3
Q

Big Idea Behind Decision Trees

A

Start with all data in a single node (root).

Choose the best feature to split the data into two groups.

Repeat the process for each new group until a stopping condition is met (e.g., pure nodes, max depth).

The final nodes (leaf nodes) give the predictions.
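A toy sketch of this recursive idea in plain Python/NumPy (illustrative only, not how scikit-learn actually implements it; the function names and the Gini-based split search are made up for this example):

```python
# Toy sketch of the recursive procedure (illustrative only, not scikit-learn's code).
import numpy as np

def gini(y):
    """Gini impurity of the labels in one node."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Try every feature/threshold pair; keep the split with the lowest weighted impurity."""
    best_f, best_t, best_imp = None, None, np.inf
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            left = X[:, f] <= t
            if left.all() or not left.any():      # skip splits that leave one side empty
                continue
            w = left.mean()
            imp = w * gini(y[left]) + (1 - w) * gini(y[~left])
            if imp < best_imp:
                best_f, best_t, best_imp = f, t, imp
    return best_f, best_t

def build_tree(X, y, depth=0, max_depth=3):
    # Stop on a pure node, at max depth, or when no valid split exists;
    # a leaf predicts the majority class of its samples.
    f, t = best_split(X, y) if gini(y) > 0.0 and depth < max_depth else (None, None)
    if f is None:
        values, counts = np.unique(y, return_counts=True)
        return {"predict": values[np.argmax(counts)]}
    mask = X[:, f] <= t
    return {"feature": f, "threshold": t,
            "left":  build_tree(X[mask],  y[mask],  depth + 1, max_depth),
            "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth)}

X = np.array([[2.0], [3.0], [10.0], [11.0]])
y = np.array([0, 0, 1, 1])
print(build_tree(X, y))   # splits on feature 0 at threshold 3.0, then two pure leaves
```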

4
Q

How does a Decision Tree select the best feature?

A

Step-by-Step Process to Find Best Feature:

Calculate impurity (Gini, Entropy, or MSE) of the parent node.

Try splitting on each feature at different thresholds.

Calculate impurity for child nodes after each split.

Compute the impurity reduction (parent impurity minus the weighted impurity of the children).

Choose the feature (and threshold) with the highest impurity reduction.
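A small numeric sketch of one such impurity-reduction calculation for a single candidate split (the class labels below are made up for illustration):

```python
# Sketch: impurity reduction for one candidate split (the labels are made up).
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

y_parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # parent node: 4 samples of each class
y_left   = np.array([0, 0, 0, 1])               # left child after a candidate split
y_right  = np.array([0, 1, 1, 1])               # right child after the same split

n, n_l, n_r = len(y_parent), len(y_left), len(y_right)
weighted_children = (n_l / n) * gini(y_left) + (n_r / n) * gini(y_right)
reduction = gini(y_parent) - weighted_children   # higher reduction = better split

print(f"parent={gini(y_parent):.3f} children={weighted_children:.3f} reduction={reduction:.3f}")
# The tree repeats this for every feature and threshold and keeps the best one.
```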

5
Q

Stopping Conditions (When Does a Tree Stop Splitting?)

A

A Decision Tree stops growing when:

Maximum depth is reached.

A node has pure data (all samples belong to one class).

The number of samples in a node is below a certain threshold.
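In scikit-learn these stopping conditions map onto constructor arguments; a sketch with arbitrary example values:

```python
# Sketch: each stopping condition corresponds to a scikit-learn hyperparameter
# (the values below are arbitrary examples, not recommendations).
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    max_depth=5,           # stop once the tree is 5 levels deep
    min_samples_split=20,  # do not split a node holding fewer than 20 samples
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
)
# Pure nodes stop splitting automatically, regardless of these settings.
```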

6
Q

Explain Overfitting & Pruning in Decision Trees

A

Overfitting & Pruning
Problem:
A deep tree can perfectly fit the training data but fail on new data (overfitting).

Solutions:
Pre-Pruning (Early Stopping)
Limit tree depth and set a minimum number of samples per node while the tree is growing.

Post-Pruning (Prune After Training)
Grow the tree fully, then remove weak splits based on performance.
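A sketch of both options in scikit-learn; cost-complexity pruning (ccp_alpha) is one common post-pruning approach, and the dataset and values here are only illustrative:

```python
# Sketch: pre-pruning vs. post-pruning in scikit-learn (dataset and values are illustrative).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning (early stopping): cap depth and leaf size while the tree grows.
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10, random_state=0)
pre.fit(X_train, y_train)

# Post-pruning: grow fully, then cut back weak subtrees via cost-complexity pruning.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]   # in practice, pick alpha by cross-validation
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)

print("pre-pruned accuracy :", pre.score(X_test, y_test))
print("post-pruned accuracy:", post.score(X_test, y_test))
```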

7
Q

Advantages & Disadvantages of Decision Trees

A

Advantages:
Easy to interpret and visualize.
Handles both numerical & categorical data.
No need for feature scaling.
Works well with large datasets.

Disadvantages:
Prone to overfitting if too deep.
Sensitive to noisy data.
Greedy algorithm (locally optimal, not always globally optimal).

8
Q

Decision Tree - Quick Summary

A

1️⃣ Start with all data in the root node.
2️⃣ Select the best feature by checking impurity reduction (Gini, Entropy, or MSE).
3️⃣ Split the data into two child nodes based on the feature value.
4️⃣ Repeat the process recursively for each node.
5️⃣ Stop splitting when a stopping condition is met (max depth, pure nodes, min samples).
6️⃣ Overfitting can happen if the tree is too deep → Use pruning (pre/post).
7️⃣ Works for both classification & regression (Gini/Entropy for classification, MSE for regression).
8️⃣ No need for feature scaling (unlike SVM or Logistic Regression).
9️⃣ Easy to interpret but sensitive to noisy data.

9
Q

When to use Gini impurity vs. entropy?

A

When to Use Gini vs. Entropy (Simple Guide)
✅ Use Gini when:
You have a large dataset (many features & samples).
You need faster computation (Gini avoids the log function).
You don’t mind a slightly unbalanced tree.

✅ Use Entropy when:
You have a smaller dataset (fewer features).
You want more balanced splits (better generalization).
You don’t care about extra computation time.

Rule of Thumb:
More features → Gini (Faster, works well)
Fewer features → Entropy (Better balance, but slower)
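In scikit-learn this choice is just the criterion argument; a small sketch that also computes both impurity measures for one set of class probabilities:

```python
# Sketch: the Gini-vs-entropy choice is the `criterion` argument in scikit-learn.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

gini_tree    = DecisionTreeClassifier(criterion="gini")     # default; no log, slightly faster
entropy_tree = DecisionTreeClassifier(criterion="entropy")  # log-based; often gives very similar trees

# The two impurity measures for one node with class probabilities p:
p = np.array([0.7, 0.3])
gini_value    = 1.0 - np.sum(p ** 2)        # 0.42
entropy_value = -np.sum(p * np.log2(p))     # ~0.881
print(gini_value, entropy_value)
```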
