Decision Trees Flashcards

1
Q

What does CART stand for?

A

Classification & Regression Trees

2
Q

What does a CART model do?

A

Classifies or predicts an outcome from a set of predictor variables, producing a tree structure

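For concreteness, here's a minimal sketch of fitting a CART model with scikit-learn's `DecisionTreeClassifier` (which implements an optimized version of CART); the tiny dataset is made up for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data (hypothetical): two predictor variables, binary outcome.
X = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0], [6, 1]]
y = [0, 0, 0, 1, 1, 1]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# The fitted model is a tree of binary splits on the predictors.
print(export_text(clf, feature_names=["x1", "x2"]))
print(clf.predict([[1, 1], [6, 0]]))  # classify new records
```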
3
Q

How does a classification tree split?

A

Splits maximize homogeneity of classes in each branch

4
Q

How does a regression tree split?

A

Splits minimize variance within each branch

5
Q

What are the 4 key features of decision trees?

A
  1. Binary splits
  2. Pruning
  3. Non-parametric
  4. Interpretability
6
Q

What are binary splits?

A

Each split produces 2 child nodes

7
Q

What is pruning?

A

Limits tree growth to minimize complexity & reduce overfitting

8
Q

What is non-parametric?

A

No assumptions about the underlying distribution of each variable

9
Q

What is interpretability?

A

Easy to understand & visualize

10
Q

How is a tree produced?

A

Recursive partitioning

11
Q

What is recursive partitioning?

A

Repeatedly split the records into 2 parts

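The split search behind recursive partitioning can be sketched in plain Python; `best_split` is a hypothetical helper that scans one numeric feature for the threshold whose two parts have the lowest weighted Gini impurity:

```python
def gini(labels):
    """Gini index: 1 - sum(p_k^2); 0 means a pure part."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Try every threshold on one numeric feature; return the
    (threshold, impurity) pair with the lowest weighted Gini."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must produce two non-empty parts
        n = len(ys)
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best

# Labels separate cleanly at x <= 2, so that threshold scores impurity 0.
print(best_split([1, 2, 3, 4], ["a", "a", "b", "b"]))  # (2, 0.0)
```

A full tree repeats this search on each new part until a stopping rule applies.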
12
Q

In recursive partitioning, how is each split into two parts chosen?

A

To achieve maximum homogeneity of outcome within each new part

13
Q

What are the two ways to measure impurity?

A
  1. Gini Index
  2. Entropy
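Both measures take a few lines to compute; a sketch (function names are illustrative):

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini index: 1 - sum(p_k^2). 0 = pure node, higher = more mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy: -sum(p_k * log2(p_k)). 0 = pure node."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(gini(["a", "a", "b", "b"]))     # 0.5 for a 50/50 node
print(entropy(["a", "a", "b", "b"]))  # 1.0 bit for a 50/50 node
print(gini(["a", "a", "a", "a"]))     # 0.0 for a pure node
```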
14
Q

What are two ways to avoid overfitting?

A
  1. Pre-pruning
  2. Post-pruning
15
Q

What are the 5 ways to avoid overfitting when pre-pruning?

A
  1. Limit max depth
  2. Min samples for split
  3. Min samples per leaf
  4. Threshold for splitting
  5. Restrict feature usage
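These five controls map directly onto scikit-learn's `DecisionTreeClassifier` hyperparameters; a sketch using the built-in iris dataset (the specific values are arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    max_depth=3,                 # 1. limit max depth
    min_samples_split=10,        # 2. min samples for a node to be split
    min_samples_leaf=5,          # 3. min samples each leaf must keep
    min_impurity_decrease=0.01,  # 4. threshold a split must improve by
    max_features=2,              # 5. restrict features tried per split
    random_state=0,
).fit(X, y)

print(clf.get_depth())  # never exceeds max_depth
```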
16
Q

What are the 2 ways to avoid overfitting when post-pruning?

A
  1. Cost complexity pruning
  2. Cross validation
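The two work together; a sketch assuming scikit-learn, where `cost_complexity_pruning_path` enumerates the candidate pruning strengths (alphas) and cross-validation selects among them:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each alpha prunes subtrees whose accuracy gain doesn't justify
# their complexity; larger alpha -> smaller tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    score = cross_val_score(clf, X, y, cv=5).mean()  # cross validation
    if score > best_score:
        best_alpha, best_score = alpha, score

print(best_alpha, best_score)
```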

17
Q

What are the 5 pros of decision trees?

A
  1. Easy to understand
  2. Handle numerical & categorical
  3. No need for normalization
  4. Handle categorical (binary and multi class)
  5. Not a black box model
18
Q

What are the 2 cons of decision trees?

A
  1. Can easily become too complex
  2. Unstable to small variations in the data