Lecture 4 - Tree models Flashcards

1
Q

What are the three models based on trees

A
  1. decision trees
  2. random forest
  3. gradient boosting
2
Q

What are the three parts of a decision tree

A
  1. root node
  2. internal/decision nodes
  3. leaf nodes
3
Q

What is the max depth and max number of leaves of a decision tree with d binary features?

A

Max depth: d+1; max number of leaves: 2^d.

4
Q

How many leaves (at most) does a decision tree with d binary features have?

A

2^d

5
Q

Decision tree

The set of literals at a node is called a ___

A

split

6
Q

Each leaf of the tree represents a ___ ___, which is a conjunction of literals encountered on the path from the root of the tree to the leaf.

A

logical expression

7
Q

Give 3 decision tree algorithms

A
  1. ID3
  2. C4.5
  3. CART
8
Q

Growing a tree is recursive
true/false

A

true

9
Q

How do we decide the best split?

A

Since we assign the majority class label to each leaf, we look for a split that separates the classes as cleanly as possible, i.e. we judge a split by the purity of the children it produces.

10
Q

Give 3 impurity metrics

A
  1. Minority class
  2. Entropy
  3. Gini index
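
A minimal Python sketch of these three impurity measures for a binary node, written as functions of the proportion p of positives (my own illustration; the lecture's formulas may differ by constant factors):

```python
import math

def minority_class(p):
    """Impurity = proportion of the minority class: min(p, 1 - p)."""
    return min(p, 1 - p)

def entropy(p):
    """Binary entropy in bits: -p log2 p - (1 - p) log2 (1 - p); taken as 0 for a pure node."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gini(p):
    """Gini index of a binary node: 2 p (1 - p)."""
    return 2 * p * (1 - p)

# All three are 0 for a pure node and maximal at p = 0.5:
for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, minority_class(p), round(entropy(p), 3), round(gini(p), 3))
```

Any of the three can be plugged in as Imp in the purity-gain computation of card 14.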
11
Q

Both the entropy and the Gini index are smooth and concave upper bounds of the training error. These properties can be advantageous in some situations.

A

True

12
Q

What is entropy

A

Entropy is the expected average level of information, surprise, or uncertainty inherent in all the n possible outcomes of an event

13
Q

Entropy for a binary classification task:

A

H(p, 1-p) = -p log_2 p - (1-p) log_2 (1-p)
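
For example (a quick check, not from the lecture): H(0.5, 0.5) = 1 bit for a maximally impure node, while H(0.9, 0.1) ≈ 0.47 bits for a much purer one.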

14
Q

How do we assess if a split is useful at all?

A

In assessing the quality of a feature for splitting a parent node D into children D1, …, Dj, it is customary to look at the purity gain: Imp(D) - Imp({D1, …, Dj}), where the impurity of the children is the weighted average sum_j (|Dj|/|D|) Imp(Dj).
With entropy as the impurity measure: purity gain = original entropy - (weighted) entropy after splitting, also known as information gain.
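
A small Python sketch of this computation, assuming entropy as the impurity measure and representing each node by its (positive, negative) class counts (the function names are my own, not the lecture's):

```python
import math

def entropy(pos, neg):
    """Entropy of a node given its positive/negative counts."""
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

def purity_gain(parent, children):
    """Imp(D) minus the weighted impurity of the children D1..Dj.
    `parent` and each child are (pos, neg) count pairs."""
    n = sum(parent)
    weighted = sum(sum(c) / n * entropy(*c) for c in children)
    return entropy(*parent) - weighted

# Example: splitting a (5 pos, 5 neg) parent into (4, 1) and (1, 4)
print(purity_gain((5, 5), [(4, 1), (1, 4)]))  # ≈ 0.278 bits gained
```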

15
Q

Finding the best split for a decision tree is recursive?

A

False. (Growing the tree is recursive; finding the best split at a single node is just a search over the candidate features, not a recursion.)

16
Q

How to prevent overfitting in decision trees? Give 2.

A
  • Limit the number of iterations of the algorithm, leading to a tree with a bounded number of nodes.
  • Prune the tree after it is built by removing weak branches.
17
Q

How to do Reduced Error Pruning

A
  1. Starting at the leaves, each node is replaced with its majority class.
  2. If the prediction accuracy is not affected, then the change is kept.
  3. Keep a validation set and evaluate the pruned tree's performance on it; the pruning will of course not improve accuracy on the training set.
A minimal code sketch of this procedure is given below.
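
A minimal, illustrative Python sketch of reduced-error pruning, assuming a simple binary node structure (the Node class, its field names and the accuracy helper are my own, not the lecture's):

```python
class Node:
    def __init__(self, feature=None, left=None, right=None, majority=None):
        self.feature = feature    # index of the binary feature tested at this node
        self.left = left          # subtree followed when the feature is 0 / False
        self.right = right        # subtree followed when the feature is 1 / True
        self.majority = majority  # majority class of training examples reaching this node

    def is_leaf(self):
        return self.left is None and self.right is None

    def predict(self, x):
        if self.is_leaf():
            return self.majority
        child = self.right if x[self.feature] else self.left
        return child.predict(x)

def accuracy(tree, X_val, y_val):
    return sum(tree.predict(x) == y for x, y in zip(X_val, y_val)) / len(y_val)

def reduced_error_prune(node, root, X_val, y_val):
    """Bottom-up: replace a node by a leaf predicting its majority class and
    keep the change only if validation accuracy does not drop."""
    if node.is_leaf():
        return
    reduced_error_prune(node.left, root, X_val, y_val)
    reduced_error_prune(node.right, root, X_val, y_val)
    before = accuracy(root, X_val, y_val)
    saved = (node.feature, node.left, node.right)
    node.feature, node.left, node.right = None, None, None  # temporarily make it a leaf
    if accuracy(root, X_val, y_val) < before:                # pruning hurt: revert
        node.feature, node.left, node.right = saved
```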
18
Q

What are the two sources of imbalance?

A
  • Asymmetric class distribution
  • Asymmetric mis-classification cost
19
Q

What does adding more samples accomplish?

A

Adds data for training and increases the training time
It is possible that it does not make any difference

20
Q

What is sqrt(Gini)?

A

Sqrt(Gini) is designed to minimise relative impurity, which makes it insensitive to changes in the class distribution, whereas Gini emphasises children covering more examples.
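
A tiny sketch, taking sqrt(Gini) literally as the square root of the Gini index for a binary node (the constant factor does not matter when comparing splits; this is my own illustration, check the slides for the exact form used):

```python
import math

def gini(p):
    return 2 * p * (1 - p)

def sqrt_gini(p):
    # Square root of the Gini index, i.e. proportional to sqrt(p * (1 - p)).
    return math.sqrt(gini(p))
```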

21
Q

Entropy and the Gini index are sensitive to fluctuations in the class distribution; sqrt(Gini) isn't. We want distribution-insensitive impurities.

A

True

22
Q

In regression trees we can replace imp with ____

A

variance

23
Q

What is weighted variance

A

If a split partitions the set of target values Y into mutually exclusive sets {Y1, …, Yj}, the weighted variance is the average of the child variances, each weighted by the relative size of its set: sum_j (|Yj|/|Y|) Var(Yj).
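
A small Python sketch, assuming the standard (population) variance and weighting each child by its relative size (names are my own):

```python
def variance(ys):
    """Population variance of a list of target values."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def weighted_variance(children):
    """Weighted average of the variances of the child sets {Y1, ..., Yj}."""
    n = sum(len(ys) for ys in children)
    return sum(len(ys) / n * variance(ys) for ys in children)

# Example: a split sending {1, 2, 3} left and {10, 11, 12, 13} right
print(weighted_variance([[1, 2, 3], [10, 11, 12, 13]]))  # 1.0
```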

24
Q

The variance of a Boolean variable with success probability p is p(1-p), which is half of the Gini index. So we could interpret the goal of tree learning as minimising the class variance in the leaves.

A

true
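
A quick check of the first claim (my own working, using the standard Bernoulli variance): for a Boolean X with success probability p, E[X] = p and E[X^2] = p, so Var(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1-p), which is exactly half of the Gini index 2p(1-p).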

25
Q

In regression trees our goal is

A

to find a split that minimises the weighted average of the variance of the child nodes.
