lecture 9: decision trees and random forests

1
Q

what can decision trees be compared to visually?

A

a flowchart

2
Q

how do we make decision tree predictions more accurate?

A

increase the depth of the tree
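
a minimal scikit-learn sketch (my own illustration, not from the lecture; the dataset and depth values are arbitrary) showing training accuracy rising as the tree is allowed to grow deeper:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    for depth in (1, 2, 3, 5):
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X, y)
        # deeper trees can carve finer regions, so training accuracy climbs
        print(f"depth={depth}: training accuracy={tree.score(X, y):.3f}")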

3
Q

what are the 3 types of nodes

A

root node, decision node and terminal node/leaf

4
Q

what does the depth number represent?

A

the number of decisions needed to go from the root node to the nodes at that depth

5
Q

what is classification tree learning

A

the construction of a classification tree given training data

6
Q

we want to obtain the least complex tree possible while keeping training error low; how do we do so?

A

use a greedy algorithm (one that picks the locally best split at each node); it is not guaranteed to find the best tree
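
a toy sketch of one greedy step (my own illustration, not the lecture's algorithm): scan candidate thresholds on a single feature and keep the split with the lowest weighted gini impurity; repeating this at every node grows the tree greedily, with no guarantee the finished tree is globally best:

    def gini(labels):
        n = len(labels)
        return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

    def best_split(xs, ys):
        # greedily pick the single threshold with the lowest weighted impurity
        best = None
        for t in sorted(set(xs)):
            left = [y for x, y in zip(xs, ys) if x <= t]
            right = [y for x, y in zip(xs, ys) if x > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
            if best is None or score < best[0]:
                best = (score, t)
        return best  # (weighted gini, threshold)

    print(best_split([1, 2, 3, 4], ["a", "a", "b", "b"]))  # (0.0, 2): a pure split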

7
Q

what can we aim for such that the classification tree has low training error

A

node purity

8
Q

what are the three node impurity measures

A

gini, entropy and misclassification rate
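
a quick sketch of all three measures for a single node (standard definitions, not spelled out on this card; base-2 entropy is an assumption, as the log base is a convention):

    from math import log2

    def gini(p):               # 1 - sum of squared class fractions
        return 1.0 - sum(q * q for q in p)

    def entropy(p):            # -sum p log2(p), taking 0 log 0 = 0
        return -sum(q * log2(q) for q in p if q > 0)

    def misclassification(p):  # 1 - fraction of the majority class
        return 1.0 - max(p)

    p = [0.5, 0.5]  # a maximally impure two-class node
    print(gini(p), entropy(p), misclassification(p))  # 0.5 1.0 0.5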

9
Q

how do we calculate the gini impurity in each node

A

let pᵢ = the fraction of class-i data samples in the node
let k = the number of classes

gini impurity of node u:
Qᵤ = 1 − ∑[i=1 to k] pᵢ²
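
a worked example (my own numbers): a node with 8 samples of class 1 and 2 of class 2 has p₁ = 0.8 and p₂ = 0.2, so Qᵤ = 1 − 0.8² − 0.2² = 1 − 0.64 − 0.04 = 0.32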

10
Q

how do we calculate the overall gini impurity at a given depth n

A

for j nodes at depth n, let fᵤ = the fraction of data samples in node u and Qᵤ = the gini impurity of node u

overall gini impurity = ∑[u=1 to j] fᵤ × Qᵤ
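
a worked example (my own numbers): depth 1 has two nodes, one holding 60% of the samples with gini impurity 0.32 and one holding 40% with gini impurity 0.5, so the overall gini impurity is 0.6 × 0.32 + 0.4 × 0.5 = 0.392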

11
Q

what is the depth at the root node

A

0

12
Q

what are the disadvantages of decision trees

A

trees can become overly complex, leading to overfitting

trees can be unstable: small changes in the training data can result in very different trees

13
Q

what are the methods to reduce overfitting

A
  • set a max depth for the tree
  • set a minimum number of samples for a leaf node
  • set a minimum decrease in impurity
  • split by looking at a subset of features instead of all the features (see the sketch after this list)
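
a hedged scikit-learn sketch mapping the four controls above onto DecisionTreeClassifier parameters (my own mapping; the values are arbitrary examples):

    from sklearn.tree import DecisionTreeClassifier

    tree = DecisionTreeClassifier(
        max_depth=4,                 # set a max depth for the tree
        min_samples_leaf=5,          # minimum number of samples in a leaf node
        min_impurity_decrease=0.01,  # minimum impurity decrease to allow a split
        max_features="sqrt",         # consider a random subset of features per split
    )
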
14
Q

how can we use decision trees for regression problems

A

instead of minimising an impurity measure, minimise a loss function such as MSE
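
a minimal sketch (my own illustration; the data is made up): scikit-learn's regression tree splits to minimise squared error, and each leaf predicts the mean target of its samples:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    X = np.array([[1.0], [2.0], [3.0], [4.0]])
    y = np.array([1.1, 1.9, 3.2, 3.8])

    reg = DecisionTreeRegressor(criterion="squared_error", max_depth=2).fit(X, y)
    print(reg.predict([[2.5]]))  # the mean target of the leaf that x = 2.5 falls into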

15
Q

how to reduce instability

A
  • average the predictions from a number of trees (this is the idea behind a random forest)

16
Q

what does random forest do?

A

bootstrap the training data (sample it with replacement) to build several datasets, train a tree on each, then average the trees' predictions
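
a toy sketch of the bootstrap-train-average idea (my own illustration; a full random forest also samples features at each split, which scikit-learn's RandomForestClassifier does for you):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    rng = np.random.default_rng(0)

    trees = []
    for _ in range(10):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap: sample rows with replacement
        trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

    # aggregate: majority vote across trees (the classification analogue of averaging)
    votes = np.stack([t.predict(X) for t in trees])
    pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
    print((pred == y).mean())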