lecture 9: decision trees and random forests Flashcards
what can decision trees be compared to visually?
a flowchart: data flows from the root through a sequence of decisions to a leaf
how do we make predictions using decision trees more accurate?
increase the depth of the tree (this improves training accuracy, though very deep trees can overfit)
what are the 3 types of nodes?
root node, decision node, and terminal node (leaf)
what does the depth number represent?
the number of decisions needed to go from the root node to the nodes at that depth
what is classification tree learning?
the construction of a classification tree given training data
we want to obtain the least complex tree possible while keeping training error low; how do we do so?
use a greedy algorithm that picks the best split at each node; it is not guaranteed to find the best tree (see the split-search sketch after the gini formulas below)
what can we aim for so that the classification tree has low training error?
node purity
what are the three node impurity measures?
gini, entropy and misclassification rate
how do we calculate the gini impurity in each node?
let pᵢ = the fraction of class i data samples in the node
let k = number of classes
Qᵤ = gini impurity in node u = 1 − ∑[i=1 to k] pᵢ²
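a minimal sketch of this formula in python (the function name gini_node and the toy labels are mine, not from the lecture):

```python
from collections import Counter

def gini_node(labels):
    """Gini impurity Q_u = 1 - sum_i p_i^2 for one node's labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# a pure node has impurity 0; a 50/50 binary node has impurity 0.5
print(gini_node(["a", "a", "a", "a"]))  # 0.0
print(gini_node(["a", "a", "b", "b"]))  # 0.5
```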
how do we calculate the overall gini impurity at a given depth n?
for the j nodes at depth n:
overall gini impurity = ∑[u=1 to j] (fraction of data samples in node u) × Qᵤ
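a sketch combining this weighted impurity with the greedy split search mentioned earlier: for a single feature, try every threshold and keep the split whose weighted gini over the two child nodes is lowest. all names (weighted_gini, best_split) and the toy data are illustrative assumptions:

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(groups):
    """Overall impurity = sum over nodes of (fraction of samples) * Q_u."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini(g) for g in groups if g)

def best_split(xs, ys):
    """Greedy search: try each threshold on one feature, keep the split
    with the lowest weighted gini of the two child nodes."""
    best_t, best_q = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        q = weighted_gini([left, right])
        if q < best_q:
            best_t, best_q = t, q
    return best_t, best_q

xs = [1.0, 2.0, 3.0, 4.0]
ys = ["a", "a", "b", "b"]
print(best_split(xs, ys))  # (2.0, 0.0): splitting at x <= 2 separates the classes
```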
what is the depth at the root node?
0
what are the disadvantages of decision trees?
trees can become overly complex, leading to overfitting
trees can be unstable: small changes in the training data can produce very different trees
what are the methods to reduce overfitting? (see the sketch after this list)
- set a maximum depth for the tree
- set a minimum number of samples per leaf node
- set a minimum decrease in impurity for each split
- split by looking at a random subset of features instead of all the features
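a sketch of these four controls as scikit-learn parameters (the lecture does not name a library, so sklearn and the specific values here are assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# each keyword maps to one of the overfitting controls above
clf = DecisionTreeClassifier(
    max_depth=3,                 # cap tree depth
    min_samples_leaf=5,          # minimum samples in a leaf node
    min_impurity_decrease=0.01,  # require each split to reduce impurity
    max_features="sqrt",         # consider a random subset of features per split
)
clf.fit(X, y)
print(clf.get_depth(), clf.score(X, y))
```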
how can we use decision trees for regression problems?
instead of minimising an impurity measure, seek to minimise a loss function such as MSE (see the regression sketch below)
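a minimal regression-tree sketch, again assuming scikit-learn; criterion="squared_error" makes each split minimise MSE (the noisy sine data is illustrative, not from the lecture):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# noisy samples of a sine curve
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

# "squared_error" is the MSE criterion name in sklearn >= 1.0
reg = DecisionTreeRegressor(criterion="squared_error", max_depth=3)
reg.fit(X, y)
print(reg.predict([[1.5], [4.5]]))  # piecewise-constant predictions
```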
how do we reduce instability? (see the sketch after this list)
- average predictions over several trees
- use a random forest
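a sketch of both remedies, assuming scikit-learn: a random forest trains each tree on a bootstrap sample, splits on random feature subsets, and aggregates the trees' predictions by majority vote:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 100 trees, each fit on a bootstrap sample and splitting on a
# random subset of features; predictions are averaged by vote
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())
```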