Trees, Forests, and Ensemble Models Flashcards
What is a decision tree?
A model built by recursive binary partitioning of the feature space, with a constant prediction (e.g. the majority class) in each terminal region
Define greedy partitioning
At each step we find the splitting variable j and split point s that minimise the impurity of the two resulting regions (optimising one split at a time, since finding the globally optimal tree is infeasible), then repeat the process recursively on each region until a stopping criterion is met.
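A minimal sketch (assuming numpy) of one greedy step: an exhaustive search over every feature j and candidate threshold s for the split minimising the weighted Gini impurity of the two children.

```python
import numpy as np

def best_split(X, y):
    """Greedy search for the split (j, s) minimising the
    weighted Gini impurity of the two child regions."""
    def gini(labels):
        if len(labels) == 0:
            return 0.0
        p = np.bincount(labels) / len(labels)
        return 1.0 - np.sum(p ** 2)

    best = (None, None, np.inf)  # (feature j, threshold s, impurity)
    n = len(y)
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):          # candidate thresholds
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, s, score)
    return best

# Toy data: a split at x <= 2.0 separates the classes perfectly.
X = np.array([[1.0], [2.0], [10.0], [11.0]])
y = np.array([0, 0, 1, 1])
j, s, score = best_split(X, y)
```

A real implementation recurses on each child region with the same search.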
Write the formula for cost complexity
C_α(T) = Σ_{m=1}^{|T|} N_m Q_m(T) + α|T|, where |T| is the number of terminal nodes, N_m the number of observations in node m, Q_m(T) the impurity of node m, and α ≥ 0 a tuning parameter trading off tree size against goodness of fit.
Write down the proportion of observations of class k
p̂_mk = (1/N_m) Σ_{x_i ∈ R_m} I(y_i = k): the fraction of training observations in node m (region R_m) that belong to class k.
Define misclassification error
Misclassification error: 1 − p̂_{mk(m)}, where k(m) = argmax_k p̂_mk is the majority class in node m, i.e. the fraction of observations in the node not belonging to the majority class.
Define Gini index
Gini index: Σ_k p̂_mk (1 − p̂_mk) = Σ_{k ≠ k'} p̂_mk p̂_mk'.
Define cross entropy
Cross entropy (deviance): −Σ_k p̂_mk log p̂_mk.
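The three node-impurity measures above can be computed side by side; a small sketch assuming numpy (the log base in the cross entropy is a convention choice, natural log here):

```python
import numpy as np

def node_impurities(labels):
    """Misclassification error, Gini index, and cross entropy
    for a node containing the given integer class labels."""
    p = np.bincount(labels) / len(labels)  # class proportions p_mk
    p = p[p > 0]                           # drop empty classes (log(0) guard)
    misclass = 1.0 - p.max()               # 1 - p_mk(m)
    gini = 1.0 - np.sum(p ** 2)            # sum_k p_mk (1 - p_mk)
    entropy = -np.sum(p * np.log(p))       # -sum_k p_mk log p_mk
    return misclass, gini, entropy

# A maximally impure two-class node: proportions (0.5, 0.5).
m, g, e = node_impurities(np.array([0, 0, 1, 1]))
```

All three peak when the classes are evenly mixed and are zero for a pure node; Gini and cross entropy are differentiable, which is why they are preferred for growing the tree.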
How is feature importance measured in decision trees?
The importance of a feature is computed as the (normalized) total reduction of the impurity criterion brought by that feature. It is also known as the Gini importance (or mean decrease in impurity).
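A quick illustration, assuming scikit-learn is installed: on data where one feature fully separates the classes, the normalised Gini importance concentrates on that feature.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data: feature 0 perfectly separates the classes, feature 1 is noise.
X = np.array([[0, 5], [1, 3], [0, 4], [1, 6], [0, 1], [1, 2]], dtype=float)
y = np.array([0, 1, 0, 1, 0, 1])

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
importances = tree.feature_importances_  # normalised: sums to 1 across features
```

Here the tree needs a single split on feature 0, so that feature receives all of the impurity reduction.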
What is the main problem with decision trees?
High variance and instability: small changes in the training data can produce very different trees.
Define bagging
Bootstrap AGGregation: generates variations of the training data by bootstrapping (sampling with replacement), trains a model on each bootstrap sample, and then averages predictions across models.
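A generic bagging sketch, assuming numpy; `fit` is a hypothetical caller-supplied base learner that returns a prediction function (here, a trivial mean predictor just to exercise the loop):

```python
import numpy as np

rng = np.random.default_rng(0)

def bagged_predict(X_train, y_train, X_test, fit, n_models=25):
    """Bagging: fit one model per bootstrap resample, average predictions."""
    n = len(y_train)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)   # sample n rows with replacement
        model = fit(X_train[idx], y_train[idx])
        preds.append(model(X_test))
    return np.mean(preds, axis=0)          # average across the ensemble

# Base learner for illustration: always predicts the training mean.
fit_mean = lambda Xb, yb: (lambda Xq: np.full(len(Xq), yb.mean()))

X = np.arange(10.0).reshape(-1, 1)
y = X.ravel() * 2                          # mean of y is 9.0
pred = bagged_predict(X, y, X, fit_mean)
```

With a high-variance base learner (like a deep tree) the averaging step is what reduces variance; the constant predictor here only demonstrates the mechanics.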
What is boosting?
A method that produces a series of weak classifiers. The predictions are then combined through a weighted majority vote to produce the final prediction.
A popular implementation, AdaBoost, reweights the data at each iteration, increasing the weights of misclassified samples so that later classifiers focus on them.
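A compact AdaBoost sketch with one-feature threshold stumps as the weak classifiers, assuming numpy and labels in {−1, +1}; the weight update and the alpha formula follow the standard exponential-loss derivation:

```python
import numpy as np

def adaboost(X, y, n_rounds=10):
    """AdaBoost with threshold stumps; y must be in {-1, +1}.
    Misclassified samples get larger weights each round."""
    n = len(y)
    w = np.full(n, 1.0 / n)                # uniform initial sample weights
    learners = []                          # (feature, threshold, polarity, alpha)
    for _ in range(n_rounds):
        best = None
        for j in range(X.shape[1]):        # pick the stump with lowest
            for s in np.unique(X[:, j]):   # weighted error
                for pol in (1, -1):
                    pred = pol * np.where(X[:, j] <= s, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, s, pol, pred)
        err, j, s, pol, pred = best
        err = max(err, 1e-10)              # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * pred)  # upweight mistakes
        w = w / w.sum()
        learners.append((j, s, pol, alpha))
    return learners

def adaboost_predict(learners, X):
    """Weighted majority vote of the weak classifiers."""
    score = np.zeros(len(X))
    for j, s, pol, alpha in learners:
        score += alpha * pol * np.where(X[:, j] <= s, 1, -1)
    return np.sign(score)

# A pattern no single stump can fit, but a boosted ensemble can.
X = np.arange(6.0).reshape(-1, 1)
y = np.array([1, 1, -1, -1, 1, 1])
learners = adaboost(X, y, n_rounds=10)
```

Each stump alone misclassifies a third of the points; the weighted vote classifies all six correctly.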
What is the idea behind random forests?
To improve the variance reduction of bagging by reducing the correlation between trees
What is “random” about random forests?
Two sources of randomness: each tree is trained on a bootstrap sample of the data, and only a random subset of the features is considered as split candidates at each node.
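Both sources of randomness appear as parameters in scikit-learn's implementation (assuming it is installed): `bootstrap` resamples the data per tree, and `max_features` limits the features tried at each split.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy two-class data: two well-separated Gaussian blobs in 4 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

# max_features="sqrt": try sqrt(n_features) random features per split;
# bootstrap=True: each tree sees its own bootstrap sample.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            bootstrap=True, random_state=0).fit(X, y)
acc = rf.score(X, y)
```

Shrinking `max_features` decorrelates the trees further (at the cost of making each tree individually weaker), which is exactly the variance-reduction idea from the previous card.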
Define feature importance in random forests
The improvement in the split criterion attributable to each feature at every split, accumulated over all trees in the forest.
What is gradient boosting?
A boosting algorithm in which each new tree is fit to the residuals (the errors of the ensemble built so far), i.e. to the negative gradient of the loss function (residual fitting).
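A minimal residual-fitting sketch for squared loss, assuming numpy, with regression stumps as the base learners and a shrinkage (learning-rate) factor:

```python
import numpy as np

def fit_stump(X, r):
    """Least-squares regression stump: best (feature, threshold,
    left mean, right mean) for the current residuals r."""
    best = None
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:  # exclude max so both sides non-empty
            left = X[:, j] <= s
            pred = np.where(left, r[left].mean(), r[~left].mean())
            sse = np.sum((r - pred) ** 2)
            if best is None or sse < best[0]:
                best = (sse, j, s, r[left].mean(), r[~left].mean())
    return best[1:]

def gradient_boost(X, y, n_trees=50, lr=0.1):
    """Each stump is fit to the residuals of the current ensemble
    (the negative gradient of squared loss), then added with shrinkage."""
    f = np.full(len(y), y.mean())          # start from the constant model
    stumps = []
    for _ in range(n_trees):
        r = y - f                          # residuals: what we got wrong so far
        j, s, lval, rval = fit_stump(X, r)
        f += lr * np.where(X[:, j] <= s, lval, rval)  # shrunken correction
        stumps.append((j, s, lval, rval))
    return f, stumps

X = np.arange(10.0).reshape(-1, 1)
y = np.sin(X.ravel())
f, stumps = gradient_boost(X, y)
mse = np.mean((y - f) ** 2)
```

Each round provably reduces the training error a little; for other losses the stump is fit to the negative gradient of that loss instead of the plain residual.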