Trees, Forests, and Ensemble Models Flashcards

1
Q

What is a decision tree?

A

A set of recursive binary partitions of the data

2
Q

Define greedy partitioning

A

At each node we find the splitting variable j and split value s that give the best immediate reduction in impurity (e.g. classification error) for the current region, without looking ahead; the process is repeated recursively on each resulting region until a stopping criterion is met.
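
A minimal pure-Python sketch of one greedy step (the helper names `misclass_error` and `best_split` are illustrative, not from the source):

```python
def misclass_error(labels):
    # Misclassification error of a node: 1 minus the majority-class proportion.
    if not labels:
        return 0.0
    majority = max(labels.count(c) for c in set(labels))
    return 1.0 - majority / len(labels)

def best_split(X, y):
    # Greedy step: try every feature j and every observed value s as a split,
    # keep the (j, s) pair with the lowest weighted error of the two regions.
    n = len(y)
    best_j, best_s, best_err = None, None, float("inf")
    for j in range(len(X[0])):
        for s in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= s]
            right = [y[i] for i in range(n) if X[i][j] > s]
            err = (len(left) * misclass_error(left)
                   + len(right) * misclass_error(right)) / n
            if err < best_err:
                best_j, best_s, best_err = j, s, err
    return best_j, best_s, best_err
```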

3
Q

Write the formula for cost complexity

A

C_α(T) = Σ_{m=1}^{|T|} N_m Q_m(T) + α|T|, where |T| is the number of leaves of tree T, N_m and Q_m(T) are the size and impurity of leaf m, and α ≥ 0 trades off fit against tree size.
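
The standard cost-complexity criterion C_α(T) = Σ_m N_m Q_m(T) + α|T| can be evaluated directly; a tiny sketch (the function name and argument layout are illustrative):

```python
def cost_complexity(leaf_sizes, leaf_impurities, alpha):
    # C_alpha(T) = sum_m N_m * Q_m(T) + alpha * |T|,
    # where the sum runs over the |T| leaves of the tree.
    return (sum(n * q for n, q in zip(leaf_sizes, leaf_impurities))
            + alpha * len(leaf_sizes))
```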

4
Q

Write down the proportion of observations of class k

A

p̂_mk = (1/N_m) Σ_{x_i ∈ R_m} I(y_i = k): the fraction of the N_m observations in region R_m that belong to class k.

5
Q

Define misclassification error

A

Misclassification error = 1 − p̂_mk(m), where k(m) = argmax_k p̂_mk is the majority class of region m; i.e. the fraction of observations not in the majority class.

6
Q

Define Gini index

A

Gini index = Σ_k p̂_mk (1 − p̂_mk), summed over the K classes.

7
Q

Define cross entropy

A

Cross entropy (deviance) = −Σ_k p̂_mk log p̂_mk.
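
The three node-impurity measures from the cards above, computed from class proportions in pure Python (helper names are illustrative):

```python
from collections import Counter
from math import log

def proportions(labels):
    # p_hat_mk: fraction of observations in the node belonging to each class k.
    n = len(labels)
    return {k: count / n for k, count in Counter(labels).items()}

def misclassification(p):
    # 1 minus the majority-class proportion.
    return 1.0 - max(p.values())

def gini(p):
    # sum_k p_k * (1 - p_k)
    return sum(pk * (1.0 - pk) for pk in p.values())

def cross_entropy(p):
    # -sum_k p_k * log(p_k)
    return -sum(pk * log(pk) for pk in p.values())
```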

8
Q

How is feature importance measured in decision trees?

A

The importance of a feature is computed as the (normalized) total reduction of the impurity criterion brought by that feature. It is also known as the Gini importance (or mean decrease in impurity).
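
A minimal sketch of the accumulation, assuming each split's statistics are already known (the `gini_importance` function and its tuple layout are illustrative):

```python
def gini_importance(splits, n_total):
    # Each split is (feature, n_node, impurity, n_left, imp_left, n_right, imp_right).
    # A feature's importance is the total impurity decrease it produces,
    # weighted by the fraction of samples reaching the node, then normalised.
    importance = {}
    for j, n, q, nl, ql, nr, qr in splits:
        decrease = (n * q - nl * ql - nr * qr) / n_total
        importance[j] = importance.get(j, 0.0) + decrease
    total = sum(importance.values())
    return {j: v / total for j, v in importance.items()}
```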

9
Q

What is the main problem with decision trees?

A

High variance, instability. Can get different trees for the same data

10
Q

Define bagging

A

Bootstrap AGGregation: generate variations of the training data by bootstrapping (sampling with replacement), train a model on each bootstrap sample, then average the predictions across models.
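
A minimal sketch of the two steps, using a deliberately trivial "model" that predicts the sample mean (all names are illustrative):

```python
import random

def train_mean_model(sample):
    # A deliberately simple "model": always predicts the sample mean.
    return sum(sample) / len(sample)

def bagging_predict(data, n_models=100, seed=0):
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        # Bootstrap: draw len(data) points with replacement.
        boot = [rng.choice(data) for _ in data]
        predictions.append(train_mean_model(boot))
    # Aggregate: average the per-model predictions.
    return sum(predictions) / len(predictions)
```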

11
Q

What is boosting?

A

A method that produces a sequence of weak classifiers, whose predictions are combined through a weighted majority vote to produce the final prediction.
A popular implementation, AdaBoost, modifies the data at each iteration by increasing the weights of misclassified samples.
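
A sketch of one AdaBoost reweighting step in a common formulation for two-class (±1) problems; the function name and argument layout are illustrative:

```python
from math import exp, log

def adaboost_round(weights, correct):
    # One reweighting step of discrete AdaBoost.
    # correct[i] is True where the current weak learner got sample i right.
    total = sum(weights)
    err = sum(w for w, ok in zip(weights, correct) if not ok) / total
    alpha = 0.5 * log((1 - err) / err)  # vote weight of this weak learner
    # Up-weight misclassified samples, down-weight correct ones, renormalise.
    new_w = [w * exp(alpha if not ok else -alpha)
             for w, ok in zip(weights, correct)]
    z = sum(new_w)
    return alpha, [w / z for w in new_w]
```

After this step the misclassified samples carry half of the total weight, so the next weak learner must focus on them.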

12
Q

What is the idea behind random forests?

A

To improve the variance reduction of bagging by reducing the correlation between trees

13
Q

What is “random” about random forests?

A

They randomise the data by bootstrapping the training samples and, at each split, consider only a random subset of the features.
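
The two sources of randomness can be sketched in a few lines (the function name is illustrative; `m_try` is the number of features considered per split):

```python
import random

def randomised_inputs(data, n_features, m_try, seed=0):
    rng = random.Random(seed)
    # 1) bootstrap the rows, as in bagging,
    boot = [rng.choice(data) for _ in data]
    # 2) at each split, consider only m_try randomly chosen features.
    features = rng.sample(range(n_features), m_try)
    return boot, features
```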

14
Q

Define feature importance in random forests

A

The improvement in the split criterion at each split, attributed to the splitting feature and accumulated over all trees in the forest.

15
Q

What is gradient boosting?

A

A boosting algorithm in which each new tree is fit to the residuals (the errors) of the current ensemble, gradually correcting its mistakes.
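
A minimal sketch of residual fitting for squared-error regression, using one-dimensional decision stumps as the weak learners (all names are illustrative):

```python
def fit_stump(xs, residuals):
    # Weak learner: a one-split regression stump on a single feature.
    best = None
    for s in sorted(set(xs))[:-1]:  # the largest value cannot split
        left = [r for x, r in zip(xs, residuals) if x <= s]
        right = [r for x, r in zip(xs, residuals) if x > s]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, s, lm, rm)
    _, s, lm, rm = best
    return lambda x: lm if x <= s else rm

def gradient_boost(xs, ys, n_rounds=50, lr=0.1):
    base = sum(ys) / len(ys)          # F_0: constant mean prediction
    pred = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: base + lr * sum(st(x) for st in stumps)
```

The learning rate `lr` shrinks each tree's contribution, so many small corrections accumulate into the final prediction.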
