Chapter 23 Ensemble Algorithms Flashcards

1
Q

What’s a bootstrap sample?

P 295

A

A bootstrap sample is a random sample of the training dataset drawn with replacement, meaning a given sample may contain zero, one, or more than one copy of any example from the training dataset.
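
As a minimal sketch (not from the book), scikit-learn's resample utility can draw a bootstrap sample; the toy data below is illustrative:

from sklearn.utils import resample

data = [10, 20, 30, 40, 50]

# Sample with replacement: some values may repeat, others may be left out.
boot = resample(data, replace=True, n_samples=len(data), random_state=1)
print(boot)  # e.g. [20, 50, 10, 10, 40] -- duplicates and omissions are expected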

2
Q

The process of creating new bootstrap samples and fitting and adding trees to the ensemble can continue until no further improvement is seen in the ensemble’s performance on a validation dataset. This simple procedure often results in better performance than a single well-configured decision tree. True/False

P 295

A

True

3
Q

An easy way to overcome the class imbalance problem in the resampling stage of bagging is: ____

P 296

A

to take the classes of the instances into account when they are randomly drawn from the original dataset.

Basically, in a cost-sensitive ensemble, we balance the training set before bootstrapping it for use in the ensemble model. It acts like a wrapper around the ensemble model, making it cost-sensitive.
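
As a sketch of this wrapper idea (the dataset and parameters are illustrative, not the book's), an imbalanced-learn Pipeline applies its sampler only during fit, so the training set is balanced before bagging bootstraps it:

from imblearn.pipeline import make_pipeline
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# Synthetic 95:5 imbalanced dataset for illustration.
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=7)

# The sampler runs before bagging draws its bootstrap samples,
# acting as a balancing wrapper around the ensemble.
model = make_pipeline(RandomUnderSampler(random_state=7),
                      BaggingClassifier(n_estimators=10, random_state=7))
model.fit(X, y)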

4
Q

Oversampling the minority class in the bootstrap is referred to as ____; likewise, undersampling the majority class in the bootstrap is referred to as ____, and combining both approaches is referred to as ____.

P 297

A

OverBagging
UnderBagging
OverUnderBagging
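
As a sketch, these variants can be realized with imbalanced-learn's BalancedBaggingClassifier, assuming a library version whose sampler parameter accepts an arbitrary resampler (an assumption about the installed version, not something stated in the book):

from imblearn.ensemble import BalancedBaggingClassifier
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler

# OverBagging: oversample the minority class within each bootstrap sample.
over_bagging = BalancedBaggingClassifier(sampler=RandomOverSampler())

# UnderBagging: undersample the majority class (the class's default behavior).
under_bagging = BalancedBaggingClassifier(sampler=RandomUnderSampler())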

5
Q

The imbalanced-learn library provides an implementation of UnderBagging. Specifically, it provides a version of bagging that uses a random undersampling strategy on the majority class within a bootstrap sample in order to balance the two classes. This is provided in the ____ class.

P 297

A

BalancedBaggingClassifier
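
A minimal usage sketch (the synthetic dataset, ROC AUC metric, and parameter values are illustrative choices, not the book's exact listing):

from imblearn.ensemble import BalancedBaggingClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic 99:1 imbalanced dataset for illustration.
X, y = make_classification(n_samples=10000, weights=[0.99], random_state=1)

model = BalancedBaggingClassifier(n_estimators=10, random_state=1)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring='roc_auc', cv=cv)
print('Mean ROC AUC: %.3f' % scores.mean())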

6
Q

What are the differences and similarities between decision trees and random forest?

P 298

A

Random forest is another ensemble of decision tree models and may be considered an improvement upon bagging. Like bagging, random forest involves selecting bootstrap samples from the training dataset and fitting a decision tree on each.
The main difference is that not all features (variables or columns) are used when building each tree; instead, a small, randomly selected subset of features is considered at each split point. This has the effect of de-correlating the decision trees (making them more independent) and, in turn, improving the ensemble prediction.
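
For illustration (not the book's listing), scikit-learn exposes this per-split feature subsetting through the max_features argument:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative dataset with 20 features.
X, y = make_classification(n_samples=1000, n_features=20, random_state=3)

# max_features='sqrt' considers about sqrt(20) ~ 4 random features at each split point.
model = RandomForestClassifier(n_estimators=100, max_features='sqrt', random_state=3)
model.fit(X, y)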

7
Q

How is the class_weight set in RandomForestClassifier, when this argument’s value is ‘balanced_subsample’?

P 300

A

Given that each decision tree is constructed from a bootstrap sample (i.e. random selection with replacement), the class distribution in the data sample will be different for each tree. As such, it can be useful to change the class weighting based on the class distribution in each bootstrap sample, instead of the entire training dataset. This can be achieved by setting the class_weight argument to the value ‘balanced_subsample’.
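
A short sketch with illustrative data (not the book's):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic 99:1 imbalanced dataset for illustration.
X, y = make_classification(n_samples=1000, weights=[0.99], random_state=1)

# 'balanced_subsample' recomputes the class weights from each tree's bootstrap sample.
model = RandomForestClassifier(n_estimators=100, class_weight='balanced_subsample')
model.fit(X, y)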

8
Q

What’s the difference between using class_weight='balanced' and class_weight='balanced_subsample' in RandomForestClassifier?

P 300

A

class_weight='balanced_subsample' computes the class weights from the class distribution in each bootstrap sample, whereas class_weight='balanced' computes them from the entire training dataset.

9
Q

What does the Balanced Random Forest model from the imblearn library do to fix the data imbalance problem?

P 301

A

Another useful modification to random forest is to perform data sampling on the bootstrap sample in order to explicitly change the class distribution. The BalancedRandomForestClassifier class from the imbalanced-learn library implements this and performs random undersampling of the majority class in each bootstrap sample. This is generally referred to as Balanced Random Forest.
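
A minimal usage sketch (illustrative data and parameters):

from imblearn.ensemble import BalancedRandomForestClassifier
from sklearn.datasets import make_classification

# Synthetic 99:1 imbalanced dataset for illustration.
X, y = make_classification(n_samples=10000, weights=[0.99], random_state=1)

# Each tree is fit on a bootstrap sample in which the majority class
# has been randomly undersampled.
model = BalancedRandomForestClassifier(n_estimators=10, random_state=1)
model.fit(X, y)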

10
Q

How does Easy Ensemble work?

P 303

A

The Easy Ensemble involves creating balanced samples of the training dataset by selecting all examples from the minority class and a subset from the majority class. Rather than using pruned decision trees, boosted decision trees are used on each subset, specifically the AdaBoost algorithm.
The process can be repeated multiple times and the average prediction across the ensemble of models can be used to make predictions.

Although an AdaBoost classifier is used on each subsample, alternate classifier models can be used by setting the base_estimator argument to the desired model.
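
A minimal usage sketch via imbalanced-learn (illustrative data and parameters):

from imblearn.ensemble import EasyEnsembleClassifier
from sklearn.datasets import make_classification

# Synthetic 99:1 imbalanced dataset for illustration.
X, y = make_classification(n_samples=10000, weights=[0.99], random_state=1)

# Each of the 10 balanced subsets (all minority examples plus a random
# subset of the majority class) trains an AdaBoost model.
model = EasyEnsembleClassifier(n_estimators=10, random_state=1)
model.fit(X, y)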

11
Q

Under-sampling is an efficient strategy for dealing with class imbalance. However, its drawback is that it throws away potentially useful data. How is this problem avoided in Easy Ensemble (or Balanced Random Forest and BalancedBaggingClassifier), where the subsamples are under-sampled?

P 302

A

The generation of multiple subsamples allows the ensemble to overcome the downside of undersampling in which valuable information is discarded from the training process.

12
Q

Easy Ensemble combines bagging and boosting for imbalanced classification. True/False

P 305

A

True

It aggregates the predictions of multiple AdaBoost base estimators to produce the final result.

13
Q

Is balancing the training data done before bootstrapping in BalancedBaggingClassifier?

External

A

No. The bootstrap sample is drawn first, and random undersampling of the majority class is then applied within that sample (see card 5); the full training set is not balanced beforehand.
