MODULE 2 S3.2.1 Flashcards

Ensemble (Module)

1
Q

These are methods that combine multiple machine learning models to create more powerful models.

A

Ensembles

2
Q

Ensemble models

A

Random Forests
Gradient Boosted Decision Trees (Gradient Boosting Machine)

3
Q

An ensemble method that avoids overfitting by combining multiple decision trees: it reduces overfitting by averaging many trees that each predict well but overfit in different ways.

A

Random Forest
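
A minimal sketch of the idea on this card, using scikit-learn's `RandomForestClassifier` (the dataset and split below are illustrative choices, not from the card):

```python
# Sketch: a random forest averages many individually-overfit trees.
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative toy dataset (two interleaving half-moons)
X, y = make_moons(n_samples=200, noise=0.25, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 100 trees, each fit on a bootstrap sample of the training data
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))
```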

4
Q

Random Forest predictions

Regression : average of predicted ____________
Classification : average of predicted ____________

A

values
probabilities
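
The averaging on this card can be checked directly: in scikit-learn, a classification forest's `predict_proba` is the mean of its individual trees' predicted probabilities (a small self-contained sketch; the toy data is an assumption):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier

X, y = make_moons(n_samples=100, noise=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Forest probability = average of the individual trees' probabilities
manual = np.mean([tree.predict_proba(X) for tree in rf.estimators_], axis=0)
print(np.allclose(manual, rf.predict_proba(X)))
```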

5
Q

Parameter for the number of trees to build

A

n_estimators

6
Q

To build a tree, we first take what is called a ___________ sample of our data.

A

bootstrap
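
A bootstrap sample draws n points with replacement from n points, so some points repeat and others are left out ("out-of-bag"). A plain NumPy sketch of the sampling step (not scikit-learn's internal code):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)

# Draw len(data) points WITH replacement: duplicates are expected
sample = rng.choice(data, size=len(data), replace=True)
print(sample)

# Points never drawn this round are the "out-of-bag" points
oob = np.setdiff1d(data, sample)
print(oob)
```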

7
Q

Advantages of Random Forest

A

One of the most widely used algorithms for regression and classification.
Excellent performance, little parameter tuning needed, no feature scaling required.
Can be applied to large datasets.

8
Q

Disadvantages of Random Forest

A

With many trees, detailed analysis is difficult, and the trees tend to get deeper.
Poor performance on high-dimensional, sparse data.
More memory usage and slower training and prediction than linear models.

9
Q

Parameters of Random Forest

A

n_estimators
max_features

10
Q

Another ensemble algorithm based on DecisionTreeRegressor; can be used for both classification and regression.

A

Gradient Boosted Regression Trees
(Gradient Boosting Machines)

11
Q

In GBM, unlike random forest, _____________ is strongly applied instead of randomness.

A

pre-pruning

12
Q

T/F GBM is more parameter-sensitive but can give slightly higher performance than random forest.

A

True

13
Q

GBM loss functions

Regression : ________________ loss function
Classification: ________________ loss function

A

least squares error
logistic

14
Q

Advantages of GBM

A

Use when you need more performance than random forests (xgboost for larger scales).
No feature scaling needed; works with both binary and continuous features.

15
Q

Disadvantages of GBM

A

Doesn’t work well for sparse high-dimensional data
Sensitive to parameters, takes longer training time

16
Q

Parameters of GBM

A

n_estimators
learning_rate
max_depth (<=5)
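
A hedged sketch of these three knobs using scikit-learn's `GradientBoostingClassifier` (the toy dataset and specific values below are illustrative assumptions):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=200, noise=0.25, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

gbm = GradientBoostingClassifier(
    n_estimators=100,   # number of boosting stages (trees)
    learning_rate=0.1,  # how strongly each tree corrects its predecessors
    max_depth=3,        # shallow trees (<= 5): strong pre-pruning
    random_state=0,
)
gbm.fit(X_train, y_train)
print(gbm.score(X_test, y_test))
```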