MODULE 2 S3.2.1 Flashcards
Ensemble (Module)
These are methods that combine multiple machine learning models to create more powerful models.
Ensembles
Ensemble models
The two ensemble models that have proven effective on a wide range of datasets
Random Forests
Gradient Boosted Decision Trees (Gradient Boosting Machines)
An ensemble method that reduces the overfitting of decision trees: it builds many trees, each of which may predict well but overfits part of the data, and averages their results.
Random Forest
Random Forest predictions
Regression: average of predicted ____________
Classification: average of predicted ____________
values
probabilities
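A minimal sketch (assuming scikit-learn and the toy make_moons data) checking the classification rule above: the forest's predict_proba is the mean of its trees' predicted probabilities.

import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier

X, y = make_moons(n_samples=100, noise=0.25, random_state=3)
forest = RandomForestClassifier(n_estimators=5, random_state=2).fit(X, y)

# Average the per-tree probabilities by hand...
manual = np.mean([tree.predict_proba(X) for tree in forest.estimators_], axis=0)
# ...and compare with the forest's own soft-voting output.
print(np.allclose(manual, forest.predict_proba(X)))  # True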
Parameter for the number of trees to build
n_estimators
To build a tree, we first take what is called a ___________ sample of our data.
bootstrap
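A minimal NumPy sketch of a bootstrap sample: n rows drawn from n rows with replacement, so some rows repeat and others are left out entirely.

import numpy as np

rng = np.random.default_rng(0)
rows = np.arange(10)                        # stand-in for 10 training rows
sample = rng.choice(rows, size=rows.size, replace=True)
print(sample)                               # some rows repeat, some are missing
print(np.unique(sample).size, "of", rows.size, "distinct rows drawn")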
Advantages of Random Forest
Among the most widely used algorithms for regression and classification.
Excellent performance, little parameter tuning needed, no feature scaling required.
Works well on large datasets.
Disadvantages of Random Forest
Because many trees are built, detailed analysis is difficult, and the individual trees tend to be deep.
Performs poorly on high-dimensional, sparse data.
Uses more memory and is slower to train and predict than linear models.
Parameters of Random Forest
n_estimators
max_features
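A minimal sketch (scikit-learn, toy make_moons data) wiring up both parameters; the values shown are illustrative, not tuned.

from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=100, noise=0.25, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees to build
    max_features="sqrt",  # features considered at each split
    random_state=0,
).fit(X_train, y_train)
print(forest.score(X_test, y_test))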
Another ensemble algorithm built from regression trees (DecisionTreeRegressor); it can be used for both classification and regression.
Gradient Boosted Regression Trees
(Gradient Boosting Machines)
In GBM, unlike random forest, strong _____________ is applied instead of randomness.
pre-pruning
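A minimal sketch (scikit-learn, toy data) of that strong pre-pruning: very shallow trees, each correcting the mistakes of the ones before it at a given learning rate. max_depth=1 and learning_rate=0.1 are illustrative values, not tuned.

from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=100, noise=0.25, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

gbrt = GradientBoostingClassifier(
    max_depth=1,        # strong pre-pruning: every tree is a stump
    learning_rate=0.1,  # how strongly each tree corrects its predecessors
    n_estimators=100,
    random_state=0,
).fit(X_train, y_train)
print(gbrt.score(X_test, y_test))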
T/F GBM is somewhat more parameter-sensitive than random forest but can give slightly higher performance.
True
GBM loss functions
Regression: ________________ loss function
Classification: ________________ loss function
least squares error
logistic
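A minimal sketch of the two loss settings named above. The string values follow recent scikit-learn releases; older versions used 'ls' and 'deviance' for the same losses.

from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

reg = GradientBoostingRegressor(loss="squared_error")  # least squares loss
clf = GradientBoostingClassifier(loss="log_loss")      # logistic loss
print(reg.loss, clf.loss)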
Advantages of GBM
Use when you need more performance than random forests give (xgboost for larger-scale problems).
No feature scaling needed; works with both binary and continuous features.
Disadvantages of GBM
Does not work well on sparse, high-dimensional data.
Sensitive to parameters and slower to train.