MODULE 2 S3.2.1 Flashcards
Ensemble (Module)
These are methods that combine multiple machine learning models to create more powerful models.
Ensembles
Ensemble models
The two ensemble models that have proven effective on a wide range of datasets
Random Forests
Gradient Boosted Decision Trees (Gradient Boosting Machines)
An ensemble method that reduces the overfitting of decision trees: it builds many trees, each of which may predict well but overfits part of the data, and averages their results.
Random Forest
Random Forest predictions
Regression: average of predicted ____________
Classification: average of predicted ____________
values
probabilities
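A minimal sketch (assuming scikit-learn and the toy make_moons data) checking the classification rule above: the forest's predict_proba is the mean of its trees' predicted probabilities.

import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier

X, y = make_moons(n_samples=100, noise=0.25, random_state=3)
forest = RandomForestClassifier(n_estimators=5, random_state=2).fit(X, y)

# Average the per-tree probabilities by hand...
manual = np.mean([tree.predict_proba(X) for tree in forest.estimators_], axis=0)
# ...and compare with the forest's own soft-voting output.
print(np.allclose(manual, forest.predict_proba(X)))  # True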
Parameter for the number of trees to build
n_estimators
To build a tree, we first take what is called a ___________ sample of our data.
bootstrap
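A minimal NumPy sketch of a bootstrap sample: n rows drawn from n rows with replacement, so some rows repeat and others are left out entirely.

import numpy as np

rng = np.random.default_rng(0)
rows = np.arange(10)                        # stand-in for 10 training rows
sample = rng.choice(rows, size=rows.size, replace=True)
print(sample)                               # some rows repeat, some are missing
print(np.unique(sample).size, "of", rows.size, "distinct rows drawn")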
Advantages of Random Forest
Among the most widely used algorithms for regression and classification.
Excellent performance, little parameter tuning needed, no feature scaling required.
Works well on large datasets.
Disadvantages of Random Forest
Because many trees are built, detailed analysis is difficult, and the individual trees tend to be deep.
Performs poorly on high-dimensional, sparse data.
Uses more memory and is slower to train and predict than linear models.
Parameters of Random Forest
n_estimators
max_features
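A minimal sketch (scikit-learn, toy make_moons data) wiring up both parameters; the values shown are illustrative, not tuned.

from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=100, noise=0.25, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees to build
    max_features="sqrt",  # features considered at each split
    random_state=0,
).fit(X_train, y_train)
print(forest.score(X_test, y_test))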
Another ensemble algorithm built from regression trees (DecisionTreeRegressor); it can be used for both classification and regression.
Gradient Boosted Regression Trees
(Gradient Boosting Machines)
In GBM, unlike random forest, strong _____________ is applied instead of randomness.
pre-pruning
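A minimal sketch (scikit-learn, toy data) of that strong pre-pruning: very shallow trees, each correcting the mistakes of the ones before it at a given learning rate. max_depth=1 and learning_rate=0.1 are illustrative values, not tuned.

from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=100, noise=0.25, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

gbrt = GradientBoostingClassifier(
    max_depth=1,        # strong pre-pruning: every tree is a stump
    learning_rate=0.1,  # how strongly each tree corrects its predecessors
    n_estimators=100,
    random_state=0,
).fit(X_train, y_train)
print(gbrt.score(X_test, y_test))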
T/F GBM is somewhat more parameter-sensitive than random forest but can give slightly higher performance.
True
GBM loss functions
Regression: ________________ loss function
Classification: ________________ loss function
least squares error
logistic
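A minimal sketch of the two loss settings named above. The string values follow recent scikit-learn releases; older versions used 'ls' and 'deviance' for the same losses.

from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

reg = GradientBoostingRegressor(loss="squared_error")  # least squares loss
clf = GradientBoostingClassifier(loss="log_loss")      # logistic loss
print(reg.loss, clf.loss)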
Advantages of GBM
Use when you need more performance than random forests give (xgboost for larger-scale problems).
No feature scaling needed; works with both binary and continuous features.
Disadvantages of GBM
Does not work well on sparse, high-dimensional data.
Sensitive to parameters and slower to train.