MODULE 2 S3.2.2 Flashcards
Random Forest (Supplementary)
Decision trees can grow many branches until each split is as __________ as possible.
pure
____________________ are popular models that mitigate this problem of overfitting in decision trees.
Random Forests
Alternatives to Using a Decision Tree
Random forest
Gradient boosting machine
Support vector machine
Neural network
It is a popular machine learning algorithm that merges the outputs of numerous decision trees to produce a single outcome.
It is also suitable for both classification and regression tasks.
Random Forest
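A minimal sketch of what this looks like in code, assuming scikit-learn and its built-in iris dataset (the dataset and parameter values are illustrative, not part of the cards):
```python
# Minimal sketch (assumes scikit-learn is installed); dataset and values are illustrative.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 trees makes a prediction; the forest reports the majority class.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```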
Random forest was first introduced by ____________ and ____________ in ________
Leo Breiman
Adele Cutler
2001
The foundational ideas of Random forest date back to ________, when ___________ and _____________ proposed a method using randomized decision trees.
1993
Salzberg
Heath
The first algorithm for random decision forests was created by ____________ in __________ using the random subspace method.
Tin Kam Ho
1995
T/F Random Forest is suitable only for classification tasks.
FALSE
It is suitable for both classification and regression tasks
Random Forest’s strength lies in its ability to handle ________________ and mitigate ______________
complex datasets
overfitting
Random forest technique
Bagging
A key concept of random forest that combines the predictions of several base estimators to improve generalizability and robustness.
Ensemble Learning
A key concept of random forest which states that, at each split in the tree, a random subset of features is considered for splitting.
Random Feature Selection
Two stages of Random Forest:
Bootstrapping Stage
Splitting Stage
Considers all features to create different data samples (RF stage)
Bootstrapping Stage
Considers a random subset of features at each split (RF stage)
Splitting Stage
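A rough sketch of the two stages using only NumPy; the array sizes and the square-root feature count are assumptions for illustration, not from the cards:
```python
# Illustrative sketch of the two Random Forest stages; sizes are assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 8))  # 150 rows, 8 features

# Bootstrapping stage: draw rows with replacement, keeping every feature (column).
boot_idx = rng.integers(0, len(X), size=len(X))
X_boot = X[boot_idx]

# Splitting stage: at each split, only a random subset of features is considered
# (here sqrt(n_features), a common default).
n_candidates = int(np.sqrt(X.shape[1]))
candidate_features = rng.choice(X.shape[1], size=n_candidates, replace=False)
print(candidate_features)
```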
It is a method where multiple machine learning models are trained to solve the same problem and then combined to improve the final output.
Ensemble learning
It is an approach in which a collection of models is used to make predictions rather than an individual model.
Ensemble learning
Two types of ensemble methods/techniques
Bagging
Boosting
Ensemble Process
Bagging : _______________
Boosting : _______________
Parallel
Sequential
Bagging is known as _____________
Bootstrap Aggregation
It serves as the ensemble technique in the Random Forest algorithm. It is a method of generating new datasets by sampling with replacement from an existing dataset.
Bagging
One of the famous techniques used in Bagging is _______________
Random Forest
How does bagging work?
- Sampling
- Independent training
- Aggregation
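A hedged sketch of these three steps using scikit-learn's BaggingClassifier; the base estimator, dataset, and values are illustrative assumptions:
```python
# Sketch of the bagging workflow; dataset and parameter values are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Sampling: each tree sees a bootstrap sample (bootstrap=True).
# Independent training: the 50 trees are fit separately (in parallel with n_jobs=-1).
# Aggregation: predict() combines the trees' votes.
bag = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    bootstrap=True,
    n_jobs=-1,
    random_state=0,
)
bag.fit(X, y)
print(bag.predict(X[:5]))
```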
Prediction (Bagging Aggregation)
Classification : ______________
Regression : _______________
majority voting
average
Strengths of Bagging
Reduces Variance
Parallelism
Another ensemble technique, but unlike Bagging, it focuses on reducing bias by sequentially training models.
Boosting
T/F In boosting, the models created are independent.
False
T/F In boosting, the models are not independent; they are built one after another, with each model trying to fix the mistakes of the previous ones.
True
It refers to a family of algorithms that convert weak learners (base learners) into strong learners.
Boosting
classifiers whose predictions are only slightly correlated with the actual classification (only slightly better than random guessing)
weak learner
Weak learner = _______________ = ______________
Base learner
Subtree
classifiers that are well correlated with the actual classification
strong learner
How does boosting work?
- Sequential training
- Weight adjustment
- Final model
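A hedged sketch of these steps using scikit-learn's AdaBoostClassifier, one common boosting implementation; the dataset and values are illustrative assumptions:
```python
# Sketch of the boosting workflow; dataset and parameter values are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Sequential training: each new weak learner is fit on a reweighted sample.
# Weight adjustment: misclassified points get more weight for the next learner.
# Final model: a weighted combination of all the weak learners.
boost = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
boost.fit(X, y)
print(boost.score(X, y))
```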
Strengths of Boosting
Reduces Bias
Higher Accuracy
Two types of hyperparameters of random forest
Increase the Predictive Power
Increase the Speed
Predictive power parameters
n_estimators
max_features
min_samples_leaf
criterion
max_leaf_nodes
Speed parameters
n_jobs
random_state
oob_score
RF hyperparameter that determines the number of trees the algorithm builds before averaging the predictions
n_estimators
RF hyperparameter that pertains to the maximum number of features random forest considers when splitting a node
max_features
RF hyperparameter that determines the minimum number of samples required to be at a leaf node
min_samples_leaf
RF hyperparameter:
How to split the node in each tree? (Entropy, Gini impurity, Log loss)
criterion
RF hyperparameter:
Maximum leaf nodes in each tree
max_leaf_nodes
RF hyperparameter:
It tells the engine how many processors it is allowed to use.
Processors:
1 : ______________
-1 : ______________
n_jobs
One processor
No limit
RF hyperparameter:
Controls randomness of the sample
random_state
RF hyperparameter:
It is a random forest cross-validation method
oob_score
(OOB: out of bag)
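A sketch tying the hyperparameter cards to scikit-learn's RandomForestClassifier; the values below are illustrative assumptions, not recommendations:
```python
# All hyperparameters from the cards in one call; values are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

rf = RandomForestClassifier(
    n_estimators=200,        # number of trees built before averaging/voting
    max_features="sqrt",     # max features considered when splitting a node
    min_samples_leaf=2,      # min samples required at a leaf node
    criterion="gini",        # split quality: "gini", "entropy", or "log_loss"
    max_leaf_nodes=None,     # cap on leaf nodes per tree (None = no cap)
    n_jobs=-1,               # -1 = use all available processors
    random_state=42,         # controls the randomness of the sampling
    oob_score=True,          # out-of-bag estimate, a built-in validation check
)
rf.fit(X, y)
print(rf.oob_score_)
```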