MODULE 2 S3.2.2 Flashcards

Random Forest (Supplementary)

1
Q

Decision trees can grow many branches until each split is as __________ as possible.

A

pure

2
Q

____________________ are a popular model that mitigates this problem of overfitting in decision trees.

A

Random Forests

3
Q

Alternatives to Using a Decision Tree

A

Random forest
Gradient boosting machine
Support vector machine
Neural network

4
Q

It is a popular machine learning algorithm that merges the outputs of numerous decision trees to produce a single outcome.

It is also suitable for both classification and regression tasks.

A

Random Forest
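
A minimal scikit-learn sketch of this idea (the iris dataset, the split, and the tree count here are illustrative assumptions, not part of the card):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Each of the 100 trees makes its own prediction; the forest merges
    # them (majority vote for classification) into a single outcome.
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))  # accuracy on held-out data

RandomForestRegressor works the same way for regression, averaging the trees' outputs instead of voting.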

5
Q

Random forest was first introduced by ____________ and ____________ in ________

A

Leo Breiman
Adele Cutler
2001

6
Q

The foundational ideas of Random forest date back to ________, when ___________ and _____________ proposed a method using randomized decision trees.

A

1993
Salzberg
Heath

7
Q

The first algorithm for random decision forests was created by ____________ in __________ using the random subspace method.

A

Tin Kam Ho
1995

8
Q

T/F Random Forest is suitable only for classification tasks.

A

FALSE
Classification and Regression tasks

9
Q

Random Forest’s strength lies in its ability to handle ________________ and mitigate ______________

A

complex datasets
overfitting

10
Q

Random forest technique

A

Bagging

11
Q

A key concept of random forest that combines the predictions of several base estimators to improve generalizability and robustness.

A

Ensemble Learning

12
Q

A key concept of random forest which states that, at each split in the tree, a random subset of features is considered for splitting.

A

Random Feature Selection

13
Q

Two stages of Random Forest:

A

Bootstrapping Stage
Splitting Stage

14
Q

Considers all features to create different data samples (RF stage)

A

Bootstrapping Stage

15
Q

Considers a random subset of features at each split (RF stage)

A

Splitting Stage
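
A rough NumPy sketch of the two stages (the array shape and the max_features value are made-up illustrations):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(150, 10))  # 150 samples, 10 features (illustrative)

    # Bootstrapping stage: draw rows with replacement; all features are kept
    boot_idx = rng.integers(0, len(X), size=len(X))
    X_boot = X[boot_idx]

    # Splitting stage: a single split considers only a random subset of features
    max_features = 3  # illustrative value
    candidate_features = rng.choice(X.shape[1], size=max_features, replace=False)
    print(candidate_features)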

16
Q

It is a method where multiple machine learning models are trained to solve the same problem and then combined to improve the final output.

A

Ensemble learning

17
Q

It is an approach in which a collection of models is used to make predictions rather than an individual model.

A

Ensemble learning

18
Q

Two types of ensemble methods/techniques

A

Bagging
Boosting

19
Q

Ensemble Process

Bagging : _______________
Boosting : _______________

A

Parallel
Sequential

20
Q

Bagging is known as _____________

A

Bootstrap Aggregation

21
Q

It serves as the ensemble technique in the Random Forest algorithm. It is a method of generating a new dataset with replacement from an existing dataset.

A

Bagging (Bootstrap Aggregation)

22
Q

One of the famous techniques used in Bagging is _______________

A

Random Forest

23
Q

How does bagging work?

A

1. Sampling
2. Independent training
3. Aggregation

24
Q

Prediction (Bagging Aggregation)

Classification : ______________
Regression : _______________

A

majority voting
average
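
A scikit-learn sketch of the sampling, independent training, and aggregation steps (the dataset and the number of estimators are illustrative assumptions):

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import BaggingClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # 50 base learners (decision trees by default), each trained
    # independently on its own bootstrap sample of the data
    bag = BaggingClassifier(n_estimators=50, random_state=0)
    bag.fit(X, y)

    # Aggregation: predict() takes a majority vote across the 50 learners;
    # BaggingRegressor would average their outputs instead
    print(bag.predict(X[:5]))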

25
Q

Strengths of Bagging

A

Reduces Variance
Parallelism

26
Q

Another ensemble technique, but unlike Bagging, it focuses on reducing bias by sequentially training models.

A

Boosting

27
Q

T/F In boosting, the models created are independent.

A

False

28
Q

T/F In boosting, the models are not independent; they are built one after another, with each model trying to fix the mistakes of the previous ones.

A

True

29
Q

It refers to a family of algorithms that converts weak learners (base learners) into strong learners.

A

Boosting

30
Q

classifiers that are only slightly correlated with the actual classification (only a little better than random guessing)

A

weak learner

31
Q

Weak learner = _______________ = ______________

A

Base learner
Subtree

32
Q

classifiers that are well correlated with the actual classification

A

strong learner

33
Q

How does boosting work?

A

1. Sequential training
2. Weight adjustment
3. Final model

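A scikit-learn sketch using AdaBoost as one concrete boosting algorithm (the dataset and settings are illustrative assumptions):

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # Sequential training: each new weak learner is fit on data reweighted
    # toward the examples earlier learners misclassified (weight adjustment);
    # the final model is a weighted vote over all of them.
    boost = AdaBoostClassifier(n_estimators=50, random_state=0)
    boost.fit(X, y)
    print(boost.score(X, y))
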
34
Q

Strengths of Boosting

A

Reduces Bias
Higher Accuracy

35
Q

Two types of hyperparameters of random forest

A

Increase the Predictive Power
Increase the Speed

36
Q

Predictive power parameters

A

n_estimators
max_features
min_samples_leaf
criterion
max_leaf_nodes

37
Q

Speed parameters

A

n_jobs
random_state
oob_score

38
Q

RF hyperparameter that determines the number of trees the algorithm builds before averaging the predictions

A

n_estimators

39
Q

RF hyperparameter that pertains to the maximum number of features random forest considers when splitting a node

A

max_features

40
Q

RF hyperparameter that determines the minimum number of samples required to be at a leaf node

A

min_samples_leaf

41
Q

RF hyperparameter: How to split the node in each tree? (Entropy, Gini impurity, Log loss)

A

criterion

42
Q

RF hyperparameter: Maximum leaf nodes in each tree

A

max_leaf_nodes

43
Q

RF hyperparameter: It tells the engine how many processors it is allowed to use.

1 : ______________
-1 : ______________

A

n_jobs
One processor
No limit

44
Q

RF hyperparameter: Controls randomness of the sample

A

random_state

45
Q

RF hyperparameter: It is a random forest cross-validation method

A

oob_score (OOB: out of bag)
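
A scikit-learn sketch that sets the hyperparameters named in the cards above (the specific values are illustrative assumptions, not recommendations):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)

    clf = RandomForestClassifier(
        n_estimators=200,      # number of trees built before averaging/voting
        max_features="sqrt",   # features considered at each split
        min_samples_leaf=2,    # minimum samples required at a leaf node
        criterion="gini",      # split quality: gini, entropy, or log_loss
        max_leaf_nodes=50,     # cap on leaf nodes per tree
        n_jobs=-1,             # -1 = use all available processors
        random_state=42,       # controls the randomness of the sampling
        oob_score=True,        # out-of-bag estimate, a built-in validation check
    )
    clf.fit(X, y)
    print(clf.oob_score_)      # OOB accuracy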