MODULE 2 S3.2.2 Flashcards

Random Forest (Supplementary)

1
Q

Decision trees can grow many branches until each split is as __________ as possible.

A

pure

2
Q

____________________ are a popular type of model that mitigates this overfitting problem in decision trees.

A

Random Forests

3
Q

Alternatives to Using a Decision Tree

A

Random forest
Gradient boosting machine
Support vector machine
Neural network

4
Q

It is a popular machine learning algorithm that merges the outputs of numerous decision trees to produce a single outcome.

It is also suitable for both classification and regression tasks.

A

Random Forest
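
A minimal sketch of this idea, assuming scikit-learn (the library the later hyperparameter cards refer to) and an arbitrary toy dataset; both the classifier and regressor variants combine many trees into one prediction:

```python
# Minimal sketch, assuming scikit-learn is installed; the toy dataset and
# settings below are illustrative, not from the cards.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)                      # many trees, combined by majority vote
print(clf.predict(X[:5]))          # single class label per sample

reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(X, y.astype(float))        # same idea for regression
print(reg.predict(X[:5]))          # predictions averaged across trees
```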

5
Q

Random forest was first introduced by ____________ and ____________ in ________

A

Leo Breiman
Adele Cutler
2001

6
Q

The foundational ideas of Random forest date back to ________, when ___________ and _____________ proposed a method using randomized decision trees.

A

1993
Salzberg
Heath

7
Q

The first algorithm for random decision forests was created by ____________ in __________ using the random subspace method.

A

Tin Kam Ho
1995

8
Q

T/F Random Forest is suitable only for classification tasks.

A

FALSE
It is suitable for both classification and regression tasks.

9
Q

Random Forest’s strength lies in its ability to handle ________________ and mitigate ______________

A

complex datasets
overfitting

10
Q

The ensemble technique used by the Random Forest algorithm

A

Bagging

11
Q

A key concept of random forest that combines the predictions of several base estimators to improve generalizability and robustness.

A

Ensemble Learning

12
Q

A key concept of random forest which states that, at each split in the tree, a random subset of features is considered for splitting.

A

Random Feature Selection

13
Q

Two stages of Random Forest:

A

Bootstrapping Stage
Splitting Stage

14
Q

Considers all features while creating different data samples with replacement (RF stage)

A

Bootstrapping Stage

15
Q

Considers a random subset of features at each split (RF stage)

A

Splitting Stage
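
A rough sketch of the two stages for a single tree, assuming NumPy; the array sizes and the square-root rule for the feature subset are illustrative:

```python
# Rough sketch of the two stages for a single tree, assuming NumPy;
# the array sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))           # 100 samples, 10 features

# Bootstrapping stage: sample rows WITH replacement; all features are kept.
boot_idx = rng.integers(0, len(X), size=len(X))
X_boot = X[boot_idx]

# Splitting stage: at each split, only a random subset of features is
# considered (commonly about sqrt(n_features) of them).
n_sub = int(np.sqrt(X.shape[1]))
feature_subset = rng.choice(X.shape[1], size=n_sub, replace=False)
print(feature_subset)                     # candidate features for this split
```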

16
Q

It is a method where multiple machine learning models are trained to solve the same problem and then combined to improve the final output.

A

Ensemble learning

17
Q

It is an approach in which a collection of models, rather than an individual model, is used to make predictions.

A

Ensemble learning

18
Q

Two types of ensemble methods/techniques

A

Bagging
Boosting

19
Q

Ensemble Process

Bagging : _______________
Boosting : _______________

A

Parallel
Sequential

20
Q

Bagging is known as _____________

A

Bootstrap Aggregation

21
Q

It serves as the ensemble technique in the Random Forest algorithm. It is a method of generating new datasets by sampling with replacement from an existing dataset.

A

Bagging

22
Q

One of the most famous algorithms that uses Bagging is _______________

A

Random Forest

23
Q

How does bagging work?

A
  1. Sampling
  2. Independent training
  3. Aggregation
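
A small sketch of these three steps, assuming scikit-learn decision trees and NumPy; the dataset and the number of trees are arbitrary:

```python
# Bagging in three steps, assuming scikit-learn and NumPy; the dataset and
# the number of trees are arbitrary.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
rng = np.random.default_rng(0)

trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))                   # 1. Sampling (with replacement)
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))   # 2. Independent training

# 3. Aggregation: majority vote for classification (average for regression).
votes = np.stack([t.predict(X[:5]) for t in trees])
print(np.round(votes.mean(axis=0)).astype(int))                  # works for 0/1 labels
```
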
24
Q

Prediction (Bagging Aggregation)

Classification : ______________
Regression : _______________

A

majority voting
average

25
Q

Strengths of Bagging

A

Reduces Variance
Parallelism

26
Q

Another ensemble technique, but unlike Bagging, it focuses on reducing bias by sequentially training models.

A

Boosting

27
Q

T/F In boosting, the models created are independent.

A

False

28
Q

T/F In boosting, the models are not independent; they are built one after another, with each model trying to fix the mistakes of the previous ones.

A

True

29
Q

It refers to a family of algorithms that converts weak learners (base learners) into strong learners.

A

Boosting

30
Q

classifiers that are only weakly correlated with the actual classification (only slightly better than random guessing)

A

weak learner

31
Q

Weak learner = _______________ = ______________

A

Base learner
Subtree

32
Q

classifiers that are well correlated with the actual classification

A

strong learner

33
Q

How does boosting work?

A
  1. Sequential training
  2. Weight adjustment
  3. Final model
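
As an illustration of this pattern, a sketch with AdaBoost (one member of the boosting family) in scikit-learn; the settings are arbitrary:

```python
# Boosting sketch using AdaBoost (one member of the boosting family),
# assuming scikit-learn; the settings are arbitrary.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=300, random_state=0)

# 1. Sequential training: 50 weak learners built one after another
#    (the default base learner is a depth-1 decision tree, a "stump").
# 2. Weight adjustment: misclassified samples get higher weight each round.
boost = AdaBoostClassifier(n_estimators=50, random_state=0)
boost.fit(X, y)

# 3. Final model: the weighted combination of all weak learners (a strong learner).
print(boost.score(X, y))
```
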
34
Q

Strengths of Boosting

A

Reduces Bias
Higher Accuracy

35
Q

Two types of hyperparameters of random forest

A

Increase the Predictive Power
Increase the Speed

36
Q

Predictive power parameters

A

n_estimators
max_features
min_samples_leaf
criterion
max_leaf_nodes
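
These are scikit-learn parameter names; a hedged example with arbitrary illustrative values:

```python
# Illustrative values only, assuming scikit-learn's RandomForestClassifier.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=200,      # number of trees built before averaging/voting
    max_features="sqrt",   # max features considered when splitting a node
    min_samples_leaf=2,    # minimum samples required at a leaf node
    criterion="gini",      # split quality: "gini", "entropy", or "log_loss"
    max_leaf_nodes=50,     # maximum leaf nodes per tree
)
```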

37
Q

Speed parameters

A

n_jobs
random_state
oob_score
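
The speed-related parameters in the same scikit-learn interface, again with arbitrary illustrative values:

```python
# Illustrative values only, assuming scikit-learn's RandomForestClassifier.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_jobs=-1,        # -1: no limit (all processors); 1: a single processor
    random_state=42,  # controls the randomness of the sampling
    oob_score=True,   # score the forest on its out-of-bag samples
)
# After fitting, rf.oob_score_ holds the out-of-bag accuracy estimate.
```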

38
Q

RF hyperparameter that determines the number of trees the algorithm builds before averaging the predictions

A

n_estimators

39
Q

RF hyperparameter that pertains to the maximum number of features random forest considers when splitting a node

A

max_features

40
Q

RF hyperparameter that determines the minimum number of samples required to be at a leaf node

A

min_samples_leaf

41
Q

RF hyperparameter:
How to split the node in each tree? (Entropy, Gini impurity, Log loss)

A

criterion

42
Q

RF hyperparameter:
Maximum leaf nodes in each tree

A

max_leaf_nodes

43
Q

RF hyperparameter:
It tells the engine how many processors it is allowed to use.

Processors:
1 : ______________
-1 : ______________

A

n_jobs

One processor
No limit

44
Q

RF hyperparameter:
Controls randomness of the sample

A

random_state

45
Q

RF hyperparameter:
It is a random forest cross-validation method

A

oob_score
(OOB: out of bag)
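
A short sketch of the OOB estimate in use, assuming scikit-learn and a toy dataset:

```python
# Out-of-bag (OOB) sketch, assuming scikit-learn; the dataset is a toy one.
# Rows left out of a tree's bootstrap sample act as a built-in validation set,
# which is why oob_score is often described as a cross-validation-like method.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print(rf.oob_score_)   # accuracy estimated on the out-of-bag samples
```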