Decision Trees Flashcards

1
Q

What does the min_samples_split parameter do in a DecisionTreeClassifier?

A

It sets the minimum number of samples a node must contain for the tree to keep splitting that branch.
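
A minimal sketch of where this parameter is set, assuming scikit-learn; the dataset and the value 20 are illustrative choices, not part of the card.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A node is only split further if it still contains at least 20 samples
# (illustrative value).
clf = DecisionTreeClassifier(min_samples_split=20)
clf.fit(X, y)
print(clf.score(X, y))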

2
Q

How is min_samples_split related to overfitting?

A

Smaller values of min_samples_split tend to overfit the training data and produce more complex decision boundaries.
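
A hedged sketch of this effect, comparing a small and a large value of min_samples_split on a toy dataset; the values 2 and 50 and the dataset are assumptions for illustration.

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for mss in (2, 50):
    clf = DecisionTreeClassifier(min_samples_split=mss, random_state=0)
    clf.fit(X_train, y_train)
    # The small value typically scores near 1.0 on the training set but lower
    # on the test set; the larger value gives a simpler, less overfit boundary.
    print(mss, clf.score(X_train, y_train), clf.score(X_test, y_test))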

3
Q

What is entropy in decision trees?

A

It is a measure of the impurity of a split; the decision tree tries to make splits that are as pure as possible (i.e. with the lowest entropy).
Entropy is what controls how a decision tree decides where to split the data.
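
A minimal sketch of the entropy measure; entropy() here is a hypothetical helper written for illustration, not a scikit-learn function.

import numpy as np

def entropy(labels):
    # Entropy = -sum(p_i * log2(p_i)) over the class proportions p_i.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return max(0.0, -np.sum(p * np.log2(p)))  # max() only tidies up -0.0 for a pure node

print(entropy([0, 0, 0, 0]))  # all samples in one class -> 0.0
print(entropy([0, 0, 1, 1]))  # 50%/50% split -> 1.0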

4
Q

Which values can entropy take?

A

For a binary classification problem, entropy takes values between 0 and 1.

0 -> low entropy, all samples belong to the same class.
1 -> a 50%/50% split between the two classes.
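
A small worked check of these two boundary values for a binary problem, assuming the usual -sum(p * log2(p)) definition of entropy.

import math

# 50%/50% split between two classes:
print(-(0.5 * math.log2(0.5) + 0.5 * math.log2(0.5)))  # 1.0
# All samples in one class: the single proportion is 1, and -1 * log2(1) = 0.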

5
Q

What is Information gain and how is it used in Decision Trees?

A

A Decision Tree uses information gain as a criterion to decide which feature to use for a split.

IG = entropy(parent) - weighted_average(entropy(children))

Its maximum value depends on the number of classes, since the maximum entropy is log2(#classes):
Two classes: max entropy is 1.
Four classes: max entropy is 2.
Eight classes: max entropy is 3.
16 classes: max entropy is 4.

The feature chosen to make the split is the one that yields the highest IG.
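
A sketch of this IG computation on a toy split; the entropy() helper and the example labels are illustrative assumptions, not from the card.

import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left = np.array([0, 0, 0, 1])   # one candidate split of the parent node
right = np.array([0, 1, 1, 1])

weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
ig = entropy(parent) - weighted
print(ig)  # ~0.19; the feature/split with the highest IG is the one chosen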

6
Q

What is the Gini Index and how is it used in Decision Trees?

A

A Decision Tree uses the Gini Index as a criterion to decide which feature to use for a split. You would choose the partition with the lowest Gini Index.

Gini Index:

Favors larger partitions.
Uses the squared proportions of the classes.
A perfectly classified partition has a Gini Index of zero.
An evenly distributed partition has a Gini Index of 1 - (1/#classes).
You want a variable split that has a low Gini Index.
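
A minimal sketch of the Gini computation described above; gini() is a hypothetical helper for illustration (scikit-learn computes this internally when criterion='gini' is selected).

import numpy as np

def gini(labels):
    # Gini = 1 - sum(p_i^2) over the class proportions p_i.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini([0, 0, 0, 0]))  # perfectly classified -> 0.0
print(gini([0, 0, 1, 1]))  # evenly distributed, 2 classes -> 1 - 1/2 = 0.5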

7
Q

Explain briefly how a Random Forest works

A

A Random Forest is an ensemble of multiple decision trees. These are the steps:

1) Make a bootstrapped dataset.
2) Build a Decision Tree, but limit the number of features considered at each split (given by a parameter) and choose them at random. For example, if there are 4 features and the parameter is 2, we would choose 2 features at random.
3) For each subsequent branch, choose another random subset of features, not reconsidering the feature used at the previous level.
4) Repeat steps 1 to 3 many times.
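
A hedged sketch of these steps using scikit-learn's RandomForestClassifier rather than building the trees by hand; the dataset and the numbers of trees and features are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)  # 4 features

# 100 trees, each grown on a bootstrapped sample, considering only 2 of the
# 4 features (chosen at random) at each split.
rf = RandomForestClassifier(n_estimators=100, max_features=2, bootstrap=True, random_state=0)
rf.fit(X, y)
print(rf.score(X, y))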

8
Q

What does Bagging mean in Random Forests (Classification)?

A

It stands for Bootstrap Aggregating: the data is bootstrapped for each decision tree, and the final prediction is made by aggregating the decisions of all the trees and choosing the class with the most votes.
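
A short sketch of bagging with majority voting, assuming scikit-learn's BaggingClassifier; the dataset and number of estimators are illustrative.

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each tree is trained on a bootstrapped sample; predict() aggregates the
# trees' votes and returns the majority class.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, bootstrap=True, random_state=0)
bag.fit(X, y)
print(bag.predict(X[:5]))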

9
Q

Describe how AdaBoost works

A

AdaBoost is similar to a Random Forest but with three key differences:

It doesn't build full trees; it limits the depth to one level (these trees are called stumps).
Each stump is influenced by the errors of the previous stumps, so the order matters.
Stumps generally have different weights in the final decision.
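
A minimal sketch of these three points, assuming scikit-learn's AdaBoostClassifier; the dataset and the number of stumps are illustrative choices.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Each base estimator is a one-level tree (a stump). Stumps are fit one after
# another, each reweighting the samples the previous stumps got wrong, and
# each stump gets its own weight in the final vote.
stump = DecisionTreeClassifier(max_depth=1)
ada = AdaBoostClassifier(stump, n_estimators=100, random_state=0)
ada.fit(X, y)
print(ada.score(X, y))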
