Ensemble learning and random forests Flashcards

1
Q

If you have trained five different models on the exact same training data, and they all achieve 95% precision, is there any chance that you can combine these models to get better results? If so, how? If not, why not?

A

You can try combining them into a voting ensemble, which will often give you even better results. It works best if the models are very different, and better still if they are trained on different training instances. Even when trained on the same data, though, the ensemble is generally effective as long as the models are very different.

2
Q

What is the difference between hard and soft voting classifiers?

A

A hard voting classifier just counts the votes of each classifier in the ensemble and picks the class that gets the most votes.

A soft voting classifier computes the average estimated class probability for each class and picks the class with the highest probability. However, it works only if every classifier is able to estimate class probabilities.
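A minimal scikit-learn sketch of both modes (the toy dataset and the choice of base classifiers are just for illustration):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

# Hard voting: each classifier casts one vote, the majority class wins
hard_clf = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("rf", RandomForestClassifier(random_state=42)),
                ("svc", SVC())],
    voting="hard")
hard_clf.fit(X, y)

# Soft voting: average the predicted class probabilities, pick the argmax
# (SVC needs probability=True so it can estimate class probabilities)
soft_clf = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("rf", RandomForestClassifier(random_state=42)),
                ("svc", SVC(probability=True))],
    voting="soft")
soft_clf.fit(X, y)
```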

3
Q

Is it possible to speed up training of a bagging ensemble by distributing it across multiple servers?

A

It is quite possible to speed up training of a bagging ensemble by distributing it across multiple servers, since each predictor in the ensemble is independent of the others.
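scikit-learn will not spread the work across servers on its own, but the same independence is what lets BaggingClassifier train predictors in parallel on local CPU cores via n_jobs; a distributed setup (for example through a joblib backend such as Dask) follows the same idea. A minimal single-machine sketch with toy data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=42)

# Because every predictor is trained independently, the work can be
# parallelized; n_jobs=-1 uses all available CPU cores on this machine.
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=100, n_jobs=-1, random_state=42)
bag_clf.fit(X, y)
```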

4
Q

What is the benefit of out-of-bag evaluation?

A

With out-of-bag evaluation, each predictor in a bagging ensemble is evaluated using instances that it was not trained on. This makes it possible to have a fairly unbiased evaluation of the ensemble without the need for an additional validation set. Thus, you have more instances available for training, and your ensemble can perform slightly better.
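A minimal sketch using BaggingClassifier with oob_score=True (toy data for illustration):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

# oob_score=True asks sklearn to evaluate each predictor on the instances
# it did not see during training, then average the results over the ensemble.
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    bootstrap=True, oob_score=True, random_state=42)
bag_clf.fit(X, y)
print(bag_clf.oob_score_)  # a rough estimate of accuracy on unseen data
```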

5
Q

What makes Extra-Trees more random than regular Random Forests? How can this extra randomness help? Are Extra-Trees slower or faster than regular Random Forests?

A

When you are growing a tree in a random forest, only a random subset of the features is considered for splitting at each node. This is true for Extra-Trees as well, but they go one step further: rather than searching for the best possible thresholds, like regular Decision Trees do, they use random thresholds for each feature. This extra randomness acts like a form of regularization: if a random forest overfits the training data, Extra-Trees might perform better. Moreover, since Extra-Trees do not search for the best possible thresholds, they are much faster to train than random forests. When making predictions, though, they are the same speed.
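A minimal sketch showing that ExtraTreesClassifier is a drop-in replacement for RandomForestClassifier in scikit-learn (toy data and hyperparameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Same API and hyperparameters; Extra-Trees just uses random split thresholds
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
et_clf = ExtraTreesClassifier(n_estimators=100, random_state=42).fit(X, y)
```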

6
Q

If your AdaBoost ensemble underfits the training data, which hyperparameters should you tweak and how?

A

You can try increasing the number of estimators or reducing the regularization hyperparameters of the base estimator. You may also try slightly increasing the learning rate.
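A sketch of those tweaks with scikit-learn's AdaBoostClassifier (the specific values are illustrative, not recommendations):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

# To fight underfitting: more estimators, a less regularized base estimator
# (e.g. slightly deeper trees), and/or a slightly higher learning rate.
ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=2),  # less constrained than a stump
    n_estimators=500,                     # more estimators
    learning_rate=1.0,                    # try nudging this up slightly
    random_state=42)
ada_clf.fit(X, y)
```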

7
Q

If your gradient boosting ensemble overfits the training set, should you increase or decrease the learning rate?

A

If your gradient boosting ensemble overfits the training set, you should try decreasing the learning rate. You could also use early stopping to find the right number of predictors (you probably have too many).
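A minimal sketch combining a lower learning rate with scikit-learn's built-in early stopping (n_iter_no_change); the data and values are illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=1000, noise=10.0, random_state=42)

# A lower learning rate plus early stopping keeps the ensemble from adding
# trees once the held-out validation score stops improving.
gbrt = GradientBoostingRegressor(
    learning_rate=0.05, n_estimators=1000,
    n_iter_no_change=10, validation_fraction=0.1, random_state=42)
gbrt.fit(X, y)
print(gbrt.n_estimators_)  # number of trees actually kept
```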

8
Q

What is the difference between a hard voting classifier and a soft voting classifier?

A

In hard voting (also known as majority voting), every individual classifier votes for a class, and the majority wins. In statistical terms, the predicted target label of the ensemble is the mode of the distribution of individually predicted labels.

In soft voting, every individual classifier provides a probability value that a specific data point belongs to a particular target class. The predictions are weighted by the classifier’s importance and summed up. Then the target label with the greatest sum of weighted probabilities wins the vote.
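A small NumPy sketch of that arithmetic; the probabilities and weights are made up for illustration:

```python
import numpy as np

# Predicted class probabilities for one data point from three classifiers
probas = np.array([[0.90, 0.10],    # classifier 1
                   [0.40, 0.60],    # classifier 2
                   [0.45, 0.55]])   # classifier 3
weights = np.array([1.0, 1.0, 1.0])

# Hard voting: each classifier votes for its most likely class, majority wins
votes = probas.argmax(axis=1)             # -> [0, 1, 1]
hard_pred = np.bincount(votes).argmax()   # class 1 wins 2-1

# Soft voting: weighted average of the probabilities, then argmax
soft_pred = np.average(probas, axis=0, weights=weights).argmax()  # class 0
```

Note that the two schemes can disagree, as they do here: soft voting takes the classifiers' confidence into account, not just their votes.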

9
Q

What is the main idea behind Bagging (or Bootstrap aggregating)? What is the difference with pasting?

A

Given a standard training set D of size n, bagging generates m new training sets D(i), each of size n′, by sampling from D uniformly and with replacement. By sampling with replacement, some observations may be repeated in each D(i). If n′=n, then for large n the set D(i) is expected to have the fraction (1 - 1/e) (≈63.2%) of the unique examples of D, the rest being duplicates. This kind of sample is known as a bootstrap sample. Then, m models are fitted using the above m bootstrap samples and combined by averaging the output (for regression) or voting (for classification).

When sampling is performed without replacement, it is called Pasting.
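In scikit-learn, the only difference between the two is the bootstrap flag of BaggingClassifier; a minimal sketch with toy data:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

# bootstrap=True  -> bagging (sampling with replacement)
# bootstrap=False -> pasting (sampling without replacement)
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=100,
    max_samples=0.8, bootstrap=True, random_state=42).fit(X, y)
paste_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=100,
    max_samples=0.8, bootstrap=False, random_state=42).fit(X, y)
```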

10
Q

What is the main idea with out-of-bag evaluation?

A

When using bagging, only about 63% of the training instances are sampled on average for each predictor. The remaining 37% of the training instances that are not sampled are called out-of-bag (oob) instances. Note that they are not the same 37% for all predictors.

Since a predictor never sees the oob instances during training, it can be evaluated on these instances without the need for a separate validation set. You can evaluate the ensemble itself by averaging out the oob evaluations of each predictor.
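A quick NumPy simulation of the ≈63% / ≈37% split (just a sanity check, not part of any library API):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Draw one bootstrap sample of size n (with replacement) and count how many
# of the original instances were never picked: those are the oob instances.
sample = rng.integers(0, n, size=n)
oob_fraction = 1 - np.unique(sample).size / n
print(oob_fraction)  # ~0.37, i.e. about 37% of instances are out-of-bag
```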

11
Q

What is the idea of ensemble learning?

A

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.

12
Q

What is a random forest?

A

A Random Forest is an ensemble of Decision Trees, generally trained via the bagging method.

In sklearn we can directly use the RandomForestClassifier class, which is more convenient than building a BaggingClassifier of Decision Trees and is optimized for them.
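A minimal sketch (toy data, illustrative hyperparameters):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

# Roughly a bagging ensemble of Decision Trees that also considers only a
# random subset of the features at each split.
rnd_clf = RandomForestClassifier(
    n_estimators=500, max_leaf_nodes=16, n_jobs=-1, random_state=42)
rnd_clf.fit(X, y)
```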

13
Q

How does sklearn measure a feature's importance in a random forest?

A

It looks at how much the tree nodes that use that feature reduce impurity on average. More precisely, it is a weighted average, where each node's weight is equal to the number of training samples that are associated with it.
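A minimal sketch reading feature_importances_ from a trained forest (the iris dataset is used only for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, random_state=42)
rnd_clf.fit(iris.data, iris.target)

# feature_importances_ holds the impurity-based importance of each feature,
# scaled so that the importances sum to 1.
for name, score in zip(iris.feature_names, rnd_clf.feature_importances_):
    print(name, round(score, 3))
```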

14
Q

What is the main idea behind boosting?

A

Boosting refers to any Ensemble method that can combine several weak learners into a strong learner. The general idea of most boosting methods is to train predictors sequentially, each trying to correct its predecessor.

There are many boosting methods available, but by far the most popular are AdaBoost and gradient boosting.

15
Q

What is the main idea of AdaBoost? What is a drawback of AdaBoost?

A

When training an AdaBoost classifier, the algorithm first trains a base classifier (such as a decision tree) and uses it to make predictions on the training set. The algorithm then increases the relative weight of misclassified training instances. Then it trains a second classifier using the updated weights, again makes predictions on the training set, updates the instance weights, and so on.

AdaBoost is sensitive to noisy data and outliers.
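A simplified sketch of that weight-update loop, assuming binary labels recoded to {-1, +1}; this is a toy illustration of the idea, not the exact SAMME algorithm scikit-learn implements:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)
y = np.where(y == 0, -1, 1)          # recode labels to {-1, +1}

n = len(X)
weights = np.ones(n) / n             # start with uniform instance weights
stumps, alphas = [], []

for _ in range(10):                  # 10 boosting rounds for illustration
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=weights)
    pred = stump.predict(X)
    err = weights[pred != y].sum()                 # weighted error rate
    alpha = 0.5 * np.log((1 - err) / err)          # this learner's "say"
    weights *= np.exp(-alpha * y * pred)           # boost misclassified weights
    weights /= weights.sum()                       # renormalize
    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the weighted sum of the stumps' predictions
final_pred = np.sign(sum(a * s.predict(X) for a, s in zip(alphas, stumps)))
```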

16
Q

What is the main idea of gradient boosting?

A

Like other boosting methods, gradient boosting combines weak "learners" into a single strong learner in an iterative fashion. The goal is to teach a model F to predict values of the form ŷ = F(x) by minimizing a cost function. At each stage of gradient boosting, it may be assumed that there is some imperfect model F_m. The gradient boosting algorithm improves on F_m by constructing a new model that adds an estimator h to provide a better model:

F_{m+1}(x) = F_m(x) + h(x) = y

Therefore, gradient boosting will fit h to the residual y − F_m(x).
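A minimal sketch of this residual-fitting idea using plain DecisionTreeRegressor stages (toy 1-D data for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression data
rng = np.random.RandomState(42)
X = rng.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * rng.randn(100)

# Stage 1: fit a first weak learner to y
tree1 = DecisionTreeRegressor(max_depth=2).fit(X, y)
# Stage 2: fit the next learner to the residuals y - F_1(x)
y2 = y - tree1.predict(X)
tree2 = DecisionTreeRegressor(max_depth=2).fit(X, y2)
# Stage 3: fit another learner to the remaining residuals
y3 = y2 - tree2.predict(X)
tree3 = DecisionTreeRegressor(max_depth=2).fit(X, y3)

# The ensemble's prediction is the sum of all the trees' predictions
X_new = np.array([[0.2]])
y_pred = sum(tree.predict(X_new) for tree in (tree1, tree2, tree3))
```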

17
Q

What is XGBoost?

A

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the gradient boosting framework and is known for its fast execution speed and strong model performance.

Note that this algorithm has won multiple Kaggle competitions.
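A minimal sketch with the xgboost package's scikit-learn-style wrapper, assuming xgboost is installed; the data and hyperparameters are illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X, y = make_regression(n_samples=1000, noise=10.0, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

# Gradient boosting via XGBoost's sklearn-compatible estimator
xgb_reg = XGBRegressor(n_estimators=500, learning_rate=0.05, random_state=42)
xgb_reg.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
y_pred = xgb_reg.predict(X_val)
```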