Ensemble Classifier Flashcards
Why combine classifiers?
If we only use one algorithm, we are stuck with the bias inherent in that algorithm.
ensemble learning
constructs a set of base classifiers from a given set of training data and aggregates their outputs into a single meta-classifier, so that:
• the combination of lots of weak classifiers can be at least as good as one strong classifier
• the combination of a selection of strong classifiers is (usually) at least as good as the best of the base classifiers
voting
• for a nominal class set, run multiple base classifiers over the test data and select the class predicted by the most base classifiers
• for a continuous class set, average over the numeric predictions of our base classifiers
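A minimal sketch of these two voting rules in Python (NumPy assumed; the function names and example inputs are illustrative, not from the notes):
```python
import numpy as np

def vote_nominal(predictions):
    """Majority vote over the class labels predicted by the base classifiers."""
    labels, counts = np.unique(predictions, return_counts=True)
    return labels[np.argmax(counts)]

def vote_continuous(predictions):
    """Average over the numeric predictions of the base classifiers."""
    return float(np.mean(predictions))

print(vote_nominal(["spam", "ham", "spam"]))  # -> "spam"
print(vote_continuous([2.0, 3.0, 4.0]))       # -> 3.0
```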
Approaches to Classifier Combination
- Instance manipulation (most common)
- Feature manipulation (most common)
- Class label manipulation
- Algorithm manipulation
Instance manipulation
generate multiple training datasets through sampling, and train a base classifier over each
Feature manipulation
generate multiple training datasets through different feature subsets, and train a base classifier over each
Class label manipulation
generate multiple training datasets by manipulating the class labels in an irreversible manner
Algorithm manipulation
semi-randomly “tweak” internal parameters within a given algorithm to generate multiple base classifiers over a given dataset
4 popular ensemble methods
- stacking
- bagging
- random forest
- Boosting
Stacking
Basic intuition: “smooth” errors over a range of algorithms with different biases
• Method 1: simple voting, which presupposes the base classifiers have equal performance
• Method 2: train a classifier over the outputs of the base classifiers (meta-classification), using nested cross-validation to reduce bias; the meta-classifier is typically Logistic Regression
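A sketch of Method 2, assuming scikit-learn; the dataset and base learners are placeholders, and plain (rather than nested) cross-validation is used for the meta-features to keep the example short:
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Toy dataset and heterogeneous base classifiers (placeholders)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
base = [GaussianNB(), DecisionTreeClassifier(max_depth=3, random_state=0)]

# Out-of-fold predictions of each base classifier become the meta-features,
# so the meta-classifier never sees predictions made on an instance's own fold
meta_train = np.column_stack(
    [cross_val_predict(clf, X_tr, y_tr, cv=5) for clf in base])
meta_clf = LogisticRegression().fit(meta_train, y_tr)

# At test time: refit the base classifiers on all training data, then stack
meta_test = np.column_stack(
    [clf.fit(X_tr, y_tr).predict(X_te) for clf in base])
print("stacked accuracy:", meta_clf.score(meta_test, y_te))
```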
Pros of stacking
• Mathematically simple but computationally expensive method
• Able to combine heterogeneous classifiers with varying performance
• Generally, stacking results in as good or better results than the best of the base classifiers
• Widely seen in applied research; less interest within theoretical circles (esp. statistical learning)
bagging/bootstrap aggregating
Basic intuition: the more data, the better the performance (the lower the variance), so how can we get ever more data out of a fixed training dataset?
Construct “novel” datasets through a combination of random sampling and replacement:
• Randomly sample the original dataset N times, with replacement (the same instance can be selected over and over again)
• Thus, we get a new dataset of the same size, where any individual instance is absent with probability (1 − 1/N)^N ≈ 0.37 for large N
• Construct k random datasets for k base classifiers, and arrive at a prediction via voting
• The same classification algorithm is used throughout
• As bagging is aimed at minimising variance through sampling, the base algorithm should be unstable (= high-variance)
• High variance: DT (if a few instances are excluded, the whole model might be different)
• Low variance: SVM (hard margin; a soft margin wouldn’t help much), LR (the overall result wouldn’t change much anyway)
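A sketch of the sampling-and-voting loop, assuming scikit-learn decision trees as the unstable base learner; the dataset and k are placeholders. It also checks the (1 − 1/N)^N absence probability empirically:
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
N, k = len(X), 25
rng = np.random.default_rng(0)

trees, absent = [], []
for _ in range(k):
    idx = rng.integers(0, N, size=N)             # sample N times, with replacement
    absent.append(1 - len(np.unique(idx)) / N)   # fraction of instances missing
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Any individual instance is absent with probability (1 - 1/N)^N ~ e^-1 ~ 0.37
print("mean absent fraction:", np.mean(absent), "vs", (1 - 1 / N) ** N)

# Arrive at the prediction via simple voting over the k trees
votes = np.stack([t.predict(X) for t in trees])              # shape (k, N)
bagged = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("training accuracy of the bagged vote:", (bagged == y).mean())
```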
Pros of Bagging
Pros:
• Simple method based on sampling and voting
• Possibility to parallelise computation of individual base classifiers
• Highly effective over noisy datasets (outliers may vanish)
• Performance is generally significantly better than the base classifiers (esp. DT) and only occasionally substantially worse
Random Tree
A “Random Tree” is a Decision Tree where:
• At each node, only a randomly selected subset of the possible attributes is considered
• Attempts to control for unhelpful attributes in the feature set (a standard DT does not do that)
• Much faster to build than a “deterministic” Decision Tree, but increases model variance (which is our goal, because high variance is good for bagging)
Random Forests
An ensemble of Random Trees (many trees = a forest)
• Each tree is built using a different bagged training dataset
• As with bagging, the combined classification is via voting
• The idea behind them is to minimise overall model variance, without introducing (combined) model bias
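A sketch using scikit-learn's RandomForestClassifier; the dataset and hyperparameter values are placeholders:
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=30, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,     # number of Random Trees in the forest
    max_features="sqrt",  # random attribute subset considered at each node
    bootstrap=True,       # each tree is built on a bagged (bootstrap) sample
    n_jobs=-1,            # trees are independent, so build them in parallel
    random_state=0,
)
print("cross-validated accuracy:", cross_val_score(rf, X, y, cv=5).mean())
```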
Pros & Cons of RF
Pros:
• Generally a very strong performer
• parallelisable & efficient
• Robust to overfitting
Cons:
• Interpretability sacrificed
Boosting
Basic intuition: tune base classifiers to focus on the “hard to classify” instances
Iteratively change the distribution and weights of training instances to reflect the performance of the classifier on the previous iteration:
• Start with each training instance having a 1/N probability of being included in the sample
• Over T iterations, train a classifier and update the weight of each instance according to whether it is correctly classified
• Combine the base classifiers via weighted voting
AdaBoost
- α_i = importance of base classifier C_i = the weight associated with its vote, computed from its weighted error rate ε_i as α_i = ½ ln((1 − ε_i)/ε_i)
- If the error rate is low, α_i is large and positive; if the error rate is high (above 0.5), α_i becomes negative
- Base classification algorithm: decision stumps (1-R) or decision trees
- Reinitialise the instance weights whenever ε_i > 0.5
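A minimal from-scratch sketch of the boosting loop (AdaBoost-style, binary labels in {−1, +1}), assuming scikit-learn decision stumps as the base learner; T and the dataset are placeholders, and library implementations differ in the details:
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
y = np.where(y == 1, 1, -1)                  # binary labels in {-1, +1}
N, T = len(X), 20
w = np.full(N, 1 / N)                        # start with uniform instance weights
stumps, alphas = [], []

for _ in range(T):
    stump = DecisionTreeClassifier(max_depth=1)          # decision stump
    stump.fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y].sum()                 # weighted error rate epsilon_i
    if err >= 0.5:                           # worse than chance: reset the weights
        w = np.full(N, 1 / N)
        continue
    err = max(err, 1e-10)                    # guard against a perfect stump
    alpha = 0.5 * np.log((1 - err) / err)    # classifier importance (vote weight)
    w *= np.exp(-alpha * y * pred)           # up-weight misclassified instances
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Combine the base classifiers via weighted voting
score = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("training accuracy:", (np.sign(score) == y).mean())
```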
Pros & Cons of Boosting
• Mathematically complicated but computationally cheap method based on iterative sampling and weighted voting
• More computationally expensive than bagging, since the iterations are sequential and cannot be parallelised
• The method has guaranteed performance in the form of error bounds over the training data
• Interesting effect with the convergence of the error rate over the training vs. test data
• In practical applications, boosting has a tendency to overfit
Bagging/RF vs. Boosting
Bagging/RF:
• Parallel sampling
• Simple voting
• Single classification algorithm
• Minimise variance
• Not prone to overfitting
Boosting:
• Iterative sampling
• Weighted voting
• Single classification algorithm
• Minimise (instance) bias
• Prone to overfitting