Week 9 - Model ensembles Flashcards

1
Q

What are the three types of tree-based models

A
  1. Decision trees
  2. Random forest
  3. Gradient boosting
2
Q

What are model ensembles

A

The wisdom of crowds for machines. Combinations of models are known as model ensembles

3
Q

if a learner's success probability p > 1/2, this implies

A

Weak learnability (the learner performs better than random guessing)

4
Q

What 2 things do ensemble methods have in common

A
  • They construct multiple, diverse predictive models from adapted versions of the training data.
  • They combine the predictions of these models in some way, often by simple averaging or voting
5
Q

What is bootstrapping

A

Bootstrapping is any test or metric that uses random sampling with replacement, and it falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy to sample estimates.

6
Q

What is the 3-step process for creating a bootstrap sample

A
  1. Randomly sample one observation from the set
  2. Write it down
  3. Put it back in the set
Repeat until the bootstrap sample is as large as the original set (sketched in code below).
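
A minimal sketch of these steps in Python with NumPy (the toy data array and seed are illustrative assumptions, not from the source):

    import numpy as np

    rng = np.random.default_rng(0)
    data = np.arange(10)  # original set of 10 observations

    # sample with replacement until the bootstrap sample is as large as the original set
    bootstrap_sample = rng.choice(data, size=len(data), replace=True)
    print(bootstrap_sample)  # typically contains duplicates of some observations
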
7
Q

Bootstrap samples contain duplicates in order to provide

A

Diversity

8
Q

What is the bootstrap aggregating (bagging) ensemble method

A
  • Create multiple random samples from the original data using bootstrapping
  • Train different models (learners) on these different random samples (a sketch follows below)
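
A hedged sketch of bagging with scikit-learn's BaggingClassifier, which by default bootstraps the data and trains one decision tree per sample (the synthetic dataset is an assumption for illustration):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier

    X, y = make_classification(n_samples=200, random_state=0)  # toy data

    # 10 base learners, each trained on its own bootstrap sample
    bagging = BaggingClassifier(n_estimators=10, bootstrap=True, random_state=0)
    bagging.fit(X, y)
    print(bagging.predict(X[:5]))  # predictions are combined by voting
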
9
Q

What is subspace sampling?

A

Encouraging diversity in ensembles by building each model from a different random subset of the features instead of all features

10
Q

What is a random forest

A

A random forest is an ensemble learning approach for various tasks (regression, classification) that builds multiple decision trees at training time and outputs the class that is the mode of the classes (classification) or the mean/average prediction (regression) of the individual trees.

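A hedged sketch with scikit-learn's RandomForestClassifier; max_features="sqrt" corresponds to the subspace sampling idea from the previous card (the synthetic dataset is an assumption):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=200, random_state=0)  # toy data

    # 100 trees, each built on a bootstrap sample and a random subset of features
    forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
    forest.fit(X, y)
    print(forest.predict(X[:5]))  # classification output: the mode of the trees' votes
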
11
Q

What is boosting

A

Boosting is an ensemble method for reducing the error in supervised learning by converting weak learners into strong ones. The idea is that each learner does a bit better than the previous one: at each iteration the error rate is examined and the next learner focuses on the points that were not handled well so far.

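A hedged sketch of boosting with scikit-learn's AdaBoostClassifier, which fits weak learners (decision stumps by default) sequentially and reweights the training points after each round (the synthetic dataset is an assumption):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier

    X, y = make_classification(n_samples=200, random_state=0)  # toy data

    # 50 boosting rounds; each new weak learner focuses on previously misclassified points
    booster = AdaBoostClassifier(n_estimators=50, random_state=0)
    booster.fit(X, y)
    print(booster.score(X, y))
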
12
Q

How can we improve learning with boosting?

A

By giving the misclassified instances a higher weight, and modifying the classifier to take these weights into account

13
Q

How can we assign weights (boosting)

A

We want to assign half of the total weight to the misclassified items and the other half to the correctly classified items; one standard reweighting scheme that achieves this is written out below.
* Initially, every item has weight 1/|D|
* Total weight of all misclassified items: ε (the error rate)
* Total weight of all correctly classified items: 1 − ε

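In symbols, one standard scheme that achieves this half-and-half split (a sketch, not spelled out in the source; ε is the error rate of the current learner):

    w_i \leftarrow \frac{w_i}{2\epsilon} \ \text{(misclassified)}, \qquad
    w_i \leftarrow \frac{w_i}{2(1-\epsilon)} \ \text{(correctly classified)}

so the misclassified items end up with total weight \epsilon \cdot \frac{1}{2\epsilon} = \frac{1}{2} and the correctly classified items with (1-\epsilon) \cdot \frac{1}{2(1-\epsilon)} = \frac{1}{2}.
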
13
Q

what is epsilon in boosting

A

The error rate: ε = (FP + FN) / Total. For example, 5 false positives and 10 false negatives out of 100 instances give ε = 0.15.

14
Q

What are 3 sources of misclassification errors

A
  1. Unavoidable bias
  2. Bias due to low expressiveness of models
  3. High variance
15
Q

What is unavoidable bias

A

If instances from different classes are described by the same feature vectors, no classifier can tell them apart, so some error is unavoidable.

16
Q

What is bias due to low expressiveness of models

A

If the data is not linearly separable, then even the best linear classifier will make mistakes.

17
Q

What is high variance

A

A model has high variance if its decision boundary is highly dependent on the training data.

18
Q

What is bagging predominantly for?

A

Bagging is predominantly a variance-reduction technique. It is often used in combination with high-variance models such as tree models.

19
Q

What is boosting predominantly for?

A

Boosting is primarily a bias-reduction technique. It is typically used with high-bias models such as linear classifiers or univariate decision trees.

20
Q

What is a meta-model

A

A model that best combines the predictions of the base models

21
Q

What is stacking

A

Stacking involves training a learning algorithm to combine the predictions of several other learning algorithms.
* several models are trained using the available data
* a learning algorithm is trained to make a final prediction using the predictions of the other algorithms.

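A hedged sketch of stacking with scikit-learn's StackingClassifier; a logistic regression meta-model combines the base models' predictions (the choice of base models and the synthetic dataset are assumptions):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, random_state=0)  # toy data

    stack = StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier()), ("knn", KNeighborsClassifier())],
        final_estimator=LogisticRegression(),  # the meta-model
    )
    stack.fit(X, y)
    print(stack.predict(X[:5]))
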
22
Q

Name 2 types of experiments we would conduct on ML models

A
  1. on one specific dataset
  2. on a varied set of datasets
23
Q

what is cross-validation

A

Randomly partition the data into k folds, set one fold aside for testing, train a model on the remaining k-1 folds and evaluate it on the test fold. This process is repeated k times until each fold has been used for testing once.

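A hedged sketch of k-fold cross-validation with scikit-learn's cross_val_score (k = 10; the model and synthetic dataset are assumptions):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, random_state=0)  # toy data

    # each of the 10 folds is used once for testing; the model is trained on the other 9
    scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
    print(scores.mean(), scores.std())  # average performance and its spread across folds
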
24
Q

What does cross-validation accomplish

A

By averaging over training sets we get a sense of the variance of the learning algorithm. Once we are satisfied with the performance of our learning algorithm, we can run it over the entire data set to obtain a single model.

25
Q

what cross validation do we use if we have very few training instances?

A

Leave-one-out cross-validation

26
Q

what is leave-one-out cross-validation

A

Alternatively, we can set k = n and train on all instances except one test instance, repeated n times. This means that in each single-instance ‘fold’ our accuracy estimate is 0 or 1, but by averaging n of those we get an approximately normal distribution by the central limit theorem.

27
Q

What is the null hypothesis

A

is the hypothesis that there is no significant difference between specified distributions

28
Q

what is a significance test

A

A test of significance is a formal procedure for comparing observed distributions of data with a hypothesis

29
Q

what is a p-value

A

The probability of obtaining a measurement of a certain value or a more extreme one, given that the null hypothesis is true

30
Q

what is a t-test

A

A t-test is a type of inferential statistic used to determine whether there is a significant difference between the means of two groups, which may be related in certain features.

31
Q

when conducting an experiment with 1 dataset

A

Use a paired t-test

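A hedged sketch with SciPy's ttest_rel, applied to the per-fold scores of two algorithms on the same dataset (the score values are made-up placeholders):

    from scipy import stats

    scores_a = [0.81, 0.79, 0.84, 0.80, 0.82, 0.78, 0.83, 0.80, 0.81, 0.79]
    scores_b = [0.77, 0.78, 0.80, 0.76, 0.79, 0.75, 0.78, 0.77, 0.76, 0.78]

    t_stat, p_value = stats.ttest_rel(scores_a, scores_b)  # paired t-test
    print(t_stat, p_value)  # reject the null hypothesis if p_value < alpha
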
32
Q

When comparing the performance of a pair of algorithms over multiple data sets, use

A

Wilcoxon’s signed-rank test

33
Q

What is Wilcoxon’s signed-rank test

A
  • The idea is to rank the performance differences in absolute value, from smallest to largest
  • We then calculate the sum of ranks for positive and negative differences separately and take the smaller of these sums as our test statistic (see the sketch after this list)
  • Null hypothesis: the two algorithms perform equally on multiple data sets
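
A hedged sketch with SciPy's wilcoxon, applied to the per-dataset performance differences of two algorithms (the difference values are made-up placeholders):

    from scipy import stats

    # performance differences (algorithm A minus algorithm B) on 10 data sets
    differences = [0.02, -0.01, 0.05, 0.03, -0.02, 0.04, 0.01, 0.06, -0.03, 0.02]

    statistic, p_value = stats.wilcoxon(differences)
    print(statistic, p_value)  # a small p-value suggests the algorithms differ
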
34
Q

what is a critical value in Wilcoxon’s signed-rank test

A

The critical value (the value of the test statistic at which the p-value equals alpha) can be found in a statistical table and used for rejecting the null hypothesis.

35
Q

How to compare multiple algorithms over multiple data sets

A

Friedman test

36
Q

What is the Friedman test

A
  • The idea is to rank the performance of all k algorithms per data set, from best performance to worst performance
  • R_ij denotes the rank of the j-th algorithm on the i-th data set
37
Q

What 3 quantities do we need to calculate in the Friedman test

A
  1. The average rank of each algorithm
  2. The sum of squared differences between the average ranks and the overall average rank (the spread between the rank centroids)
  3. The sum of squared differences of all ranks from the overall average rank (the spread over all ranks); both sums are written in symbols below
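
In symbols, one common formulation (with n data sets, k algorithms, R_{ij} the rank from the previous card, and \bar{R} = \frac{k+1}{2} the overall average rank):

    \bar{R}_j = \frac{1}{n}\sum_i R_{ij}, \qquad
    n \sum_j \left(\bar{R}_j - \bar{R}\right)^2, \qquad
    \frac{1}{n(k-1)} \sum_{i,j} \left(R_{ij} - \bar{R}\right)^2
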
38
Q

What is the Friedman statistic?

A

The ratio of the second quantity to the third.

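A hedged sketch with SciPy's friedmanchisquare, which computes the closely related classical Friedman chi-square statistic from each algorithm's per-dataset scores (the score lists are made-up placeholders):

    from scipy import stats

    # scores of three algorithms on the same seven data sets
    algo_1 = [0.85, 0.80, 0.78, 0.90, 0.83, 0.76, 0.88]
    algo_2 = [0.82, 0.79, 0.75, 0.87, 0.80, 0.74, 0.85]
    algo_3 = [0.79, 0.77, 0.73, 0.84, 0.78, 0.71, 0.82]

    statistic, p_value = stats.friedmanchisquare(algo_1, algo_2, algo_3)
    print(statistic, p_value)
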
39
Q

What is a post-hoc test

A
  • The idea is to calculate the critical difference
  • The Nemenyi test calculates the critical difference (one commonly quoted form is given below)
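
A commonly quoted form of the Nemenyi critical difference (a sketch; k algorithms, n data sets, and q_\alpha a tabulated critical value):

    CD = q_\alpha \sqrt{\frac{k(k+1)}{6n}}

Two algorithms are considered significantly different if their average ranks differ by more than CD.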