Session 4.2 Flashcards

1
Q

Higher number of folds means…

A

training more models, each with a larger training set and a smaller test set

10 folds are the most common option. 5 folds are also frequently used.
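To make the trade-off concrete, the sketch below counts the models and the train/test sizes for a few choices of k. The helper name `fold_sizes` and the dataset size of 100 instances are illustrative assumptions, not part of the session material.

```python
def fold_sizes(n_instances, k):
    """Return a list of (train_size, test_size) pairs, one per fold.

    Each fold serves as the test set exactly once; the other k - 1 folds
    form the training set.
    """
    base, extra = divmod(n_instances, k)
    # The first `extra` folds get one additional instance.
    test_sizes = [base + 1 if i < extra else base for i in range(k)]
    return [(n_instances - t, t) for t in test_sizes]


for k in (2, 5, 10):
    sizes = fold_sizes(100, k)
    train, test = sizes[0]
    print(f"k={k}: {len(sizes)} models, train={train}, test={test}")
```

With 100 instances, 5 folds give 5 models trained on 80 and tested on 20; 10 folds give 10 models trained on 90 and tested on 10.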

2
Q

Having larger training sets leads to…

A

better performance in each model

3
Q

Having smaller test sets leads to…

A

higher variance in the performance estimates across models (each estimate rests on fewer test instances)
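A small simulation can illustrate this. It is only a sketch under a simplifying assumption: every test instance is classified correctly with the same probability (0.8 here, a made-up value), independently of the others. The function name `accuracy_estimates` is hypothetical.

```python
import random
import statistics


def accuracy_estimates(true_accuracy, test_size, n_repeats, rng):
    """Simulate accuracy measured on `n_repeats` independent test sets.

    Assumption: each test instance is classified correctly with
    probability `true_accuracy`, independently of the others.
    """
    estimates = []
    for _ in range(n_repeats):
        correct = sum(rng.random() < true_accuracy for _ in range(test_size))
        estimates.append(correct / test_size)
    return estimates


rng = random.Random(0)
small = accuracy_estimates(0.8, test_size=10, n_repeats=500, rng=rng)
large = accuracy_estimates(0.8, test_size=1000, n_repeats=500, rng=rng)

print("std with 10 test instances:  ", round(statistics.stdev(small), 3))
print("std with 1000 test instances:", round(statistics.stdev(large), 3))
```

The spread of the estimates shrinks roughly with the square root of the test set size, so the 10-instance estimates scatter far more widely around 0.8 than the 1000-instance ones.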

4
Q

The Cumulative Response curve

A

plots the true positive rate as a function of the percentage of test instances targeted
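As a rough sketch of how the points of such a curve could be computed: rank the test instances by model score, then track what share of all positives has been captured as more of the ranking is targeted. The function name `cumulative_response` and the five example scores and labels are invented for illustration.

```python
def cumulative_response(scores, labels):
    """Return (fraction_targeted, true_positive_rate) points.

    Instances are ranked by descending score; the i-th point describes
    targeting the top i instances.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive instance is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    found = 0
    for i, (_, label) in enumerate(ranked, start=1):
        found += label
        points.append((i / len(ranked), found / total_positives))
    return points


# Hypothetical model scores and true labels (1 = positive) for 5 test instances.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
for targeted, tpr in cumulative_response(scores, labels):
    print(f"targeted {targeted:.0%} -> captured {tpr:.0%} of positives")
```

A model that ranks well climbs quickly: here, targeting the top 40% of instances already captures two of the three positives.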

5
Q

Underfitting

A

A model that is too simple does not fit the data well (high bias)

e.g., fitting a quadratic function with a linear model

6
Q

Overfitting

A

A model that is too complex fits the training data too closely, noise included (high variance)

e.g., fitting a quadratic function with a 3rd-degree polynomial

7
Q

Bias

A

a model that underfits is wrong on average (high bias) but is not highly affected by slightly different training data

8
Q

Variance

A

a model that overfits is right on average, but is highly sensitive to specific training data
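The contrast between bias and variance can be estimated by refitting a model on many fresh training sets and looking at its predictions at one point. The sketch below is an illustration under assumptions: the data come from y = x² plus Gaussian noise, a straight line stands in for the underfitting model, and a 1-nearest-neighbour predictor stands in for the overfitting model (a swapped-in example, not one named in the cards). All function names are hypothetical.

```python
import random
import statistics


def true_function(x):
    return x ** 2


def sample_training_set(rng, n=30, noise_sd=0.3):
    """Draw n noisy observations of the true function on [-1, 1]."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_function(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns a predictor."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_nearest_neighbour(xs, ys):
    """1-nearest-neighbour: predict the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def bias_and_variance(fit, x0, rng, n_training_sets=300):
    """Refit on many fresh training sets and summarise predictions at x0."""
    preds = []
    for _ in range(n_training_sets):
        xs, ys = sample_training_set(rng)
        preds.append(fit(xs, ys)(x0))
    bias = statistics.fmean(preds) - true_function(x0)
    return bias, statistics.pvariance(preds)


rng = random.Random(0)
line_bias, line_var = bias_and_variance(fit_line, 0.0, rng)
nn_bias, nn_var = bias_and_variance(fit_nearest_neighbour, 0.0, rng)
print(f"line: bias={line_bias:+.3f}, variance={line_var:.4f}")
print(f"1-NN: bias={nn_bias:+.3f}, variance={nn_var:.4f}")
```

The line is consistently off at x = 0 (large bias) but barely changes between training sets, while the nearest-neighbour predictor is close on average but swings with the noise of whichever point happens to be nearest.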

9
Q

Ensemble methods

A

use multiple algorithms to obtain better predictive performance than could be obtained from any of the algorithms by itself

10
Q

Using multiple algorithms usually increases model performance by:

A

reducing variance: models are less dependent on the specific training data

Examples:

  • Bagging (or bootstrap aggregation)
  • Random Forest
  • Boosting (mainly reduces bias, though it can also reduce variance)
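The variance-reduction effect of bagging can be sketched in a few lines. This is a minimal illustration under assumptions, not a reference implementation: the data are noisy samples of y = x², the base model is a 1-nearest-neighbour predictor chosen because it is sensitive to the training data, and 25 bootstrap resamples are averaged. All names are hypothetical.

```python
import random
import statistics


def sample_training_set(rng, n=30, noise_sd=0.3):
    """Noisy samples of y = x^2 on [-1, 1] (an illustrative assumption)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, x ** 2 + rng.gauss(0.0, noise_sd)) for x in xs]


def nearest_neighbour_predict(train, x0):
    """High-variance base model: return the y of the closest training x."""
    return min(train, key=lambda p: abs(p[0] - x0))[1]


def bagged_predict(train, x0, rng, n_models=25):
    """Bagging: average base-model predictions over bootstrap resamples."""
    preds = []
    for _ in range(n_models):
        # A bootstrap resample draws len(train) points with replacement.
        resample = rng.choices(train, k=len(train))
        preds.append(nearest_neighbour_predict(resample, x0))
    return statistics.fmean(preds)


rng = random.Random(0)
single_preds, bagged_preds = [], []
for _ in range(300):
    # Both predictors see the same training set, so the comparison is paired.
    train = sample_training_set(rng)
    single_preds.append(nearest_neighbour_predict(train, 0.0))
    bagged_preds.append(bagged_predict(train, 0.0, rng))

single_var = statistics.pvariance(single_preds)
bagged_var = statistics.pvariance(bagged_preds)
print(f"single model variance: {single_var:.4f}")
print(f"bagged model variance: {bagged_var:.4f}")
```

Because each bootstrap resample picks a slightly different nearest point, averaging over resamples smooths out the noise of any single training instance, and the bagged prediction varies less from one training set to the next.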