Session 4.2 Flashcards

Question 1

Q

Higher number of folds means…

Answer

A

having to run more models, each with a larger training set and a smaller test set

10 folds are the most common option. 5 folds are also frequently used.
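
A minimal sketch of the trade-off, assuming a hypothetical helper `fold_sizes` and a dataset of 100 instances (neither comes from the session material). It only counts instances per fold; it does not train anything.

```python
def fold_sizes(n_samples, k):
    """Return a (train_size, test_size) pair for each of the k folds."""
    base, extra = divmod(n_samples, k)
    sizes = []
    for i in range(k):
        # The first `extra` folds take one additional test instance.
        test = base + (1 if i < extra else 0)
        sizes.append((n_samples - test, test))
    return sizes

for k in (5, 10):
    train, test = fold_sizes(100, k)[0]
    print(f"{k} folds -> {k} models, train={train}, test={test}")
# 5 folds -> 5 models, train=80, test=20
# 10 folds -> 10 models, train=90, test=10
```

Going from 5 to 10 folds doubles the number of models to fit, grows each training set from 80 to 90 instances, and halves each test set from 20 to 10.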

Question 2

Q

Having larger training sets leads to…

Answer

A

better performance in each model

Question 3

Q

Having smaller test sets leads to…

Answer

A

higher variance in the performance estimates across models (each estimate rests on fewer test instances)

Question 4

Q

The Cumulative Response curve

Answer

A

plots the true positive rate as a function of the percentage of test instances targeted
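
A rough sketch of how the points of such a curve could be computed, assuming the model outputs one score per test instance and that instances are targeted from the highest score down. The function name `cumulative_response` and the toy scores and labels are made up for illustration.

```python
def cumulative_response(scores, labels):
    """Return (fraction targeted, true positive rate) points.

    Instances are ranked by descending score; labels are 1 (positive) or 0.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive instance")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n = len(ranked)
    points = [(0.0, 0.0)]
    hits = 0
    for i, (_, label) in enumerate(ranked, start=1):
        hits += label  # positives captured among the top i instances
        points.append((i / n, hits / total_pos))
    return points

scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
for pct, tpr in cumulative_response(scores, labels):
    print(f"{pct:.0%} targeted -> {tpr:.2f} of positives found")
# 0% targeted -> 0.00 of positives found
# 20% targeted -> 0.33 of positives found
# 40% targeted -> 0.67 of positives found
# 60% targeted -> 0.67 of positives found
# 80% targeted -> 1.00 of positives found
# 100% targeted -> 1.00 of positives found
```

A curve that rises quickly means the top-ranked instances contain most of the positives; random targeting would follow the diagonal.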

Question 5

Q

Underfitting

Answer

A

A model that is too simple does not fit the data well (high bias)

e.g., fitting a quadratic function with a linear model
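
The example above can be sketched with a closed-form least-squares line, assuming five noise-free points from y = x^2. The helper `fit_line` and the data are illustrative, not part of the session material.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # noise-free quadratic data
slope, intercept = fit_line(xs, ys)
mse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, training MSE={mse:.2f}")
# slope=0.00, intercept=2.00, training MSE=2.80
```

The best possible line is flat and still misses every curved point, even on the data it was trained on. More training data would not help; the model family is too simple.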

Question 6

Q

Overfitting

Answer

A

A model that is too complex fits the training data too well, noise included (high variance)

e.g., fitting a quadratic function with a 3rd-degree polynomial

Question 7

Q

Bias

Answer

A

a model that underfits is wrong on average (high bias) but is not highly affected by slightly different training data

Question 8

Q

Variance

Answer

A

a model that overfits is right on average (low bias) but is highly sensitive to the specific training data (high variance)
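
A small simulation can make the bias and variance cards concrete. This is a sketch under stated assumptions: the true function is y = x^2 with Gaussian noise, the "too simple" model predicts the mean of the training targets, and the "too flexible" model is a 1-nearest-neighbour lookup. These two stand-ins replace the polynomial examples on the cards because they need no fitting library; all names are hypothetical.

```python
import random
from statistics import mean, pvariance

def true_f(x):
    return x ** 2

def sample_training_set(rng, noise_sd=1.0):
    xs = [-2 + 0.5 * i for i in range(9)]  # fixed grid from -2 to 2
    return [(x, true_f(x) + rng.gauss(0, noise_sd)) for x in xs]

def predict_mean(train, x0):
    # Too simple: ignores x0 and returns the average training target.
    return mean(y for _, y in train)

def predict_1nn(train, x0):
    # Too flexible: copies the (noisy) target of the closest training point.
    return min(train, key=lambda p: abs(p[0] - x0))[1]

def bias_and_variance(predict, x0, trials=2000, seed=0):
    """Estimate bias and variance of predict(train, x0) over many training sets."""
    rng = random.Random(seed)
    preds = [predict(sample_training_set(rng), x0) for _ in range(trials)]
    return mean(preds) - true_f(x0), pvariance(preds)

x0 = 2.0
simple_bias, simple_var = bias_and_variance(predict_mean, x0)
flexible_bias, flexible_var = bias_and_variance(predict_1nn, x0)
print(f"simple:   bias={simple_bias:+.2f}, variance={simple_var:.2f}")
print(f"flexible: bias={flexible_bias:+.2f}, variance={flexible_var:.2f}")
```

The simple model should show a large bias and a small variance, and the flexible model the reverse. Exact figures depend on the seed and the noise level.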

Question 9

Q

Ensemble methods

Answer

A

combine multiple learning algorithms to obtain better predictive performance than any one of them could achieve alone

Question 10

Q

Using multiple algorithms usually increases model performance by:

Answer

A

reducing variance: models are less dependent on the specific training data
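
One way to see the variance reduction is bagging, sketched below. Bagging is an assumption here: the cards do not name a technique, and bagging reuses a single algorithm on bootstrap resamples instead of mixing different algorithms. The base model is a 1-nearest-neighbour lookup on noisy y = x^2 data, and all names are hypothetical.

```python
import random
from statistics import mean, pvariance

def sample_training_set(rng, noise_sd=1.0):
    xs = [-2 + 0.5 * i for i in range(9)]  # fixed grid from -2 to 2
    return [(x, x ** 2 + rng.gauss(0, noise_sd)) for x in xs]

def predict_1nn(train, x0):
    # High-variance base model: copy the target of the closest training point.
    return min(train, key=lambda p: abs(p[0] - x0))[1]

def predict_bagged(train, x0, rng, n_models=25):
    # Train each base model on a bootstrap resample, then average predictions.
    preds = []
    for _ in range(n_models):
        boot = [rng.choice(train) for _ in train]
        preds.append(predict_1nn(boot, x0))
    return mean(preds)

rng = random.Random(0)
single, bagged = [], []
for _ in range(500):  # 500 independent training sets
    train = sample_training_set(rng)
    single.append(predict_1nn(train, 2.0))
    bagged.append(predict_bagged(train, 2.0, rng))

single_var = pvariance(single)
bagged_var = pvariance(bagged)
print(f"single model variance: {single_var:.2f}")
print(f"bagged model variance: {bagged_var:.2f}")
```

The averaged prediction should vary less from one training set to the next than the single model's prediction does. In this toy setup the averaging also shifts the mean prediction somewhat, so the lower variance comes with some added bias.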

Examples:

(10 cards)