Capacity, Overfitting And Underfitting Flashcards
The ability to perform well on previously unobserved inputs is called
Generalisation
Generalisation error
Aka test error
For linear regression: the MSE on the test set, (1/m_test) ||X_test w - y_test||^2
Training error (linear regression)
The MSE on the training set, (1/m_train) ||X_train w - y_train||^2
Data generating process
Making IID assumptions about training and test data collectively
We assume the examples in each dataset are independent of each other, and that the train and test sets are identically distributed, drawn from the same data generating distribution (denoted pdata)
caveat to data generating process in ML
We do not fix the parameters ahead of time
We sample the training set, use it to choose the parameters to reduce training set error, and only THEN sample the test set
Therefore, E[test error] >= E[training error]
(Without this process the two expectations would be equal, since both sets come from the same distribution)
Factors determining how well an ML algorithm performs
1) make training error small (large error is underfitting)
2) make gap between training error and test error small (large gap is overfitting)
How do we control overfitting/underfitting
Altering capacity
Capacity def
(Informally) a model’s capacity is its ability to fit a wide variety of functions
Low capacity models may struggle to fit training set
High capacity models may overfit by ‘memorising’ the training set
Hypothesis space
The set of functions that the learning algorithm is allowed to select as a solution
Altering this is one way of altering capacity
Increasing capacity of linear regression example
Adding polynomial features of the input (e.g. ŷ = b + w1·x + w2·x²) increases capacity; the model stays linear in its parameters, so it can still be fit with ordinary least squares
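A minimal sketch of this idea (numpy, with hypothetical quadratic data): the feature map is nonlinear in x, but the fit is still a linear least-squares problem.

```python
import numpy as np

def poly_features(x, degree):
    """Map scalar inputs x to columns [x, x^2, ..., x^degree].

    The model y = b + w1*x + w2*x^2 + ... remains linear in the
    parameters (b, w), so ordinary least squares applies unchanged.
    """
    return np.stack([x ** d for d in range(1, degree + 1)], axis=1)

# Hypothetical data generated by y = 1 + 2x + 3x^2.
x = np.linspace(-1.0, 1.0, 50)
y = 1 + 2 * x + 3 * x ** 2

# Prepend a bias column and solve the least-squares problem.
X = np.hstack([np.ones((len(x), 1)), poly_features(x, 2)])
w, *_ = np.linalg.lstsq(X, y, rcond=None)
# A degree-2 model has enough capacity to recover (1, 2, 3) exactly.
```

A degree-1 model on the same data would underfit; a very high degree on few noisy points would overfit.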
Draw and explain the underfitting over fitting diagram
Training error decreases monotonically as capacity grows; generalisation error is U-shaped. Left of the optimal capacity is the underfitting zone; right of it, the gap between training and generalisation error widens (overfitting zone)
Def regularization
Any modification we make to a learning algorithm that is intended to reduce its generalisation error but not its training error
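A standard example of regularization is L2 weight decay. A minimal sketch (numpy, synthetic data; `ridge_fit` is a name chosen here) showing that the penalty shrinks the learned weights:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Least squares with L2 weight decay: minimise
    ||Xw - y||^2 + lam * ||w||^2, whose closed-form solution is
    w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
# Only the first feature matters; the rest invite overfitting to noise.
y = X @ np.array([1.0, 0.0, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=20)

w_unreg = ridge_fit(X, y, lam=0.0)   # plain least squares
w_reg = ridge_fit(X, y, lam=10.0)    # weight decay shrinks the weights
```

The penalty trades a small increase in training error for (hopefully) lower generalisation error, expressing a preference for smaller weights.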
K fold cross validation algorithm
Partition the dataset into k non-overlapping folds; on trial i, use fold i as the test set and the remaining k-1 folds as the training set; estimate the test error as the average test error across the k trials
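A minimal sketch of k-fold cross-validation (numpy; `fit` and `error` are hypothetical caller-supplied functions): split into k non-overlapping folds, hold each one out in turn, and average the test errors.

```python
import numpy as np

def k_fold_cv(X, y, k, fit, error):
    """Partition the data into k non-overlapping folds; for each fold,
    train on the other k-1 folds, evaluate on the held-out fold, and
    return the average of the k test errors."""
    idx = np.arange(len(X))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        errs.append(error(model, X[test], y[test]))
    return float(np.mean(errs))

# Usage with a trivial mean-predictor "model" (illustrative only):
fit = lambda X, y: y.mean()
error = lambda m, X, y: float(np.mean((y - m) ** 2))
X = np.arange(10.0).reshape(-1, 1)
y = np.arange(10.0)
mse = k_fold_cv(X, y, k=5, fit=fit, error=error)
```

This is useful when the dataset is too small for a single train/test split to give a low-variance estimate of test error.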
Point estimator
Any function of the data: θ̂ = g(x(1), ..., x(m))
Frequentist perspective on statistics