Overfitting Flashcards

1
Q

Bias-variance decomposition of the out-of-sample error

A

ED[ Eout(hD) ] = bias + variance

bias = Ex[ ( h̄(x) - f(x) )^2 ]
variance = Ex[ ED[ ( hD(x) - h̄(x) )^2 ] ]

where h̄(x) = ED[ hD(x) ] is the average hypothesis over data sets D

1) very small hypothesis set H:
-> high bias, low variance

2) very flexible model (high complexity):
-> low bias, high variance
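
A minimal numpy sketch of this decomposition, estimating bias and variance by simulation; the target f(x) = sin(πx), the two-point data sets, and the constant model are illustrative assumptions, not part of the card:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)     # hypothetical target function
N, n_sets = 2, 5000                 # points per data set D, number of data sets

# Fit the constant model h_D(x) = b on each D by least squares (b = mean of the y's).
X = rng.uniform(-1, 1, size=(n_sets, N))
b = f(X).mean(axis=1)               # one fitted constant per data set

x_test = np.linspace(-1, 1, 401)
h_D = np.tile(b[:, None], (1, x_test.size))   # h_D(x) is flat in x
h_bar = h_D.mean(axis=0)                      # average hypothesis over all D

bias = np.mean((h_bar - f(x_test)) ** 2)      # Ex[ (h̄(x) - f(x))^2 ]
variance = np.mean((h_D - h_bar) ** 2)        # Ex[ ED[ (hD(x) - h̄(x))^2 ] ]
print(f"bias ~ {bias:.3f}, variance ~ {variance:.3f}")
```

With this simple (low-complexity) model the bias term dominates, matching case 1) above.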

2
Q

Bias-variance diagram as a function of model complexity

A
  • as model complexity increases, the in-sample error decreases and eventually reaches zero
  • the out-of-sample error, instead, has a minimum, which corresponds to the best model order
  • before the minimum -> underfitting
  • after the minimum -> overfitting

3
Q

Learning curves as a function of the size of the data set

A

1) simple model h
- high bias of ED[Eout(h)]
- low variance of ED[Eout(h)]

2) complex model h
- low bias of ED[Eout(h)]
- high variance of ED[Eout(h)]

In both cases:

  • ED[Eout] decreases with the number of data points
  • ED[Ein] increases with the number of data points
  • -> model complexity should be selected based on the size of the data set, not on the complexity of the target! (see the simulation sketch below)
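
A simulation sketch of these learning curves for least-squares linear regression; the 5-dimensional linear target and the noise level are hypothetical choices made only so the code runs:

```python
import numpy as np

rng = np.random.default_rng(1)
d, noise = 5, 0.5
w_true = rng.normal(size=d)         # hypothetical linear target

def avg_errors(N, n_trials=300):
    """Monte Carlo estimate of ED[Ein] and ED[Eout] for data sets of size N."""
    ein, eout = [], []
    for _ in range(n_trials):
        X = rng.normal(size=(N, d))
        y = X @ w_true + noise * rng.normal(size=N)
        w, *_ = np.linalg.lstsq(X, y, rcond=None)     # fit on D
        ein.append(np.mean((X @ w - y) ** 2))
        Xt = rng.normal(size=(2000, d))               # fresh data approximates Eout
        yt = Xt @ w_true + noise * rng.normal(size=2000)
        eout.append(np.mean((Xt @ w - yt) ** 2))
    return np.mean(ein), np.mean(eout)

for N in (10, 20, 50, 100, 200):
    ein, eout = avg_errors(N)
    print(f"N={N:4d}  ED[Ein]={ein:.3f}  ED[Eout]={eout:.3f}")
```

Both curves approach the noise floor from opposite sides as N grows, as the bullets describe.
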
4
Q

Possible tools to tackle the overfitting issue

A

1) regularization

2) (cross) validation

5
Q

Regularization: key idea

A
  • many different tools exist, for example L2-regularization
  • the key idea is to add a constraint to the minimization of Ein, so as to discourage the use of too many (or too large) parameters
  • -> the new optimization problem becomes:

w^reg = argmin(w) Ein(w)
s.t. w' w <= C

where C is a budget on the size of the weights
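
A standard bridging step (a Lagrange-multiplier argument, not stated on the card): for a suitable λ >= 0, which grows as the budget C shrinks, the constrained problem above is equivalent to the unconstrained "soft" form

w^reg = argmin(w) [ Ein(w) + λ w' w ]

and it is this form that the next card solves in closed form for linear regression.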

6
Q

Formula for the minimum of the regularized linear regression problem

A

w^reg = (Z' Z + λI)^-1 Z' Y

where Z is the matrix of (transformed) inputs, Y is the vector of targets, and λ >= 0 is a design parameter that can be tuned by minimizing the cross-validation error (see the sketch below)
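
A numpy sketch of this formula; the function name and the toy data are illustrative, with Z the N×d matrix of (transformed) inputs and Y the target vector:

```python
import numpy as np

def ridge_fit(Z, Y, lam):
    """w^reg = (Z'Z + lam*I)^-1 Z'Y, via solve() rather than an explicit inverse."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ Y)

# toy usage on synthetic data
rng = np.random.default_rng(2)
Z = rng.normal(size=(50, 4))
Y = Z @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=50)
print(ridge_fit(Z, Y, lam=0.1))      # lam=0 recovers ordinary least squares
```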

7
Q

Validation: working principle

A

1) Partition the data set D into:
- a training set Dtrain of size N-K
- a validation set Dval of size K
- the partition must not depend on the data values (e.g., split at random)!

2) run the learning algorithm on the training set to obtain the hypothesis g^- that minimizes Ein

3) compute the validation error Eval(g^-) of the obtained hypothesis on the validation set
-> the validation error is an unbiased estimate of Eout(g^-)
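
A sketch of the three steps; the least-squares fit and squared-error measure are stand-ins for whatever learning algorithm and error measure are actually in use:

```python
import numpy as np

def validate(X, y, K, fit, error, seed=0):
    """Split D into Dtrain (size N-K) and Dval (size K), train, estimate Eout."""
    N = len(y)
    perm = np.random.default_rng(seed).permutation(N)  # random split: independent of the data values
    val, train = perm[:K], perm[K:]
    g_minus = fit(X[train], y[train])        # step 2: g^- learned on N-K points
    return error(g_minus, X[val], y[val])    # step 3: Eval(g^-), unbiased estimate of Eout(g^-)

# toy usage: linear least squares with squared error
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
error = lambda w, X, y: np.mean((X @ w - y) ** 2)
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -0.5, 2.0]) + 0.1 * rng.normal(size=100)
print(validate(X, y, K=20, fit=fit, error=error))
```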

8
Q

On the choice of the dimension of the validation set K

A

This is a design choice subject to a tradeoff:

  • high K gives low variance of the validation error (its uncertainty decreases as O(1/sqrt(K)))
  • low K leaves a larger training set of size N-K, so g^- generalizes almost as well as a hypothesis trained on the full data set
  • -> cross-validation is a good way to get the best of both

9
Q

Leave-one-out cross validation

A
  • K = 1
  • there are N ways to partition the data set, leaving one data point out as the validation set
  • averaging over all N partitions gives, for large N, a good estimate of the out-of-sample error:
    Ecv = 1/N * sum(n=1..N) Eval(g^-_n)
    where g^-_n is the hypothesis trained on all data except point n (see the sketch below)
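
A sketch of leave-one-out CV; linear least squares is a hypothetical concrete model, chosen just to make the loop runnable:

```python
import numpy as np

def loo_cv(X, y):
    """Ecv = 1/N * sum_n Eval(g^-_n), leaving one point out at a time."""
    N = len(y)
    errs = []
    for n in range(N):
        keep = np.arange(N) != n                        # train on all points except n
        w, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errs.append((X[n] @ w - y[n]) ** 2)             # Eval on the left-out point
    return np.mean(errs)
```
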
10
Q

How to use cross-validation for complexity selection

A
  • compute the cross-validation error Ecv for each candidate model complexity
  • the best model is the one with the lowest cross-validation error; it is then retrained on the whole data set (see the sketch below)
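
Putting the last two cards together: a sketch that picks a polynomial degree by leave-one-out cross-validation; the candidate degrees and the noisy sin target are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=30)
y = np.sin(np.pi * x) + 0.2 * rng.normal(size=30)   # hypothetical noisy target

def loo_cv_poly(x, y, degree):
    """Leave-one-out cross-validation error of a polynomial fit of given degree."""
    N = len(y)
    errs = []
    for n in range(N):
        keep = np.arange(N) != n
        coeffs = np.polyfit(x[keep], y[keep], degree)
        errs.append((np.polyval(coeffs, x[n]) - y[n]) ** 2)
    return np.mean(errs)

ecv = {d: loo_cv_poly(x, y, d) for d in range(1, 9)}    # Ecv for each model
best = min(ecv, key=ecv.get)                            # lowest Ecv wins
print(f"selected degree: {best}")
```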