Bias Complexity Trade-off Flashcards
No free lunch theorem, NFL
Let A be any learning algorithm for the task of binary classification with respect to the 0-1 loss over a domain X. Let m be any number smaller than |X|/2, representing a training-set size. Then there exists a distribution D over X × {0,1} such that:
- there exists a function f : X -> {0,1} with L_D(f) = 0
- with probability of at least 1/7 over the choice of S \sim D^m we have that L_D(A(S)) >= 1/8
Informally: This theorem states that for every learner, there exists a task on which it fails, even though that task can be successfully learned by another learner.
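The simulation below is an illustrative sketch of the NFL construction, not something from the text: labels on a domain of 2m points are drawn uniformly at random (so some f achieves L_D(f) = 0), a memorizing learner is trained on m i.i.d. points, and its true error is measured over the whole domain. The function name nfl_demo and all parameter values are hypothetical choices for this illustration.

```python
import random

def nfl_demo(m=50, trials=2000, seed=0):
    rng = random.Random(seed)
    domain = list(range(2 * m))  # |X| = 2m, as in the NFL construction
    failures = 0
    for _ in range(trials):
        # Target f: a uniformly random labeling, so L_D(f) = 0 holds for f itself.
        f = {x: rng.randint(0, 1) for x in domain}
        # Training set S ~ D^m, with D uniform over the domain.
        seen = {x: f[x] for x in (rng.choice(domain) for _ in range(m))}
        # Memorizing learner A(S): recalls stored labels, predicts 0 otherwise.
        errors = sum(seen.get(x, 0) != f[x] for x in domain)
        failures += (errors / len(domain)) >= 1 / 8
    return failures / trials

if __name__ == "__main__":
    print(f"fraction of trials with L_D(A(S)) >= 1/8: {nfl_demo():.3f}")
```

Because roughly half of the domain is never seen and the unseen labels are random, the learner's true error is typically around 1/4, so the empirical fraction of trials with error >= 1/8 comes out close to 1, matching the flavor of the theorem.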
Prior Knowledge
When approaching a particular learning problem, we should have some prior knowledge about D.
Types of prior knowledge:
- D comes from some specific parametric family of distributions
- There exists h in some predefined hypothesis class H such that L_D(h) = 0 (the realizability assumption)
- A softer type of prior knowledge about D is assuming that min_{h in H} L_D(h) is small; see the formal sketch after this list
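A compact formal version of the three assumptions above, written as a LaTeX sketch; the parametric-family notation {D_theta : theta in Theta} is an illustrative choice, not notation from the text.

```latex
% Prior-knowledge assumptions in formula form (a sketch).
\[
\text{parametric: } D \in \{ D_\theta : \theta \in \Theta \}, \qquad
\text{realizable: } \exists\, h \in H \text{ with } L_D(h) = 0, \qquad
\text{agnostic: } \min_{h \in H} L_D(h) \text{ is small.}
\]
```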
Corollary 5.2
Let X be an infinite domain set and let H be the set of all functions from X to {0,1}. Then, H is not PAC learnable.
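The corollary follows from the NFL theorem; the sketch below spells out the argument, assuming the standard (epsilon, delta) PAC definition from earlier chapters.

```latex
% Proof idea (sketch). Since X is infinite, for every sample size m we can
% choose 2m distinct points of X and apply the NFL theorem to them. It yields
% a distribution D and a labeling f with L_D(f) = 0; f belongs to H because H
% contains all functions from X to {0,1}. Nevertheless,
\[
\Pr_{S \sim D^m}\!\left[ L_D(A(S)) \geq \tfrac{1}{8} \right] \geq \tfrac{1}{7},
\]
% so no algorithm A can satisfy the PAC requirement that L_D(A(S)) \leq \epsilon
% with probability at least 1 - \delta once \epsilon < 1/8 and \delta < 1/7.
```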
Approximation error, estimation error, complexity and error decomposition