Lesson 4 - Approximation and Bounds on Probability Flashcards

1
Q

What does PAC mean?

A
  • Probably Approximately Correct
  • We write σ for the sample probability and π for the true probability
    • in a large sample (large N), σ is likely close to π
    • P(|σ − π| > ε) ≤ 2e^(−2Nε²) [Hoeffding’s inequality]
    • the statement σ ≈ π is PAC: an approximation that is correct (close to the true value) and probable
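Hoeffding’s inequality can be sanity-checked numerically. A minimal Monte Carlo sketch for a biased coin (the bias `pi = 0.6` and the other parameter values are my own illustrative choices, not from the lesson):

```python
import math
import random

def hoeffding_check(pi=0.6, N=1000, eps=0.05, trials=2000, seed=0):
    """Monte Carlo check of Hoeffding's inequality for a biased coin.
    sigma = sample frequency of heads, pi = true head probability.
    Compares the observed rate of |sigma - pi| > eps with the bound
    2 * exp(-2 * N * eps**2)."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        heads = sum(rng.random() < pi for _ in range(N))
        if abs(heads / N - pi) > eps:
            bad += 1
    return bad / trials, 2 * math.exp(-2 * N * eps ** 2)
```

The observed deviation rate comes out well under the bound, as expected: Hoeffding holds for any distribution, so it is usually loose for a specific one.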
2
Q

PAC learning

A
  • the learner receives samples and must select a generalization function (called the hypothesis) from a certain class of possible functions
    • low generalization error
  • Hoeffding’s inequality is applied with the in-sample error (on the training set) and the out-of-sample error (the true error) playing the roles of σ and π
3
Q

Union bound

A
  • for any finite or countable set of events, the probability that at least one of them happens is no greater than the sum of their individual probabilities
  • P(A₁ ∪ A₂ ∪ …) ≤ P(A₁) + P(A₂) + … — an upper bound on the probability of at least one of the events, in terms of the individual probabilities
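The bound is easy to see numerically on overlapping events. A sketch (the three events over a uniform draw are an arbitrary illustration of mine):

```python
import random

def union_bound_demo(trials=20000, seed=1):
    """Monte Carlo illustration of the union bound on three overlapping
    events over a uniform draw x in [0, 1). Returns the estimated
    P(A1 or A2 or A3) and the sum P(A1) + P(A2) + P(A3)."""
    rng = random.Random(seed)
    events = [lambda x: x < 0.3,          # P = 0.3
              lambda x: x > 0.8,          # P = 0.2
              lambda x: 0.2 < x < 0.5]    # P = 0.3, overlaps the first
    union_hits = 0
    single_hits = 0
    for _ in range(trials):
        x = rng.random()
        hits = [e(x) for e in events]
        union_hits += any(hits)
        single_hits += sum(hits)
    return union_hits / trials, single_hits / trials
```

Because the first and third events overlap, the union probability (exactly 0.7 here) is strictly below the sum of the individual probabilities (0.8); the two coincide only for disjoint events.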
4
Q

What is Shattering, formulation and what is it used for?

A
  • S is a set of samples; S is shattered by the hypothesis space H if for every possible dichotomy of S there exists a hypothesis h that implements it
  • Formally: given S ⊆ X, S is shattered by the hypothesis space H iff ∀S′ ⊆ S, ∃h ∈ H such that ∀x ∈ S, h(x) = 1 ⇔ x ∈ S′ (H is able to implement all possible dichotomies of S)
  • used to measure the complexity of a hypothesis space
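The definition can be checked by brute force on a tiny hypothesis space. A sketch (the threshold classifiers and the sample points are my own illustration): 1-D thresholds h_t(x) = 1 iff x ≥ t shatter any single point, but no pair of points, since the left point can never be labeled 1 while the right point is labeled 0.

```python
from itertools import combinations

def shatters(points, hypotheses):
    """Brute-force shattering test: for every dichotomy (subset S' of
    `points`), look for a hypothesis labeling exactly S' as 1.
    Each hypothesis is a function x -> 0/1."""
    pts = list(points)
    for r in range(len(pts) + 1):
        for subset in combinations(pts, r):
            target = [int(x in subset) for x in pts]
            if not any([h(x) for x in pts] == target for h in hypotheses):
                return False  # this dichotomy has no implementing hypothesis
    return True

# Illustrative class: thresholds on the real line, h_t(x) = 1 iff x >= t
H = [lambda x, t=t: int(x >= t) for t in (-1.0, 0.5, 1.5, 2.5)]
```

`shatters([1.0], H)` succeeds while `shatters([1.0, 2.0], H)` fails on the dichotomy {1.0}, which no threshold can implement.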
5
Q

What is the VC-dimension?

A
  • Vapnik–Chervonenkis dimension
  • a measure of the capacity of a hypothesis space (a class of functions)
  • The VC-dimension of a hypothesis space H defined over an instance space X is the size of the largest finite subset of X shattered by H: VC(H) = max { |S| : S ⊆ X, S is shattered by H }
  • If arbitrarily large finite sets of X can be shattered by H, then VC(H) = ∞
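As a worked instance (my own example, not from the lesson): for interval classifiers on the real line, h_{a,b}(x) = 1 iff a ≤ x ≤ b, some 2-point set is shattered but no 3-point set is — the dichotomy that keeps the two outer points and drops the middle one is impossible — so VC = 2. A brute-force sketch:

```python
from itertools import combinations

def shattered_by_intervals(points):
    """Check whether interval classifiers h_{a,b}(x) = 1 iff a <= x <= b
    shatter `points`: enumerate every dichotomy and every candidate
    interval built from the gaps between the sorted points."""
    pts = sorted(points)
    cuts = ([pts[0] - 1.0]
            + [(pts[i] + pts[i + 1]) / 2 for i in range(len(pts) - 1)]
            + [pts[-1] + 1.0])
    intervals = [(a, b) for a in cuts for b in cuts if a <= b]
    intervals.append((1.0, 0.0))  # empty interval: labels every point 0
    for r in range(len(pts) + 1):
        for subset in combinations(pts, r):
            if not any(all((a <= x <= b) == (x in subset) for x in pts)
                       for a, b in intervals):
                return False
    return True
```

Here `shattered_by_intervals([0.0, 1.0])` is true while `shattered_by_intervals([0.0, 1.0, 2.0])` is false, matching VC(intervals) = 2.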
6
Q

Generalization Error

A
  • upper bound on the true (ideal) error, valid with probability 1 − δ
    • δ can be made arbitrarily small
  • error(g) ≤ error_S(g) + F(VC(H)/n, δ)
    • depends on the chosen hypothesis g
    • F is the VC-confidence term, which is
      • inversely related to the training size n
      • directly related to the VC-dimension
      • inversely related to the confidence parameter δ
  • Problem: as the VC-dimension grows, the empirical risk error_S(g) decreases but the VC-confidence increases
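The lesson leaves F abstract. One standard concrete form is the Vapnik-style confidence term — this particular formula is an assumption on my part, chosen because it exhibits exactly the three dependencies listed above:

```python
import math

def vc_bound(emp_risk, d, n, delta=0.05):
    """Bound on the true risk, valid with probability >= 1 - delta,
    using a standard Vapnik-style VC-confidence term (an assumed form;
    the lesson only writes F(VC(H)/n, delta)):
        F = sqrt( (d * (ln(2n/d) + 1) + ln(4/delta)) / n )
    where d is the VC-dimension and n the training-set size."""
    F = math.sqrt((d * (math.log(2 * n / d) + 1) + math.log(4 / delta)) / n)
    return emp_risk + F
```

Evaluating it confirms the trends on the card: the bound shrinks as n grows, and grows with larger d or smaller δ.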
7
Q

Structural Risk Minimization

A
  • a new inductive principle
  • achieves a tradeoff between empirical risk and VC-confidence
  • select the hypothesis with the smallest bound on the true risk
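A minimal sketch of the principle (both the bound formula and the (empirical risk, VC-dimension) pairs below are illustrative assumptions of mine): scan a nested sequence of hypothesis classes and keep the one whose bound, empirical risk plus VC-confidence, is smallest.

```python
import math

def srm_select(models, n, delta=0.05):
    """Structural Risk Minimization sketch: `models` is a list of
    (empirical_risk, vc_dim) pairs for nested hypothesis classes.
    Returns the index minimizing the assumed Vapnik-style bound
    emp + sqrt((d * (ln(2n/d) + 1) + ln(4/delta)) / n)."""
    def bound(emp, d):
        return emp + math.sqrt((d * (math.log(2 * n / d) + 1)
                                + math.log(4 / delta)) / n)
    return min(range(len(models)), key=lambda i: bound(*models[i]))

# Typical SRM pattern: empirical risk falls as capacity grows,
# while the VC-confidence term rises.
models = [(0.30, 2), (0.15, 10), (0.08, 50), (0.05, 200)]
```

With these numbers and n = 1000, the middle-capacity class (VC-dimension 10) wins: the simplest class underfits, and the richest classes pay too much VC-confidence.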