Topic 3: Generalisation Bounds Flashcards
What do generalisation bounds enable
Enable us to state with confidence that a model performs as we say it will - that it will maintain a certain level of accuracy
- very useful in safety critical or sensitive areas
Eg governmental/public services
How do we estimate how bad the testing error might get
R^(f,Sn) ~ R(f)
The law of large numbers states an estimate (empirical risk) of some number will approach the true value (population risk), as the sample size increases (n tends to infinity)
use hoeffdings inequality
What is Hoeffding’s LHS inequality in algebraic terms
P(|E[X] - 1/nΣxi| ≥ ε ) ≤ δ
What is Hoeffding’s inequality in words
Given a true value E[Z], and an empirical estimate 1/n ∑ zi, the probability of the estimate
deviating by more than ϵ from the true value is less than or equal to δ.
What is δ
The failure threshold (or failure probability)
What is ε
Some tolerance value
We want E[Z] - 1/n∑zi to be within this threshold
What is the relationship between n and δ
As n increases, δ decreases to almost 0
(note - use a fixed ε)
How do we rearrange hoeffdings equation for ε
ε = sqrt(ln(2/δ) / 2n )
What does δ = 0.01 in english terms
We want to be 99% confident
What is the relationship between n and ε
As n decreases, ε will decrease to small value
(We can fix a δ eg δ=0.01)
eg the line will asimptote at ε = 0.03 which means we can be 99% confident that the sample risk will be within 0.03 of the true risk
How can we solve for n
Alternatively, we can fix δ eg 0.05 (95% confidence) and as we can increase ε on the x axis we can see how many samples (n) we need to be within some ε of the true risk
What do we call F
Function class
What is the formula for Union Bound (Boole’s inequality)
P ((M⋃i=1) Vi) ≤ (M∑i=1)P (Vi)
Essentially the probability of one of at least one of the values v1 t vm being true
is less than or equal to
the total sum of all individual probabilities (p(v1) + p(v1)…+p(vm))
Uniform Deviation Bound (Hoeffdings equation in terms of function class F)
P(∃f ∈|R(f) - R^(f,Sn)| ≥ ε ) ≤ δ = 2|F|exp(-2nε^2)
What is the uniform deviation bound in English terms
The probability that there exists a model which violates our tolerance zone is less than or equal to δ
We can also say with 1-δ probability that all of our estimates are within the zone
i.e. ∀f, we have |R(f) - R^(f,Sn)| < ε