Topic 6: Ensemble Theory Flashcards
Bias-Variance-Diversity decomposition
Expected risk (ensemble) = noise + bias + variance - diversity
What is Diversity
E_Sn [ 1/m Σ_{i=1}^m ( f_i(x) - f̄(x) )² ]
Difference between models
diverse models make different errors on new data points
compares the prediction f_i(x) made by the ith model with the average f̄(x) over all models
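A minimal numpy sketch of the diversity term above, using made-up member predictions (the values are assumptions for illustration): per test point, diversity is the average squared deviation of each member's prediction from the ensemble mean.

```python
import numpy as np

# Hypothetical member predictions f_i(x) for m = 3 models on 4 test points
preds = np.array([
    [2.0, 1.0, 0.0, 3.0],   # f_1(x)
    [2.5, 0.5, 1.0, 2.0],   # f_2(x)
    [1.5, 1.5, 2.0, 4.0],   # f_3(x)
])

f_bar = preds.mean(axis=0)                       # ensemble (arithmetic-mean) prediction f̄(x)
diversity = ((preds - f_bar) ** 2).mean(axis=0)  # 1/m Σ_i (f_i(x) - f̄(x))² per point

print(f_bar)       # → [2. 1. 1. 3.]
print(diversity)   # higher where members disagree more
```

Note the last two test points get higher diversity: the members disagree more there, so their errors are more likely to cancel.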
How does diversity help in ensembles
computational efficiency
robustness against adversarial attacks
improved performance in various applications
What is the centroid q◦ (the term marked with a small circle)
Can refer to any of: arithmetic mean, harmonic mean, etc
Represents the centre of the model distribution
It is the average over all possible training data sets (an infinite collection)
Generalised Bias-variance decomposition
E_D [ E_XY [ ℓ(Y, q) ] ] = E_X [ E_Y|X [ ℓ(Y, Y*) ] ] + ℓ(Y*, q◦) + E_D [ ℓ(q◦, q) ]
Expected risk = noise + bias + variance
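For squared loss the decomposition can be checked exactly. A sketch with made-up finite stand-ins for the target distribution and the dataset-dependent models (all values are assumptions for illustration); for squared loss the centroid q◦ is the arithmetic mean of the models:

```python
import numpy as np

sq = lambda a, b: (a - b) ** 2

# Finite stand-ins for the distributions over targets Y and dataset-dependent models q
y_samples = np.array([1.0, 2.0, 3.0])        # targets; Y* is their mean (Bayes prediction)
q_samples = np.array([1.5, 2.5, 1.0, 3.0])   # one prediction per hypothetical dataset D

y_star = y_samples.mean()                    # Y* minimises expected squared loss
q_centroid = q_samples.mean()                # centroid q◦ = arithmetic mean for squared loss

risk = sq(y_samples[:, None], q_samples[None, :]).mean()  # E_D E_Y ℓ(Y, q)
noise = sq(y_samples, y_star).mean()                      # E_Y ℓ(Y, Y*)
bias = sq(y_star, q_centroid)                             # ℓ(Y*, q◦)
variance = sq(q_centroid, q_samples).mean()               # E_D ℓ(q◦, q)

print(risk, noise + bias + variance)  # the two sides agree exactly
```

The identity holds exactly here (not just approximately) because Y* and q◦ are the respective sample means, so all cross terms in the expansion of (y − q)² vanish.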
Generalised ambiguity decomposition
ℓ(y, q¯) = 1/m Σ ℓ(y, qi) - 1/m Σ ℓ(q¯, qi)
Ensemble loss = average loss - ambiguity
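The ambiguity identity can also be verified numerically for squared loss, where the combiner q¯ is the arithmetic mean. A sketch with made-up member predictions (the values are assumptions for illustration):

```python
import numpy as np

sq = lambda a, b: (a - b) ** 2

y = 2.0
q = np.array([1.0, 2.5, 3.5])   # hypothetical member predictions q_i
q_bar = q.mean()                # centroid combiner q¯ for squared loss

ensemble_loss = sq(y, q_bar)    # ℓ(y, q¯)
avg_loss = sq(y, q).mean()      # 1/m Σ ℓ(y, q_i)
ambiguity = sq(q_bar, q).mean() # 1/m Σ ℓ(q¯, q_i)

print(ensemble_loss, avg_loss - ambiguity)  # identical for squared loss
```

Since ambiguity is non-negative, the ensemble loss is never worse than the average member loss under the centroid combiner.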
What is q¯
The ensemble combination
Eg for squared loss → arithmetic mean
for KL divergence → normalised geometric mean
How does regularisation affect variance and bias
It moves models around in the bias/variance axes
It can decrease variance and may increase bias
Eg Linear + reg = lower var, higher bias
Larger vs smaller networks and diversity
Larger networks tend to perform better than smaller ones due to lower bias and lower variance, despite potentially lower diversity (they capture more complexity and can overfit)
How does diversity interact with bagging
Random Forests initially underperform Bagging but catch up as the ensemble size increases
Random Forests exhibit higher variance-effect but compensate with higher diversity-effect in larger ensembles
What is the centroid combiner for poisson regression loss
geometric mean
What is the centroid combiner for KL divergence
normalised geometric mean
What is the centroid combiner for Itakura-Saito loss
harmonic mean
What is the centroid combiner for squared loss
arithmetic mean
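The four combiners on the cards above can be written as one-liners. A sketch with made-up member outputs (the sample values and function names are assumptions for illustration); the normalised geometric mean takes one probability distribution per member and renormalises their element-wise geometric mean:

```python
import numpy as np

def arithmetic_mean(q):            # centroid for squared loss
    return np.mean(q)

def geometric_mean(q):             # centroid for Poisson regression loss
    return np.exp(np.mean(np.log(q)))

def harmonic_mean(q):              # centroid for Itakura-Saito loss
    return 1.0 / np.mean(1.0 / q)

def normalised_geometric_mean(P):  # centroid for KL divergence; P: m x k member distributions
    g = np.exp(np.mean(np.log(P), axis=0))  # element-wise geometric mean over members
    return g / g.sum()                      # renormalise to a probability distribution

q = np.array([1.0, 2.0, 4.0])
print(arithmetic_mean(q), geometric_mean(q), harmonic_mean(q))  # 2.33…, 2.0, 1.71…

P = np.array([[0.6, 0.4],   # member 1's predicted distribution
              [0.2, 0.8]])  # member 2's predicted distribution
print(normalised_geometric_mean(P))
```

Note the familiar ordering harmonic ≤ geometric ≤ arithmetic on positive inputs.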
single model vs ensemble model tradeoffs
In single models we have a 2-way tradeoff (bias/variance)
In ensembles of models, it's a 3-way tradeoff (bias/variance/diversity)
But it only holds if we use the centroid combiner rule