Lecture 7 Flashcards
What kind of distribution do real-valued EDAs require?
A practically useful one, such as the normal distribution.
What is ML? (in EDA)
Maximum Likelihood
What is the limitation of ML?
It can only model linear-like dependencies between variables.
What is the difference between ES and a normal-based EDA?
ES uses normal distributions for self-adaptation of its models: the model updates implicitly through selection and random mutation.
A normal-based EDA explicitly couples the population to the model-update rules by estimating the model from the selected (improving) solutions.
When can the direct use of ML-normal in an EDA give a positive result?
- The function is unimodal (one peak)
- The function is centered at the origin
- It is easy to converge towards the minimum
What is the big downside of using direct ML-normal in an EDA?
- The structure of the solution space is often very complicated and hardly matches a normal distribution
- Improving directions are ignored in MLE
- The EDA does not observe the direction in which the population is improving
- Hence there is no exploration outside of the data range, so the real optimum can easily be missed
What would we expect to observe over multiple generations of direct ML-normal in an EDA?
The algorithm tries to find the distribution that best fits the observed data, rather than the best solution in the solution space (a skewed initial sample stays skewed).
Explain the premature convergence problem
With direct ML-normal estimation in a real-valued EDA, the variance of the estimated normal distribution converges to 0 very fast, before the search space has been explored.
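A minimal sketch of this behaviour (Python/NumPy is an assumption; the sphere function, population size and selection size are illustrative choices, not from the lecture): the loop repeatedly fits a normal by ML to the selected solutions and resamples from it, and the printed covariance trace shrinks every generation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere(x):
    # Simple unimodal test function (illustrative assumption, not from the slides).
    return np.sum(x**2, axis=1)

dim, pop_size, n_select, n_gens = 2, 100, 30, 30
pop = rng.uniform(-5, 5, size=(pop_size, dim))  # initial population

for gen in range(n_gens):
    # Truncation selection: keep the best solutions.
    selected = pop[np.argsort(sphere(pop))[:n_select]]

    # Maximum-likelihood estimation of the normal model:
    # sample mean and sample covariance of the selected solutions.
    mu = selected.mean(axis=0)
    cov = np.cov(selected, rowvar=False)

    # Sample the next population from the estimated normal.
    pop = rng.multivariate_normal(mu, cov, size=pop_size)

    # The covariance trace shrinks rapidly every generation; on harder functions
    # this collapse happens before the optimum region has been found.
    print(gen, np.trace(cov))
```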
Why is Gradient Hybridization not a great solution for real-valued EDAs?
It requires gradient information, which is not always available for complex problems, and not always reliable.
What are the three ingredients for Adaptive ML estimation?
- Adaptive Variance Scaling (AVS)
- Standard Deviation Ratio (SDR)
- Anticipated Mean Shift (AMS)
What is AVS?
Adaptive Variance Scaling
What is SDR?
Standard Deviation Ratio
What is AMS?
Anticipated Mean Shift
In SDR-AVS, what is the NIC counter?
With multiple local optima in the problem, SDR-AVS can take too long to converge to one of them; the counter therefore limits the adaptation of the variance of the estimated distribution.
Explain what the distribution multiplier does for SDR-AVS.
It enlarges the variance of the distribution from which new samples are drawn.
What are the two reasons that no improvement was found in SDR-AVS?
Either the variance is too large and the EDA took samples of worse solutions.
Or the distribution covers multiple local optima, in which case randomly sampled solutions between these optima can worsen the average.
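A simplified sketch of the distribution-multiplier update described in the previous cards (Python/NumPy assumed; the constants, the Mahalanobis-style SDR computation and the omission of the NIC counter are all simplifying assumptions, not the lecture's exact definitions):

```python
import numpy as np

ETA_DEC = 0.9             # shrink factor when no improvement is found (assumed value)
ETA_INC = 1.0 / ETA_DEC   # enlarge factor when improvements lie far from the mean
THETA_SDR = 1.0           # "more than one standard deviation away" threshold

def update_multiplier(c_mult, improvements, mu, cov):
    """Return the new multiplier; new samples are drawn from N(mu, c_mult * cov)."""
    if len(improvements) == 0:
        # No improvement found: either the variance is too large, or the model
        # covers multiple local optima. Shrink the sampling variance (never
        # below the plain ML estimate in this sketch).
        return max(1.0, c_mult * ETA_DEC)

    # Standard Deviation Ratio: distance of the average improvement from the
    # current mean, measured in standard deviations of the model.
    d = np.mean(improvements, axis=0) - mu
    sdr = float(np.sqrt(d @ np.linalg.solve(cov, d)))

    # Improvements lie far from the mean: enlarge the variance to keep exploring.
    return c_mult * ETA_INC if sdr > THETA_SDR else c_mult
```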
What is the shortcoming of SDR-AVS?
It solves searching in 1D space, but not in 2D space.
What is the idea of AMS?
Anticipate where the mean will be shifting, then alter part of the generated solutions using that shift.
Predictions on a slope will be better, but we require balanced selection to re-align the covariance matrix.
How is the shift in AMS calculated?
μ_shift(t) = μ(t) − μ(t − 1),
such that x ← x + δ · μ_shift(t),
where t is the generation number, μ(t) the estimated mean at generation t, and the factor δ controls the impact of the shift.
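A small sketch of how the anticipated mean shift could be applied to part of the freshly sampled solutions (Python/NumPy assumed; the shifted fraction and the value of δ are illustrative assumptions):

```python
import numpy as np

def apply_ams(new_samples, mu_t, mu_prev, delta=2.0, fraction=0.5):
    """Shift a fraction of the newly sampled solutions along the anticipated mean shift."""
    mu_shift = mu_t - mu_prev              # mu_shift(t) = mu(t) - mu(t-1)
    shifted = new_samples.copy()
    n_shift = int(fraction * len(shifted))
    shifted[:n_shift] += delta * mu_shift  # x <- x + delta * mu_shift(t)
    return shifted
```

On a slope the shift keeps pushing part of the samples downhill; on a peak μ(t) ≈ μ(t − 1), so the applied shift is roughly zero (see the next card).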
What behaviour does AMS have on a peak?
No change, as μ(t) ≈ μ(t − 1), so the anticipated shift is approximately zero.
Why is AMS still not an optimal approach for real-valued EDAs?
Because choosing the right parameters is finicky and still requires a lot of attention.
What does AMaLGaM stand for?
Adapted Maximum-Likelihood Gaussian Model
What is the idea of AMaLGaM?
It uses the variance-scaling approach from SDR-AVS to widen the search, and the mean shift from AMS to move solutions down slopes faster.
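A compact sketch of one AMaLGaM-style generation combining the two ideas (Python/NumPy assumed; the selection size, δ, the shifted fraction and the handling of the multiplier are illustrative assumptions, not the lecture's exact settings):

```python
import numpy as np

def amalgam_generation(f, pop, mu_prev, c_mult, rng, n_select=30, delta=2.0, frac=0.5):
    # Truncation selection: keep the best solutions of the current population.
    selected = pop[np.argsort(f(pop))[:n_select]]

    # Maximum-likelihood estimation of the normal model.
    mu = selected.mean(axis=0)
    cov = np.cov(selected, rowvar=False)

    # Sample from the variance-scaled model; c_mult is maintained by an
    # SDR-AVS-style multiplier update (sketched in an earlier card).
    new_pop = rng.multivariate_normal(mu, c_mult * cov, size=len(pop))

    # Shift part of the new samples along the anticipated mean shift (AMS idea).
    new_pop[: int(frac * len(new_pop))] += delta * (mu - mu_prev)
    return new_pop, mu
```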