Lecture 7 Flashcards

1
Q

What kind of distribution do real-valued EDAs require?

A

One that is practically useful, such as the normal distribution.

2
Q

What is ML? (in EDA)

A

Maximum Likelihood
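
For a normal distribution, the maximum-likelihood estimate is the sample mean together with the sample covariance divided by n (not n - 1). A minimal sketch, assuming NumPy; the function name `ml_normal_estimate` and the toy data are illustrative and not from the lecture:

```python
import numpy as np

def ml_normal_estimate(selected):
    """Return the ML mean vector and covariance matrix of the selected solutions."""
    selected = np.asarray(selected, dtype=float)
    mean = selected.mean(axis=0)
    # bias=True divides by n (the ML estimate) instead of n - 1.
    cov = np.cov(selected, rowvar=False, bias=True)
    return mean, cov

# Toy selection of three 2-D solutions (hypothetical data).
selected = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]])
mean, cov = ml_normal_estimate(selected)
```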

3
Q

What is the limitation of ML?

A

It can only model linear-like dependencies between variables.

4
Q

What is the difference between ES and normal-based EDA?

A

ES uses normal distributions for self-adaptation of its model. The model is updated implicitly through selection and random mutation.

Normal-based EDA explicitly couples the population to model-update rules by estimating the direction of improvement.

5
Q

When can the direct use of ML-normal in an EDA have a positive result?

A
  • The function is unimodal (one peak)
  • The function is centered at the origin
  • It is easy to converge towards the minimum
6
Q

What is the big downside of using direct ML-normal in an EDA?

A
  • The structure of the solutions is very complicated and hardly matches a normal distribution
  • Improving directions are ignored in MLE
  • The EDA does not observe the direction in which the population is moving
  • Hence, there is no exploration outside of the data range, so the real optimum can easily be missed.
7
Q

What would we expect to observe over multiple generations of direct ML-normal in an EDA?

A

The algorithm tries to find the distribution that best fits the observed data, rather than the best solution in the solution space, so skewed initial data stays skewed.

8
Q

Explain the premature convergence problem.

A

In direct ML-normal on a real-valued EDA, the variance of the estimated normal distribution converges to 0 very quickly, before the search space has been explored.
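
The effect can be reproduced on a plain 1D slope. The following is a minimal sketch, assuming NumPy, truncation selection and minimization of f(x) = x; the function name, population size and selection fraction `tau` are illustrative assumptions, not values from the lecture:

```python
import numpy as np

def ml_normal_eda_1d(f, mean, std, generations=30, pop_size=100, tau=0.3, seed=0):
    """Run a direct ML-normal EDA in 1D and record (mean, std) per generation."""
    rng = np.random.default_rng(seed)
    history = []
    for _ in range(generations):
        pop = rng.normal(mean, std, size=pop_size)
        # Truncation selection: keep the best tau fraction (lowest f values).
        selected = pop[np.argsort(f(pop))[: int(tau * pop_size)]]
        # ML estimates of the normal: sample mean and std (dividing by n).
        mean, std = selected.mean(), selected.std()
        history.append((mean, std))
    return history

# On a slope the optimum lies infinitely far away, yet the std collapses.
history = ml_normal_eda_1d(lambda x: x, mean=0.0, std=1.0)
final_mean, final_std = history[-1]
```

Each generation fits only the selected samples, so the standard deviation shrinks by a roughly constant factor and the mean stalls after travelling a short, bounded distance down the slope.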

9
Q

Why is gradient hybridization not a great solution for real-valued EDAs?

A

It requires gradient information, which is not always available in complex problems and not always reliable.

10
Q

What are the three ingredients for Adaptive ML estimation?

A
  • Adaptive Variance Scaling (AVS)
  • Standard Deviation Ratio (SDR)
  • Anticipated Mean Shift (AMS)
11
Q

What is AVS?

A

Adaptive Variance Scaling

12
Q

What is SDR?

A

Standard Deviation Ratio

13
Q

What is AMS?

A

Anticipated Mean Shift

14
Q

In SDR-AVS, what is the NIC counter?

A

When the problem has multiple local optima, SDR-AVS takes too long to converge to one of them. The NIC counter limits the adaptation of the variance in the estimated distribution.

15
Q

Explain what the distribution multiplier does for SDR-AVS.

A

It enlarges the variance of the distribution from which new samples are drawn.
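
A minimal sketch of this scaling step, assuming NumPy; the name `c_mult` for the multiplier and the helper `sample_with_multiplier` are illustrative, and the rule that decides when the multiplier grows or shrinks is deliberately left out:

```python
import numpy as np

def sample_with_multiplier(mean, cov, c_mult, n, rng):
    """Draw n samples from a normal whose covariance is scaled by c_mult."""
    return rng.multivariate_normal(mean, c_mult * np.asarray(cov), size=n)

rng = np.random.default_rng(0)
mean, cov = np.zeros(2), np.eye(2)
plain = sample_with_multiplier(mean, cov, 1.0, 5000, rng)  # unscaled ML model
wide = sample_with_multiplier(mean, cov, 4.0, 5000, rng)   # variance enlarged 4x
```

With a multiplier above 1 the samples reach beyond the range of the selected data, which counteracts the shrinking variance of the plain ML estimate.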

16
Q

What are the two possible reasons that no improvement was found in SDR-AVS?

A

Either the variance is too large, so worse solutions were sampled,
or the distribution covers multiple local optima, in which case randomly sampled solutions between these optima can worsen the average.

17
Q

What is the shortcoming of SDR-AVS?

A

It solves searching in 1D space, but not in 2D space.

18
Q

What is the idea of AMS?

A

Anticipate where the mean is shifting, then alter part of the generated solutions using that shift.
Predictions on a slope will be better, but balanced selection is required to re-align the covariance matrix.

19
Q

How is the shift in AMS calculated?

A

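$\hat{\mu}^{\text{Shift}}(t) = \hat{\mu}(t) - \hat{\mu}(t-1)$, where $\hat{\mu}$ is the estimated mean,
such that $x \leftarrow x + \delta \, \hat{\mu}^{\text{Shift}}(t)$.

Here $t$ is the generation number and the factor $\delta$ sets the impact of the shift.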
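
The update can be sketched as follows, assuming NumPy. The helper name `apply_ams`, the default `delta=2.0`, and the choice to shift the first half of the samples are illustrative assumptions; the card only states that part of the generated solutions is altered:

```python
import numpy as np

def apply_ams(samples, mean_now, mean_prev, delta=2.0, fraction=0.5):
    """Shift the first `fraction` of the samples by delta * (mean_now - mean_prev)."""
    samples = np.array(samples, dtype=float)  # work on a copy
    shift = np.asarray(mean_now, dtype=float) - np.asarray(mean_prev, dtype=float)
    n_shifted = int(fraction * len(samples))
    samples[:n_shifted] += delta * shift
    return samples

# Four samples at the origin; the mean moved by (1, 0) since the last generation.
samples = np.zeros((4, 2))
shifted = apply_ams(samples, mean_now=[1.0, 0.0], mean_prev=[0.0, 0.0])
```

When the mean has stopped moving, the shift is the zero vector and the samples are returned unchanged.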

20
Q

What behaviour does AMS have on a peak?

A

No change, as $\hat{\mu}(t) \approx \hat{\mu}(t-1)$.

21
Q

Why is AMS still not an optimal approach for real-valued EDAs?

A

Because choosing the right parameters is really finicky and still requires a lot of attention.

22
Q

What does AMaLGaM stand for?

A

Adapted Maximum-Likelihood Gaussian Model

23
Q

What is the idea of AMaLGaM?

A

It uses the variance approach from SDR-AVS to widen the search space, and the mean shift from AMS to move solutions down slopes faster.