Lecture 6 Flashcards

1
Q

Compared to EDA, what is the problem with variation in simple GA?

A

They are blind to the problem structure, which makes the variation inefficient.

2
Q

What does it mean that a function is additively decomposable?

A

When the function can be decomposed into smaller subfunctions whose values sum to the value of the overall function.
f.i. functions whose fitness is a sum of contributions over blocks of variables

3
Q

What are block identifiers? (and how are they represented)

A

The parameters that define the blocks:

k is the block length
m is the number of blocks
l = k*m is the string length

For example, k = 5 and m = 10 give a string length of l = 50.

4
Q

Explain what tight encoding means.

A

It is when the genes in each block are neighbours:
0000 1111 2222 …

5
Q

Explain what loose encoding means.

A

It is when the genes of the blocks are interleaved across the string, so that any window of consecutive positions (one per block) contains a single gene of each block:

012345 012345 …

6
Q

What does disruption mean?

A

It means that blocks get broken during variation.

7
Q

When is a function deceptive?

A

(from slides)
Any schema of size smaller than the block size of the deceptive trap function has a better average fitness if it contains more zeros.

Following the schema theorem, those “0-schemata” therefore have a better chance of receiving an increasing number of matches over time.

This only fails to hold for full-order schemata.

8
Q

In Empirical scalability analysis, what is the definition of success?

A

When the optimal solution is found in a certain percentage of runs.

9
Q

What do we try to solve using Empirical scalability analysis?

A

Given a value for ℓ, what is the minimally required population size for which the EA solves the problem?

10
Q

Describe how empirical scalability analysis works and what criteria need to be met.

A

Increase the population size (e.g., by doubling it) until success is reached; this gives an upper bound. Then take half of the upper bound as a lower bound and perform an additional search within this range to find the minimally required population size.

This process should be repeated a couple of times to verify the results.
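
A minimal sketch of this procedure (doubling to find an upper bound, then a bisection-style search below it, which is one way to do the additional search), assuming a hypothetical ea_succeeds(pop_size) helper that runs the EA the required number of times and reports whether the success criterion was met:

  def minimal_population_size(ea_succeeds, start=2, max_pop=2 ** 20):
      # Doubling phase: grow the population size until the EA succeeds.
      upper = start
      while not ea_succeeds(upper):
          upper *= 2
          if upper > max_pop:
              raise RuntimeError("no successful population size found")
      # Bisection phase: the answer lies between upper/2 and upper.
      lower = upper // 2
      while upper - lower > 1:
          mid = (lower + upper) // 2
          if ea_succeeds(mid):
              upper = mid
          else:
              lower = mid
      return upper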

11
Q

What is the principle of Model based EAs?

A

Apart from being an EA, it tries to learn and exploit structure in the problem.

12
Q

Give rough pseudocode of how model-based EAs operate.

A
  1. Look at the last generation.
  2. Learn a stochastic model from it.
  3. Generate new solutions that align with that model.
  4. Perform survivor selection.
  5. Repeat (see the sketch below).
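
A minimal Python sketch of this loop, with hypothetical fit_model, sample_model and select helpers standing in for the model-learning, sampling and survivor-selection steps:

  import random

  def model_based_ea(fitness, length, pop_size, generations,
                     fit_model, sample_model, select):
      # Start from a random binary population.
      pop = [[random.randint(0, 1) for _ in range(length)]
             for _ in range(pop_size)]
      for _ in range(generations):
          # 1-2. Learn a stochastic model from the current generation.
          model = fit_model(pop, fitness)
          # 3. Generate new solutions that align with the model.
          offspring = [sample_model(model) for _ in range(pop_size)]
          # 4. Survivor selection over parents and offspring.
          pop = select(pop + offspring, fitness, pop_size)
      return max(pop, key=fitness)
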
13
Q

What does EDA stand for?

A

Estimation-of-Distribution Algorithms

14
Q

Why is EDA a model based EA?

A

Because it uses the previous generation to estimate the distribution of the next generation.

15
Q

What major benefit does EDA have?

A

The remaining search space shrinks very fast, so very few generations are needed to converge.

16
Q

Explain why we would want to use factorized EDAs.

A

The hypothesis is that many problems can be decomposed into sets of variables that together have an above-average contribution to a solution's fitness.

Learning the distribution estimate in an EDA can be much more efficient when we apply factorization to the distribution to capture this structure.

17
Q

What is univariate factorization in EDA?
(aka univariate EDA)

A

The distribution over each variable is modelled as independent of the distribution over every other variable.

18
Q

Explain how UMDA operates.

A

UMDA is a type of EDA:
It evaluates the population, selects a best-scoring fraction of it, and for each binary gene position estimates the probability of each value from its frequency among the selected genotypes. The next generation is then sampled from these per-gene distributions.
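
A minimal UMDA sketch for binary genotypes, assuming a user-supplied fitness function and simple truncation selection of the best tau fraction:

  import random

  def umda(fitness, length, pop_size=100, tau=0.3, generations=50):
      pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
      for _ in range(generations):
          # Truncation selection: keep the best 100*tau percent of the population.
          selected = sorted(pop, key=fitness, reverse=True)[:max(1, int(tau * pop_size))]
          # Univariate model: per-position frequency of 1s in the selected set.
          probs = [sum(ind[i] for ind in selected) / len(selected) for i in range(length)]
          # Sample the next generation position by position, independently.
          pop = [[1 if random.random() < probs[i] else 0 for i in range(length)]
                 for _ in range(pop_size)]
      return max(pop, key=fitness)

  # Example: OneMax, where the fitness is simply the number of ones.
  best = umda(fitness=sum, length=20)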

19
Q

Is UMDA considered decomposable?

A

Yes, its distribution uses univariate factorization: each variable is modelled independently.

20
Q

Are univariate EDAs the solution for deceptive traps?

A

No: a univariate EDA will favor the solutions with above-average fitness and converge very fast towards the trap,
unless the population size is very large and many optimal solutions are therefore already generated at initialisation.

21
Q

What does PFDA stand for?

A

Perfect Factorization Distribution Algorithm

22
Q

Explain how PFDA works.

A

Rather than treating each gene as independent, it computes joint probabilities over the variables of entire subfunctions. Hence it samples whole blocks instead of single genes.
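
A minimal sketch of this block-wise estimation and sampling, assuming the block structure is given as a list of index groups (a hypothetical representation: one joint distribution per block instead of one marginal per gene):

  import random
  from collections import Counter

  def estimate_block_model(selected, blocks):
      # One joint distribution per block: how often each joint setting of the
      # block's genes occurs among the selected individuals.
      model = []
      for block in blocks:
          counts = Counter(tuple(ind[i] for i in block) for ind in selected)
          total = sum(counts.values())
          model.append({setting: c / total for setting, c in counts.items()})
      return model

  def sample_block_model(model, blocks, length):
      child = [0] * length
      for block, dist in zip(blocks, model):
          # Sample an entire block setting at once from its joint distribution.
          settings, weights = zip(*dist.items())
          setting = random.choices(settings, weights=weights)[0]
          for pos, val in zip(block, setting):
              child[pos] = val
      return child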

23
Q

Explain how the choice of the selection fraction (𝜏) relates to the efficiency of PFDA.

A

After 100𝜏% best selection, the probability of sampling the optimal solution of a deceptive problem increases from 1/(2^k) (as in UMDA) to 1/(𝜏·2^k).
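
For instance (illustrative numbers, not from the slides): with k = 5 and 𝜏 = 0.25, the probability rises from 1/2^5 = 1/32 ≈ 0.031 to 1/(0.25 · 32) = 1/8 = 0.125.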

24
Q

Does perfect factorization require tightly or loosely encoded problems?

A

Tight encoding.

25
Q

Explain why UMDA is “dependency-blind”

A

It considers each gene as independent when estimating the distribution; it does not look at dependencies between those genes.

26
Q

For what kind of problems is PFDA dependency perfect?

A

additively decomposable problems

27
Q

Why do we use factorization of probability distributions?

A

To minimize the parameters required to store the distribution

29
Q

Explain what multivariate factorization is.

A

It is a class of factorization:

  • The MPM is a product of probability distributions over mutually exclusive sets of random variables
  • Variables are thus partitioned into (correlated) groups
  • The groups themselves are independent

30
Q

Which algorithms use factorizations that can be considered instances of multivariate factorization?

A

Both UMDA and PFDA.

31
Q

For PFDA, what is the length of node partition vector ν?

A

|ν_perfect| = #BBs (the number of building blocks, i.e. m)

32
Q

For UMDA, what is the length of node partition vector ν?

A

|ν_univariate| = l = genotype length

33
Q

Explain Bayesian factorization.

A

Each factor is a conditional distribution P_{θ^i}(X_i | X_{π_i}):

the probability of variable i, given its dependencies (parents) π_i.

34
Q

What requirement needs to be met to apply Bayesian factorization?

A

The dependency graph must be acyclic, and must therefore allow a topological sort.
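
A minimal sketch of sampling from a Bayesian factorization by following a topological order of the (acyclic) dependency graph; order, parents and cond_prob are assumed inputs describing a learned network:

  import random

  def sample_bayesian_factorization(order, parents, cond_prob):
      # order: variables in topological order.
      # parents[i]: tuple of parent variables of i.
      # cond_prob(i, parent_values): learned table giving P(X_i = 1 | parents).
      values = {}
      for i in order:
          parent_values = tuple(values[p] for p in parents[i])  # parents are already sampled
          values[i] = 1 if random.random() < cond_prob(i, parent_values) else 0
      return values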

35
Q

In model selection, what is likelihood?

A

How likely the data is, given a probabilistic model.

36
Q

In model selection, what is entropy?

A

A measure of uncertainty (denoted H).

37
Q

Why would we prefer a greedy model selection?

A

Greedy selection models only the most important dependencies while maintaining a limited complexity (mutually exclusive groups).

38
Q

How is the greedy model selection (often) initialised?

A

By starting with the univariate factorization, which has no dependencies and can therefore arguably be seen as the worst model.

39
Q

How does greedy model selection work?

A

Given an initial model M0, it computes candidate models (C) and then chooses the best model out of M0 and C. If that model is still M0, the selection has converged and M0 is returned.
Otherwise, candidates are computed based on this new “best” model, and the process repeats until the final model is found.
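
A minimal sketch of this generic greedy loop, with hypothetical candidates and score helpers (candidates(m) enumerates the refinements of model m, score ranks models):

  def greedy_model_selection(m0, candidates, score):
      current = m0
      while True:
          # Compare the current model with all of its candidate refinements.
          best = max([current] + list(candidates(current)), key=score)
          if best is current:   # no candidate improves on the current model
              return current    # converged
          current = best        # otherwise continue from the new best model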

40
Q

How does greedy multivariate factorization work?

A

It initialises with the univariate factorization. It then tries to increase the complexity by joining factors:
(1,2) (3) (4) …
(1,3) (2) (4) …
(1) (2,3) (4) …

It selects the best-scoring factorization and repeats (see the sketch below).
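
A minimal sketch of generating the candidate joins for one greedy step, assuming a factorization is represented as a list of index groups (the scoring function that ranks the candidates, e.g. MDL/AIC/BIC, is left out):

  def candidate_joins(groups):
      # Every unordered pair of current groups yields one candidate factorization
      # in which that pair is merged into a single (correlated) group.
      for i in range(len(groups)):
          for j in range(i + 1, len(groups)):
              merged = groups[i] + groups[j]
              yield [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]

  # Starting from the univariate factorization of a length-4 genotype:
  univariate = [[0], [1], [2], [3]]
  print(len(list(candidate_joins(univariate))))  # 6 == 0.5 * 4 * (4 - 1) candidates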

41
Q

How many possible “joins” does greedy multivariate factorization selection consider in the first iteration?

A

0.5 * l * (l-1): one candidate for every unordered pair of the l univariate factors (e.g., l = 10 gives 45 candidate joins).

42
Q

How many possible “joins” does greedy multivariate factorization selection consider in the second iteration?

A

0.5 * (l-1) * (l-2)

Groups that have already been joined reduce the number of candidates: after the first join only l-1 groups remain.

43
Q

When is a Greedy multivariate factorization selection process finished?

A

When the best model out of all possible group joins (in that iteration) equals the best model of the previous iteration,
i.e. no improvement can be reached by further joining groups.

44
Q

What is ECGA?

A

Extended Compact Genetic Algorithm

45
Q

How does ECGA work?

A
  1. It initialises a random population.
  2. It selects a set of parents from the population.
  3. It uses greedy multivariate factorization selection to compute the best probability distribution model from those parents.
  4. Offspring are generated by sampling this probability distribution.
  5. The offspring replace the population.
  6. The best solution of the run is tracked and used in the termination condition (see the sketch below).
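
A minimal ECGA-style sketch of this loop in Python; the greedy multivariate factorization step is abstracted into a hypothetical learn_mpm_groups(parents) helper that returns a partition of the gene indices, and simple truncation selection stands in for the parent selection:

  import random
  from collections import Counter

  def ecga(fitness, length, learn_mpm_groups, pop_size=200, tau=0.5, generations=50):
      # 1. Initialise a random binary population.
      pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
      best = max(pop, key=fitness)
      for _ in range(generations):
          # 2. Select parents (truncation selection used here for simplicity).
          parents = sorted(pop, key=fitness, reverse=True)[:max(2, int(tau * pop_size))]
          # 3. Greedy multivariate factorization selection (abstracted).
          groups = learn_mpm_groups(parents)
          # 4. Estimate one joint distribution per group, then sample offspring
          #    one whole group at a time.
          group_dists = []
          for g in groups:
              counts = Counter(tuple(p[i] for i in g) for p in parents)
              group_dists.append((list(counts.keys()), list(counts.values())))
          offspring = []
          for _ in range(pop_size):
              child = [0] * length
              for g, (settings, weights) in zip(groups, group_dists):
                  setting = random.choices(settings, weights=weights)[0]
                  for pos, val in zip(g, setting):
                      child[pos] = val
              offspring.append(child)
          # 5. Offspring replace the population; 6. keep track of the best solution.
          pop = offspring
          best = max(pop + [best], key=fitness)
      return best
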
46
Q

How do UMDA and ECGA compare on OneMax and the deceptive trap (when plotting average fitness vs population size)?

A

On OneMax, UMDA slightly outperforms ECGA due to the independent nature of the problem.

On the deceptive trap, UMDA even outperforms ECGA in earlier stages (smaller population sizes) due to the complexity of ECGA, but ECGA is more likely to eventually find the optimal solution at larger population sizes.

Outperforming here means a higher average fitness for the same population size.

47
Q

What is factorization in EDA?

A

Distribution factorization: splitting the variables into groups and modelling the distribution over each group as a whole instead of over each variable individually.

48
Q

What is MPM?

A

Marginal product model: the MPM is a product of probability distributions over mutually exclusive sets of random variables. The groups themselves are independent, but the variables within a group can be correlated.

49
Q

What are the three main model selection metrics?

A

MDL, AIC and BIC

50
Q

Which one of the model selection metrics considers population size when evaluating complexity?

A

BIC
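
(Assuming the standard definitions: BIC's complexity penalty is (number of parameters) · ln(n), where n is the sample/population size, whereas AIC's penalty, 2 · (number of parameters), does not depend on n.)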