Lecture 6 Flashcards by Sven Dukker

Compared to EDA, what is the problem with variation in a simple GA?

Its variation operators are blind to the problem structure, which makes the variation inefficient.

What does it mean that a function is additively decomposable?

When the function can be written as a sum of subfunctions that each depend on only a subset of the variables, so that the sum of the sub-solutions equals the solution of the larger problem.
For instance, functions whose fitness depends on blocks.
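
A minimal sketch of a block-wise additive fitness function. The subfunction `ones_in_block` and the assumption that blocks are consecutive slices of length k are illustrative choices, not taken from the lecture.

```python
def ones_in_block(block):
    """Illustrative subfunction: the number of ones in one block."""
    return sum(block)


def additive_fitness(bits, k, subfunction=ones_in_block):
    """Sum a subfunction over consecutive blocks of length k."""
    if len(bits) % k != 0:
        raise ValueError("string length must be a multiple of k")
    return sum(subfunction(bits[i:i + k]) for i in range(0, len(bits), k))


# Two blocks of length 4: the first contributes 2, the second contributes 4.
print(additive_fitness([1, 1, 0, 0, 1, 1, 1, 1], k=4))  # 6
```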

What are block identifiers? (and how are they represented)

The parameters that define the blocks:

k is the block length
m is the number of blocks
l = k*m is the string length

Explain what tight encoding means.

It is when the genes in each block are neighbours:
0000 1111 2222 …

Explain what loose encoding means.

It is when the genes of each block are spread as far apart as possible, so that any window of m consecutive genes (m = number of blocks) contains exactly one gene of each block:

012345 012345 …
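
A small sketch of how gene positions could be mapped to block indices under tight and loose encoding. The function names are hypothetical; k is the block length and m the number of blocks.

```python
def tight_blocks(k, m):
    """Tight encoding: the k genes of a block sit next to each other."""
    return [i // k for i in range(k * m)]


def loose_blocks(k, m):
    """Loose encoding: consecutive genes belong to different blocks."""
    return [i % m for i in range(k * m)]


print(tight_blocks(4, 3))  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
print(loose_blocks(2, 6))  # [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]
```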

What does disruption mean?

It means that blocks get broken during variation.

When is a function deceptive?

(from slides)
Any schema of size smaller than the block size of the
deceptive trap function has a better average fitness
if it has more zeros.

Following the schema theorem, those “0-schemata”
have better chance at receiving an increasing
number of matches over time.

This is only false for full-order schemata.
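
The claim can be checked numerically. The sketch below assumes a common definition of the deceptive trap (fitness k when all k bits are one, otherwise k - 1 - u for u ones); the cards do not give the lecture's exact definition.

```python
from itertools import product


def trap(block):
    """Assumed deceptive trap: k if all ones, else k - 1 - (number of ones)."""
    k, u = len(block), sum(block)
    return k if u == k else k - 1 - u


def schema_average(schema):
    """Average trap fitness over all strings matching a schema such as '00**'."""
    free = [i for i, c in enumerate(schema) if c == "*"]
    total, count = 0, 0
    for fill in product([0, 1], repeat=len(free)):
        bits = [0 if c == "*" else int(c) for c in schema]
        for pos, val in zip(free, fill):
            bits[pos] = val
        total += trap(bits)
        count += 1
    return total / count


for zeros, ones in [("0***", "1***"), ("00**", "11**"), ("000*", "111*"), ("0000", "1111")]:
    print(zeros, schema_average(zeros), ones, schema_average(ones))
```

For k = 4 the 0-schemata win at every order below 4, and only the full-order schema 1111 beats 0000.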

In empirical scalability analysis, what is the definition of success?

When the optimal solution is found in a certain percentage of runs.

What do we try to solve using empirical scalability analysis?

Given a value for ℓ, what is the minimally required population size for which the EA solves the problem?

Describe how empirical scalability analysis works and what criteria need to be met.

Increase the population size (exponentially) until success has been reached. This gives the upper bound. Then take half of the upper bound as the lower bound and perform an additional search within this range to find the best (smallest successful) population size.

This process should be repeated a couple of times to verify the results.
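
A minimal sketch of this search, assuming a hypothetical callback `succeeds(pop_size)` that runs the EA several times and reports whether the success criterion was met. It also assumes success is monotone in the population size, which is only approximately true for a stochastic algorithm; this is one reason to repeat the whole procedure.

```python
def find_min_population(succeeds, start=2, max_size=2 ** 20):
    """Double the population size until success, then bisect the last interval."""
    size = start
    while not succeeds(size):
        size *= 2
        if size > max_size:
            raise RuntimeError("no successful population size found")
    upper, lower = size, size // 2  # upper succeeded, lower is half of it
    while upper - lower > 1:
        mid = (lower + upper) // 2
        if succeeds(mid):
            upper = mid
        else:
            lower = mid
    return upper


# Toy stand-in for real EA runs: pretend every size of at least 37 succeeds.
print(find_min_population(lambda n: n >= 37))  # 37
```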

What is the principle of model-based EAs?

Apart from being an EA, it tries to learn and exploit structure in the problem.

Give rough pseudocode of how model-based EAs operate.

They look at the last generation.
They learn a stochastic model based on it.
They generate new solutions that align with that model.
They perform survivor selection.
Repeat.
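
The loop above can be sketched as follows. The callables `learn_model` and `sample`, the simple per-gene frequency model, and the "keep the best of parents plus offspring" survivor selection are all illustrative assumptions.

```python
import random


def model_based_ea(fitness, learn_model, sample, population, generations, rng):
    """Generic loop: learn a model, sample offspring, select survivors, repeat."""
    population = [list(ind) for ind in population]
    for _ in range(generations):
        model = learn_model(population)                       # learn from last generation
        offspring = [sample(model, rng) for _ in population]  # generate new solutions
        merged = sorted(population + offspring, key=fitness, reverse=True)
        population = merged[:len(population)]                 # survivor selection
    return population


def learn_frequencies(population):
    """Illustrative model: the frequency of ones at each position."""
    n = len(population)
    return [sum(ind[i] for ind in population) / n for i in range(len(population[0]))]


def sample_bits(freqs, rng):
    """Draw each bit independently from its estimated frequency."""
    return [1 if rng.random() < p else 0 for p in freqs]


rng = random.Random(1)
start = [[rng.randint(0, 1) for _ in range(8)] for _ in range(20)]
final = model_based_ea(sum, learn_frequencies, sample_bits, start, 10, rng)
print(max(sum(ind) for ind in start), max(sum(ind) for ind in final))
```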

What does EDA stand for?

Estimation-of-Distribution Algorithms

Why is EDA a model-based EA?

Because it uses the previous generation to estimate the distribution of the next generation.

What major benefit does EDA have?

The remaining search space shrinks very fast, so very few generations are needed to converge.

Explain why we would want to use factorized EDAs.

The hypothesis is that many problems can be decomposed into sets of variables that together have an above-average contribution to a solution's fitness.

When learning the distribution in an EDA, it can be much more efficient to factorize the distribution so that it captures this structure.

What is univariate factorization in EDA?
(aka univariate EDA)

The distribution over each variable is modelled to be independent of that over every other variable.

Explain how UMDA operates.

UMDA is a type of EDA:
It selects a larger subset of the generation based on the fitness of the genotypes and uses it to estimate, for each binary gene independently, the chance that the gene is present in the optimal solution. The next generation is then sampled from those distributions.
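
A minimal UMDA sketch under stated assumptions: truncation selection of the best fraction `tau`, full replacement of the population by the sampled offspring, and a fixed number of generations. These details and all parameter names are illustrative, not taken from the lecture.

```python
import random


def umda(fitness, length, pop_size, tau=0.5, generations=30, seed=0):
    """Univariate EDA on binary strings: estimate per-gene frequencies, then resample."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        selected = pop[:max(1, int(tau * pop_size))]  # truncation selection
        # Frequency of a one at each position, estimated independently per gene.
        p = [sum(ind[i] for ind in selected) / len(selected) for i in range(length)]
        pop = [[1 if rng.random() < p[i] else 0 for i in range(length)]
               for _ in range(pop_size)]
        best = max(pop + [best], key=fitness)         # track the best solution seen
    return best


print(umda(sum, length=10, pop_size=50))  # OneMax: fitness is the number of ones
```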

Is UMDA considered decomposable?

Yes. Its distribution uses univariate factorization, as each variable is evaluated independently.

Are univariate EDAs the solution for deceptive traps?

No, the univariate EDA will favor the solutions with above-average fitness and converge very fast towards the trap,
unless the population size is very large and it can therefore generate many optimal solutions at initialisation.

What does PFDA stand for?

Perfect Factorization Distribution Algorithm

Explain how PFDA works.

Rather than treating each gene as independent, it computes joint probabilities of the variables belonging to entire subfunctions. Hence it samples whole blocks instead of single genes.

Explain how choosing the success criteria relates to the efficiency in PFDA.

After selecting the best 100τ%, the probability of sampling the optimal solution in a deceptive problem increases from 1/(2^k) (UMDA) to 1/(τ2^k).
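
A worked example that simply plugs numbers into the two expressions on this card. Reading them as "probability before selection" and "probability after keeping the best fraction τ" is an assumption, as is the function name.

```python
from fractions import Fraction


def optimum_probability(k, tau=None):
    """1/2^k without selection, 1/(tau * 2^k) after keeping the best fraction tau."""
    base = Fraction(1, 2 ** k)
    return base if tau is None else base / tau


print(optimum_probability(5))                  # 1/32
print(optimum_probability(5, Fraction(1, 2)))  # 1/16
```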

Does perfect factorization require tight or loose encoded problems?

Tight

Explain why UMDA is “dependency-blind”.

It considers each gene as independent when estimating distributions. It does not look at dependencies between those genes.

For what kind of problems is PFDA dependency perfect?

additively decomposable problems

Why do we use factorization of probability distributions?

To minimize the parameters required to store the distribution

Explain what multivariate factorization is.

It is a class of factorization based on the marginal product model (MPM):
- MPM is a product of probability distributions over mutually exclusive sets of random variables
- Variables are thus partitioned into (correlated) groups
- Groups themselves are independent
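
A minimal sketch of estimating such a product model from a population and evaluating the probability it assigns to a solution. The representation (a partition as a list of index tuples, one frequency table per group) is an assumption made for illustration.

```python
from collections import Counter


def fit_mpm(population, partition):
    """Estimate one joint frequency table per group of positions."""
    n = len(population)
    model = []
    for group in partition:
        counts = Counter(tuple(ind[i] for i in group) for ind in population)
        model.append({config: c / n for config, c in counts.items()})
    return model


def mpm_probability(model, partition, solution):
    """Product of the group probabilities; the groups are treated as independent."""
    prob = 1.0
    for group, table in zip(partition, model):
        prob *= table.get(tuple(solution[i] for i in group), 0.0)
    return prob


population = [[0, 0, 1], [1, 1, 1], [0, 0, 0], [1, 1, 0]]
grouped = [(0, 1), (2,)]          # positions 0 and 1 modelled jointly
univariate = [(0,), (1,), (2,)]   # every position on its own
print(mpm_probability(fit_mpm(population, grouped), grouped, [0, 1, 0]))        # 0.0
print(mpm_probability(fit_mpm(population, univariate), univariate, [0, 1, 0]))  # 0.125
```

The grouped model never produces the combination 0,1 at the first two positions because it was never observed, while the univariate model still assigns it probability.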

Which algorithms can be considered instances of multivariate factorization?

Both UMDA and PFDA.

For PFDA, what is the length of node partition vector ν?

|ν\_perfect| = m = number of building blocks (BBs)

For UMDA, what is the length of node partition vector ν?

|ν\_univariate| = l = genotype length

Explain Bayesian factorization.

Each factor is a conditional distribution P\_{θ^i}(X\_i | X\_{π\_i}): the probability of variable i, given the variables π\_i it depends on.

What requirements need to be met to apply Bayesian factorization?

The dependency graph must be acyclic, and therefore must allow a topological sort.
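
A minimal sketch of why the topological order matters: each variable can only be sampled after the variables it depends on. The data layout (`parents[i]` as a tuple of indices, `cpt[i]` mapping parent values to the probability that X_i = 1) is an assumption; `graphlib` is in the standard library from Python 3.9.

```python
import random
from graphlib import CycleError, TopologicalSorter


def sample_bayesian(parents, cpt, rng):
    """Sample every variable after its parents, following a topological order."""
    values = {}
    for i in TopologicalSorter(parents).static_order():  # raises CycleError on a cycle
        key = tuple(values[p] for p in parents[i])
        values[i] = 1 if rng.random() < cpt[i][key] else 0
    return [values[i] for i in sorted(values)]


# Chain X0 -> X1 -> X2 with deterministic tables, so the sample is predictable.
parents = {0: (), 1: (0,), 2: (1,)}
cpt = {
    0: {(): 1.0},                 # X0 is always 1
    1: {(0,): 1.0, (1,): 0.0},    # X1 is the opposite of X0
    2: {(0,): 1.0, (1,): 0.0},    # X2 is the opposite of X1
}
print(sample_bayesian(parents, cpt, random.Random(0)))  # [1, 0, 1]
```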

In model selection, what is likelihood?

How likely the data is, given a probabilistic model.

In model selection, what is entropy?

The measure of uncertainty. (**H**)
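
A small sketch connecting the two notions for discrete data. The function names are illustrative. When the model probabilities are the observed frequencies, the log-likelihood of n samples equals -n times the entropy.

```python
import math
from collections import Counter


def entropy(samples):
    """Entropy H (in bits) of the empirical distribution of the samples."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def log_likelihood(samples, model):
    """Log2-probability of the samples under a model {value: probability}."""
    return sum(math.log2(model[x]) for x in samples)


data = [0, 1, 0, 1]
print(entropy(data))                            # 1.0
print(log_likelihood(data, {0: 0.5, 1: 0.5}))   # -4.0
```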

Why would we prefer a greedy model selection?

Greedy selection models only the most important dependencies while maintaining a limited complexity (mutually exclusive groups).

How is the greedy model selection (often) initialised?

By starting with the univariate factorization, which has no dependencies so it can arguably be seen as the worst model.

How does greedy model selection work?

Given an initial model M0, it computes candidate models (**C**). Then it chooses the best model out of M0 and **C**. When that model is still M0, the selection has converged and M0 is returned. Otherwise it keeps computing candidates based on this new "best" model until the ultimate model is found.

How does greedy _multivariate_ factorization work?

It initialises from the univariate factorization. It will then try to increase the complexity by joining factors:
(1,2) (3) (4) ...
(1,3) (2) (4) ...
(1) (2,3) (4) ...
It will select the best scoring factorization.

How many possible "joins" does greedy multivariate factorization selection consider in the **first** iteration?

0.5 \* l \* (l-1)

How many possible "joins" does greedy multivariate factorization selection consider in the **second** iteration?

0.5 \* (l-1) \* (l-2) (already joined groups reduce the number of groups, and thus the complexity)
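
Both counts are the number of unordered pairs of groups. A quick check, with the hypothetical helper `candidate_joins`:

```python
from math import comb


def candidate_joins(groups):
    """Number of pairwise joins when the model currently has this many groups."""
    return comb(groups, 2)


l = 10
print(candidate_joins(l))      # first iteration: 0.5 * 10 * 9 = 45
print(candidate_joins(l - 1))  # second iteration: 0.5 * 9 * 8 = 36
```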

When is a Greedy multivariate factorization selection process finished?

When the best selection model out of all possible group joins (in that iteration) equals the best selection model of the previous iteration. aka no improvement can be reached by further joining groups.
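
A sketch of the greedy joining procedure with this stopping rule. The cards do not state the scoring metric, so the MDL-style score below (model bits plus data bits, lower is better) is an assumption, as are all function names.

```python
import math
from collections import Counter
from itertools import combinations


def group_entropy(population, group):
    """Entropy (bits) of the joint values observed at the positions in group."""
    n = len(population)
    counts = Counter(tuple(ind[i] for i in group) for ind in population)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mdl_score(population, partition):
    """Assumed MDL-style score: cost of the tables plus cost of the data."""
    n = len(population)
    model_bits = math.log2(n + 1) * sum(2 ** len(g) - 1 for g in partition)
    data_bits = n * sum(group_entropy(population, g) for g in partition)
    return model_bits + data_bits


def greedy_mpm(population):
    """Start univariate; apply the best pairwise join until none improves the score."""
    partition = [(i,) for i in range(len(population[0]))]
    best = mdl_score(population, partition)
    while len(partition) > 1:
        candidates = []
        for a, b in combinations(range(len(partition)), 2):
            joined = [g for j, g in enumerate(partition) if j not in (a, b)]
            joined.append(tuple(sorted(partition[a] + partition[b])))
            candidates.append((mdl_score(population, joined), joined))
        score, candidate = min(candidates, key=lambda c: c[0])
        if score >= best:  # no join improves the model: converged
            break
        best, partition = score, candidate
    return sorted(partition)


# Positions 0 and 1 always agree; position 2 is independent of both.
population = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]] * 2
print(greedy_mpm(population))  # [(0, 1), (2,)]
```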

What is ECGA?

Extended Compact Genetic Algorithm

How does ECGA work?

1. It initialises a random population.
2. It will use survival selection to select parents.
3. It will use greedy multivariate factorization selection to compute the best probability distribution model.
4. Offspring is generated using this probability distribution.
5. Offspring replaces the population.
6. The best solution of a run is being tracked and used as termination condition.
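
A sketch of step 4 only, assuming the model is a partition of the positions into groups. Copying each group's values from a randomly chosen selected parent is one way to sample from the observed joint frequencies of that group; the function name and data layout are illustrative.

```python
import random


def sample_from_groups(selected, partition, rng):
    """Build one offspring by copying each group's values from a random parent."""
    child = [None] * len(selected[0])
    for group in partition:
        donor = rng.choice(selected)  # draws the group's values as observed together
        for i in group:
            child[i] = donor[i]
    return child


rng = random.Random(0)
selected = [[0, 0, 1], [1, 1, 0]]
partition = [(0, 1), (2,)]
offspring = [sample_from_groups(selected, partition, rng) for _ in range(20)]
# Positions 0 and 1 are sampled jointly, so they always agree in the offspring.
print(all(child[0] == child[1] for child in offspring))  # True
```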

How do UMDA and ECGA compare on OneMax and deceptive trap? (when plotting avg fitness vs population)

UMDA slightly outperforms ECGA on OneMax due to the independent nature of OneMax. UMDA even outperforms ECGA on the deceptive trap in the earlier stages, due to the complexity of ECGA. But ECGA is more likely to eventually find the optimal solution faster at larger population sizes. (Outperforming means a higher average fitness for the same population size.)

What is factorization in EDA?

**Distribution factorization**: Splitting up the variables into groups and looking at the distribution of each group as a whole instead of at the individual variables.

What is MPM?

Marginal product model: MPM is a product of probability distributions over mutually exclusive sets of random variables. The groups themselves are independent, but the variables within a group can be correlated.

What are the three main model selection metrics?

MDL, AIC and BIC

Which one of the model selection metrics considers population size when evaluating complexity?

BIC
