[3] Evolutionary Computation Flashcards
What are GAs?
Genetic algorithms closely mini biology:
- each individual only has one parent
- chromosomes are the data structure; they are fixed length strings
What are the basic processes in GAs?
Selection
Mutation
Crossover
How does selection work?
Only the fittest individuals are used to form the next population
Roulette wheel selection selects individuals in proportion to their fitness
Tournament selection picks the best individual from random samples of the population
How does crossover work?
Single-point and two-point crossover breaks the chromosomes at one or two points respectively
Uniform crossover uses a bit mask to ensure each gene is equally likely to be swapped
What options exist for termination criteria for GAs?
After a fixed number of iterations, if the best solution hasn’t changed for a while, if the average fitness hasn’t changed for a while, or when convergence has been reached (all individuals have the same genotype)
What are steady-state GAs?
They don’t use generations; good individuals are selected to breed and produce offspring which replace bad ones
What is elitism?
Ensuring that the best individuals are always copied to the next generation, so the best solutions aren’t lost
How can illegal solutions to GAs be avoid?
- Restrict crossover and mutation so that it only gives valid children
- Heavily penalise invalid solutions
- Repair broken individuals
What is the basic procedure for a GA?
[1] Initialise the population
[2] Evaluate the fitness of each individual in the current population
[3] Until the next population is created:
- select individuals from the current population
- perform genetic operations on the individuals
- insert the new operations into the next population
[4] Check if the termination criteria has been met. If not, repeat [2-4]
[5] Output the best individual
What is differential evolution?
Mutation is used to create a trail vector, which is used with crossover to create one offspring
The step size is influences by differences between individuals in the population
What is deterministic evolution?
The parent is only replaced by their offspring if they have higher fitness
What is Michaelwicz’s interpretation?
As GAs are tailored to particular problems, they do worse for arbitrary problems
What are memetic algorithms?
They either:
- include local search in the EC loop
- Use domain-specific knowledge in the genetic operators, such as avoiding the creation of infeasible solutions
What representation do GPs use?
Tree
What are the basic structures of GP tree?
Functions form the root and internal nodes of the the tree. They perform operations on their inputs
Terminals have no arguments and form the leaves of the tree. They may be inputs, constants or random numbers
Collectively, functions and terminals are called primitives
How should the function and terminal sets be selected?
They should satisfy two criteria:
- Sufficiency - some collection of primitives should be able to solve the problem
- Closure - any function can accept input from any function or terminal
What is the initial population of a GP constructed?
Full method - functions are selected as the does until a given depth is reached. Then terminals are selected to form leaf nodes
Grow method - nodes are selected from either the function or terminal sets. If a terminal is selected, the generation process moves on to the next non-terminal branch of the tree
Ramped half-and-half method - half of the population is generated from each of the above
[THESE ARE ALSE USED FOR MUTATION]
What is the advantage of the full method for generating GPs?
Fully, entirely balanced trees are constructed
What are the genetic operators for GPs?
Reproduction - the program is copied as-is to next generation
Mutation - a random subtree is replaced using the program generation method
Cross-over - sub-trees are randomly swapped between individuals
What is the main use case of GPs?
Symbolic regression
What are the they key differences between GAs and GPs?
- GAs use strings to represent solutions while GPs use trees
- GA strings are fixed-length, while GP trees can vary in length
- GA uses a binary alphabet, while GPs use various alphabets
- GA solutions have to be encoded and decoded while GPs can be interpreted directly as code
What are repeating patterns in a GP tree called?
Building blocks
What are building blocks?
Repeating patterns in a GP
What special operator do many GPs use?
Protected division, which is zero if the denominator is 0
How do Strongly Typed GPs work?
They restrict cross-over and mutation to ensure they only produce valid offspring
Genetic types ensure functions can handle a range of datatypes
What are the key challenges of strongly typed GPs?
Ensuring the fitness function works for a multiple types, and ensuring there is a range of functions for each type
What is a key problem with GPs? How is it addressed?
Small changes can have a big affect on the output, and so offspring have low relatedness to their parents
Conversely, many changes have no impact.
This is addressed by semantics
How does semantics work?
Semantics are vectors of the outputs; the target output vector is the target semantics
Each semantic is a point in semantic space. Similar solutions will have a low distance in this space
Semantic awareness is incorporated using precise genetic operators i.e. only perform cross-over between semantically similar sub-programs
Geometric Semantic Search performs cross-over so that the offspring have an equal semantic difference from their parents; mutation ensure the offspring aren’t too semantically different. However, this is computationally expensive and causes the offspring to quickly grow in size
What are the principles of PSO?
- Each particle represents a candidate solution
- Particles fly in the search space to find the best solutions
- Each particle remembers its personal best, and knowns the global best
- Each particle has a position and a velocity
What are the components of velocity in PSO?
- Previous velocity (weighed with w)
- Cognitive component (weighted with c1)
- Social component (weighted with c2)
How does the previous velocity component work?
The particle remembers its previous velocity, preventing it from suddenly changing direction
How does the cognitive component work?
It attracts particles to their personal best, especially if there current fitness isn’t very good
How does the social component work?
It attracts particles to the swarm’s global best, especially if its current location isn’t very good
What is a common issue with PSO? How is it addressed?
The velocities can become really high, so they are commonly clamped
How does changing the inertia weight affect PSO?
For w >= 1, velocity increases overtime and the swarm diverges
For 0 < w < 1, the particles decelerate, and may converge based on the values of c1 and c2
What is the major tradeoff with PSO?
Exploration vs exploitation
What the coefficients for PSO called?
w is the inertia weight
c1 and c2 are collectively the acceleration coefficients
How do the acceleration coefficients affect swarm behaviour?
c1 > 0; c2 = 0: each search independently does local search
c1 = 0; c2 > 0: the swarm is a stochastic hill climber
c1 = c2 > 0: particles are attracted the average of their personal best and global best
Low c1 and c2 lead to smooth particle trajectories; high values lead to abrupt changes in speed and direction
How should the acceleration coefficient be chosen based on the problem being solved?
It is very domain specific, but in general:
- use c1 > c2 for multimodal problems
- use c1 < c2 for unimodal problems
What options exist for the global best?
It can be the true global best of the swarm, or it can be a local best based on the particles neighbourhood
How does binary PSO work?
The velocity is calculated, and x is set based on whether a random number is less than the sigmoid of the velocity