Lecture 4 Flashcards

1
Q

What does GP stand for?

A

Genetic Programming

2
Q

What does the genotype of a GP represent?

A

A program or math equation.

3
Q

How is the fitness of a GP solution determined?

A

By executing the program/solution and scoring its output. Usually the output is compared to a reference using the mean squared error (MSE).
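
As a minimal sketch of this idea (the data, the candidate programs and the name `mse_fitness` are made up for illustration), a candidate can be scored by running it on every sample and averaging the squared errors:

```python
def mse_fitness(program, inputs, targets):
    """Run `program` on every input and return the MSE against the targets."""
    errors = [(program(x) - y) ** 2 for x, y in zip(inputs, targets)]
    return sum(errors) / len(errors)

# Reference data sampled from a function that GP would normally not know: y = x^2 + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [x * x + 1 for x in xs]

candidate_a = lambda x: x * x + 1   # reproduces the data exactly
candidate_b = lambda x: x * x       # off by a constant 1 everywhere

print(mse_fitness(candidate_a, xs, ys))  # 0.0
print(mse_fitness(candidate_b, xs, ys))  # 1.0
```

Lower is better here: an MSE of 0 means the candidate reproduces the reference on all samples.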

4
Q

What is the core principle of Symbolic Regression?

A

Given a data set of n samples with d features each, where every sample has an associated output y, find a function that reproduces this behaviour.

In other words: find the hidden function given some inputs and outputs.

5
Q

What is the main reason to pick GP over ML?

A

GP can produce interpretable models, since its output is an explicit program or expression.

6
Q

In tree-based GP, what are the leaf nodes called?

A

Terminal Nodes

7
Q

In tree-based GP, what are the internal nodes called?

A

Atomic Functions

8
Q

In tree-based GP, what is the primitive set, and where is it used?

A

The union of the function set and the terminal set.

It is used in tree initialisation.
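
For illustration only, the three sets could be written down like this for a one-variable problem (the specific functions and terminals are assumptions, not taken from the lecture):

```python
# Assumed example sets for a one-variable symbolic regression problem.
FUNCTION_SET = {"add": 2, "sub": 2, "mul": 2, "neg": 1}   # name -> arity
TERMINAL_SET = {"x", "1.0", "2.0"}                        # variable and constants

# The primitive set is the union of both.
PRIMITIVE_SET = set(FUNCTION_SET) | TERMINAL_SET

print(sorted(PRIMITIVE_SET))
```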

9
Q

In tree-based GP, what is the terminal set?

A

The set of all terminal (leaf) nodes. In GP these are usually input variables and real-valued constants.

10
Q

Explain how a GP tree is initialised using the grow method.

A
  • Start from the root node.
  • If max depth has not been reached: randomly sample from the primitive set.
  • Else: randomly sample from the terminal set.
  • A branch stops growing when:
    • max depth is reached, or
    • a leaf node is sampled.
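
The steps above can be sketched as follows. This is a hedged illustration: the function and terminal sets, the nested-tuple tree representation and the helper `depth_of` are assumptions made for the example.

```python
import random

FUNCTIONS = {"add": 2, "mul": 2, "neg": 1}   # assumed function set: name -> arity
TERMINALS = ["x", "1.0"]                     # assumed terminal set
PRIMITIVES = list(FUNCTIONS) + TERMINALS     # primitive set = union of both

def grow(max_depth, depth=0, rng=random):
    """Build a tree as nested tuples; leaves are plain strings."""
    if depth < max_depth:
        node = rng.choice(PRIMITIVES)        # may be a function or a terminal
    else:
        node = rng.choice(TERMINALS)         # max depth reached: force a leaf
    if node in FUNCTIONS:
        children = [grow(max_depth, depth + 1, rng) for _ in range(FUNCTIONS[node])]
        return (node, *children)
    return node                              # leaf sampled: this branch stops

def depth_of(tree):
    """Depth of a tree; a single leaf has depth 0."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth_of(child) for child in tree[1:])

tree = grow(max_depth=3, rng=random.Random(0))
print(tree, depth_of(tree))
```

Because terminals can be sampled before max depth, grow produces trees of varying shape and size.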
11
Q

Explain how a GP tree is initialised using the full method.

A
  • Start from the root node.
  • If max depth has not been reached: randomly choose from the function set.
  • Else: randomly choose from the terminal set.
  • Initialisation stops by itself, since all leaves end up at max depth.
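
A minimal sketch of the full method, under the same kind of assumptions (made-up function and terminal sets, trees as nested tuples, and a hypothetical helper `leaf_depths` to check the result):

```python
import random

FUNCTIONS = {"add": 2, "mul": 2, "neg": 1}   # assumed function set: name -> arity
TERMINALS = ["x", "1.0"]                     # assumed terminal set

def full(max_depth, depth=0, rng=random):
    """Pick functions until max depth, then terminals."""
    if depth < max_depth:
        name = rng.choice(list(FUNCTIONS))
        children = [full(max_depth, depth + 1, rng) for _ in range(FUNCTIONS[name])]
        return (name, *children)
    return rng.choice(TERMINALS)

def leaf_depths(tree, depth=0):
    """Depth of every leaf in the tree."""
    if isinstance(tree, str):
        return [depth]
    return [d for child in tree[1:] for d in leaf_depths(child, depth + 1)]

tree = full(max_depth=2, rng=random.Random(1))
print(set(leaf_depths(tree)))  # {2}
```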
12
Q

Explain how a GP tree is initialised using Ramped Half and Half.

A

The population is split into bins, each with its own max depth. The max depth ramps up per bin, so that the bins cover all depths between the min and max depth.

Within each bin, both the grow and full methods are used: one of the two is selected at random for each individual.

Duplicates within a bin are discarded.
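
One way this could look in code is sketched below. The primitive sets, the equal bin sizes and the cap on attempts (to avoid looping forever when a bin has few unique trees) are assumptions of the example.

```python
import random

FUNCTIONS = {"add": 2, "mul": 2}     # assumed function set: name -> arity
TERMINALS = ["x", "1.0", "2.0"]      # assumed terminal set

def build(max_depth, method, rng, depth=0):
    """Build one tree with 'grow' or 'full'; leaves are plain strings."""
    if depth >= max_depth:
        return rng.choice(TERMINALS)
    pool = list(FUNCTIONS) if method == "full" else list(FUNCTIONS) + TERMINALS
    node = rng.choice(pool)
    if node in FUNCTIONS:
        children = [build(max_depth, method, rng, depth + 1) for _ in range(FUNCTIONS[node])]
        return (node, *children)
    return node

def ramped_half_and_half(pop_size, min_depth, max_depth, rng):
    depths = list(range(min_depth, max_depth + 1))   # one bin per max depth
    per_bin = pop_size // len(depths)
    population = []
    for bin_depth in depths:
        bin_trees = []
        attempts = 0
        while len(bin_trees) < per_bin and attempts < 100 * per_bin:
            attempts += 1
            method = rng.choice(["grow", "full"])     # random method per individual
            tree = build(bin_depth, method, rng)
            if tree not in bin_trees:                 # discard duplicates within the bin
                bin_trees.append(tree)
        population.extend(bin_trees)
    return population

population = ramped_half_and_half(pop_size=20, min_depth=1, max_depth=4, rng=random.Random(0))
print(len(population))
```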

13
Q

In tree-based GP, what does non-injective mean?

A

Many different trees can produce the same output, so the mapping from tree to behaviour is not one-to-one.
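
A small made-up example: the trees for x + x and 2 * x differ structurally but behave identically. The tuple representation and the `evaluate` helper are assumptions for illustration.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}

def evaluate(tree, x):
    """Evaluate a nested-tuple tree for a given value of x."""
    if tree == "x":
        return x
    if isinstance(tree, str):
        return float(tree)               # constant leaf
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

tree_a = ("add", "x", "x")       # x + x
tree_b = ("mul", "2.0", "x")     # 2 * x

print(tree_a == tree_b)                                        # False
print(all(evaluate(tree_a, x) == evaluate(tree_b, x)
          for x in [-2.0, 0.0, 3.5]))                          # True
```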

14
Q

Does a GP tree use crossover or mutation to cause variation?

A

Both.

15
Q

How does a GP mutate?

A

It generates a new random subtree, which replaces a branch of the tree.
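
A hedged sketch of subtree mutation on nested-tuple trees follows. The primitive sets, the 0.3 leaf probability and the helper names `paths` and `replace_at` are assumptions made for the example.

```python
import random

FUNCTIONS = {"add": 2, "mul": 2}     # assumed function set: name -> arity
TERMINALS = ["x", "1.0"]             # assumed terminal set

def random_tree(max_depth, rng):
    """Small random tree; the 0.3 leaf probability is an arbitrary choice."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return (name, *(random_tree(max_depth - 1, rng) for _ in range(FUNCTIONS[name])))

def paths(tree, prefix=()):
    """All node positions, each as a tuple of child indices from the root."""
    found = [prefix]
    if not isinstance(tree, str):
        for i, child in enumerate(tree[1:], start=1):
            found.extend(paths(child, prefix + (i,)))
    return found

def replace_at(tree, path, new_subtree):
    """Return a copy of `tree` with the node at `path` replaced."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]

def subtree_mutation(tree, rng, max_depth=2):
    target = rng.choice(paths(tree))                 # pick a random branch
    return replace_at(tree, target, random_tree(max_depth, rng))

parent = ("add", ("mul", "x", "x"), "1.0")
child = subtree_mutation(parent, random.Random(0))
print(parent)
print(child)
```

The parent is left untouched because tuples are immutable; the mutated child is a new tree.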

16
Q

Why does GP require linear scaling?

A

The hardest part of Symbolic Regression is getting the shape of the model right. If the MSE is large only because the model is offset or has the wrong magnitude, GP would need a long time to fix this through evolution. The offset and scale can instead be solved for mathematically, which saves a lot of computation time.
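
A minimal sketch, assuming the usual least-squares form of linear scaling: find an intercept a and slope b so that a + b * output best fits the targets. The data and function names are made up for illustration.

```python
def linear_scaling(outputs, targets):
    """Least-squares intercept a and slope b so that a + b * output fits the targets."""
    n = len(outputs)
    mean_f = sum(outputs) / n
    mean_y = sum(targets) / n
    var_f = sum((f - mean_f) ** 2 for f in outputs)
    if var_f == 0:                       # constant output: only the offset can be fitted
        return mean_y, 0.0
    cov = sum((f - mean_f) * (y - mean_y) for f, y in zip(outputs, targets))
    b = cov / var_f
    a = mean_y - b * mean_f
    return a, b

def mse(predictions, targets):
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)

xs = [0.0, 1.0, 2.0, 3.0]
targets = [3 * x * x + 5 for x in xs]    # hidden function: 3x^2 + 5
outputs = [x * x for x in xs]            # model has the right shape only

a, b = linear_scaling(outputs, targets)
scaled = [a + b * f for f in outputs]
print(round(a, 6), round(b, 6))          # 5.0 3.0
print(mse(outputs, targets), mse(scaled, targets))
```

The unscaled model x^2 has a large MSE despite having the right shape; after scaling the error drops to (numerically) zero without any further evolution.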

17
Q

What type of selection is often used in tree-based GP?

A

Tournament selection, with a large tournament size.
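
As a sketch, tournament selection samples a few individuals at random and keeps the best one. The function name and the toy population below are assumptions, and lower fitness is treated as better, as with MSE.

```python
import random

def tournament_select(population, fitness, tournament_size, rng):
    """Sample `tournament_size` individuals and return the one with the lowest error."""
    contestants = rng.sample(population, tournament_size)
    return min(contestants, key=fitness)

# Toy population: integers, with distance to 3 as the error to minimise.
population = list(range(10))
error = lambda individual: abs(individual - 3)

rng = random.Random(0)
print(tournament_select(population, error, tournament_size=2, rng=rng))
print(tournament_select(population, error, tournament_size=8, rng=rng))
```

A larger tournament makes it more likely that one of the best individuals is among the contestants, which increases selection pressure.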

18
Q

When does GP meet sufficiency?

A

When the primitive set is sufficient, i.e. when it can encode the optimal program.

19
Q

When does GP meet consistency?

A

When the primitive set and the variation operators are type-consistent (e.g. you cannot apply boolean operators to real values).

20
Q

What strategies are there for consistency?

A
  1. Flexibility in function types.
  2. Put constraints on the variation operators.
  3. Repair invalid subtrees.
21
Q

When does GP meet evaluation safety?

A

If all elements of the primitive set are evaluation-safe, i.e. evaluating an atomic function can never compromise the evolution.

E.g. a division is compromised when the denominator is zero.

22
Q

How can we ensure evaluation safety?

A

Implement protected operators, often by introducing a small epsilon.
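
A protected division might look like the sketch below. The threshold value and the fallback of 1.0 are assumptions (returning 1 is one common convention; other choices exist).

```python
EPSILON = 1e-9  # assumed threshold; a suitable value is problem-dependent

def protected_div(numerator, denominator):
    """Division that never raises: falls back to 1.0 when the denominator is (near) zero."""
    if abs(denominator) < EPSILON:
        return 1.0
    return numerator / denominator

print(protected_div(6.0, 3.0))  # 2.0
print(protected_div(1.0, 0.0))  # 1.0
```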

23
Q

What is an ERC and where is it used?

A

An Ephemeral Random Constant (ERC) represents a real-valued constant in a tree. Its value is sampled at random between given bounds when the tree is initialised.
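
A minimal sketch of how an ERC could be handled during terminal sampling. The placeholder name "ERC", the bounds and the uniform distribution are assumptions of this example.

```python
import random

TERMINALS = ["x", "ERC"]   # "ERC" is only a placeholder; it never ends up in a tree

def sample_terminal(rng, low=-1.0, high=1.0):
    """Pick a terminal; an ERC is replaced by a fresh constant within the bounds."""
    choice = rng.choice(TERMINALS)
    if choice == "ERC":
        return rng.uniform(low, high)   # drawn once, then fixed inside the tree
    return choice

rng = random.Random(0)
print([sample_terminal(rng) for _ in range(5)])
```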

24
Q

In GP, what is bloating?

A

When the tree grows (becomes more complex) without much increase in accuracy.

25
Q

What are the shortcomings of GP?

A
  1. Unknown whether the optimum has been reached.
  2. Single output.
  3. Bloating.
26
Q

In GP, how can we combat the shortcoming of accuracy suffering due to size limitations?

A
  1. Apply scaling.
  2. Add more operators.
  3. Apply pattern matching to simplify the tree.
  4. Coefficient optimisation, where you replace subtrees with constants.
27
Q

What is a solution for GP bloating and when is it useful?

A

Exhaustive Symbolic Regression, which takes a more brute-force approach. It can be useful when you don't care about computation time, e.g. in space exploration.

28
Q

What is a solution for multiple outputs in GP trees, and how does it work?

A

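Cartesian Genetic Programming. Function nodes are arranged in a grid and each node's output value is labelled, so that other nodes can connect to it. A function node can only connect to prior nodes, including the leaves (inputs). Because any labelled node can be read out, several outputs can be taken from the same graph.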
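
A hedged sketch of the idea, with the grid flattened into a single list of nodes for brevity. The genome layout, the function set and the name `evaluate_cgp` are assumptions of this example, and every node is evaluated even if no output uses it.

```python
import operator

FUNCS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def evaluate_cgp(nodes, output_ids, inputs):
    """Evaluate nodes in order; each node may only read from earlier ids."""
    values = list(inputs)                      # ids 0..len(inputs)-1 are the inputs
    for name, a, b in nodes:
        if a >= len(values) or b >= len(values):
            raise ValueError("a node may only connect to prior nodes")
        values.append(FUNCS[name](values[a], values[b]))
    return [values[i] for i in output_ids]     # any labelled node can be an output

# Two inputs (ids 0 and 1) and three function nodes (ids 2, 3, 4).
nodes = [
    ("add", 0, 1),   # id 2: x0 + x1
    ("mul", 2, 2),   # id 3: (x0 + x1)^2, reuses node 2 twice
    ("sub", 3, 0),   # id 4: (x0 + x1)^2 - x0
]
print(evaluate_cgp(nodes, output_ids=[2, 4], inputs=[1.0, 2.0]))  # [3.0, 8.0]
```

Both outputs share node 2, which a plain tree cannot express without duplicating the subtree.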

29
Q

True or false: expressions resulting from GP are always interpretable.

A

False. Bloated expressions, for example, can become too large to interpret.