Class 4 - Adaptation Flashcards
3 types of adaptive processes
- Evolutionary algorithms
- Reinforcement learning
- Learning by demonstration
Evolutionary robots…
- are created to adapt to their environment via evolutionary computing
- can generate offspring
- mainly focus on evolving the brain
- can correct problems by themselves
Genotypes
numbers that describe the phenotype of the evolutionary robot
Genotypes can be…:
A. discrete numbers between [0, 1]
B. continuous numbers between [0, 1]
C. both A and B
D. neither A nor B
C. both A and B
T/F: Different parts of the genotype can describe different parts of the phenotype of the robot
True
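As a minimal sketch of that idea, a genotype can be a flat list of numbers where different slices decode to different phenotype parts (the slice layout and parameter names here are made up for illustration):

```python
# Hypothetical genotype: a flat list of numbers in [0, 1].
genotype = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]

def decode(genotype):
    """Map different slices of the genotype to (made-up) phenotype parts."""
    return {
        "leg_length": genotype[0:2],    # body morphology
        "motor_gain": genotype[2:4],    # controller parameters
        "sensor_weight": genotype[4:6], # "brain" / neural weights
    }

phenotype = decode(genotype)
```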
An evolutionary algorithm can be split into two main phases…
- the “testing phase”, where the robot is put in the environment and its fitness is evaluated with the fitness function
- the “generation phase”, where the resulting fitness values are used to create the next generation of offspring
3 methods to create offspring (in the context of evolutionary algorithms)…
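The two phases can be sketched as a toy loop; the fitness function and mutation operator below are made-up stand-ins:

```python
import random

def fitness(genotype):
    # Toy "testing phase": higher is better; here, closeness to all-0.5.
    return -sum((g - 0.5) ** 2 for g in genotype)

def mutate(genotype, sigma=0.1):
    # Toy "generation phase" operator: small random perturbation, clipped to [0, 1].
    return [min(1.0, max(0.0, g + random.gauss(0, sigma))) for g in genotype]

random.seed(0)
population = [[random.random() for _ in range(4)] for _ in range(10)]

for generation in range(20):
    # Testing phase: evaluate the fitness of each robot.
    scored = sorted(population, key=fitness, reverse=True)
    # Generation phase: keep the best half, fill up with mutated offspring.
    parents = scored[:5]
    population = parents + [mutate(random.choice(parents)) for _ in range(5)]

best = max(population, key=fitness)
```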
- genetic algorithm
- evolutionary strategy
- modern evolutionary strategy
Match the offspring method (in the context of evolutionary algorithms) to its description…
A. genetic algorithm
B. evolutionary strategy
C. modern evolutionary strategy
- only allows continuous values in the genotype, which are crossed over and then mutated (by adding a small number between 0 and 1) to create the genotype of the offspring. The first generation of genotypes can be random, and parents can outlive the children if they have a better fitness value.
- only allows discrete values in the genotype, which are crossed over with (flipped-value) mutation to create the genotype of the offspring. Also, the first generation of genotypes can be random.
- uses the correlation between generations to see whether the previous children performed better than the new version. Only uses one parent, since using more is computationally challenging.
A-2
B-1
C-3
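A minimal sketch of the genetic-algorithm style of offspring creation, with one-point crossover and flipped-value mutation on a discrete {0, 1} genotype (parent values are made up):

```python
import random

def crossover(parent_a, parent_b):
    # One-point crossover: the child takes a prefix from one parent
    # and the suffix from the other.
    point = random.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(genotype, rate=0.1):
    # Flipped-value mutation: each discrete gene is flipped with a small probability.
    return [1 - g if random.random() < rate else g for g in genotype]

random.seed(1)
parent_a = [0, 0, 0, 0, 1, 1, 1, 1]
parent_b = [1, 1, 1, 1, 0, 0, 0, 0]
child = mutate(crossover(parent_a, parent_b))
```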
Advantages of evolutionary algorithms
If one component is weaker, the others can compensate!
Disadvantages of evolutionary algorithms
- unbounded, open-ended complexity does not emerge
- no short-term adaptation: improvements only appear in the next generation, so the robot cannot fix problems immediately
- limits exploring new ideas, because offspring come from a certain family of genotypes
T/F: in RL, only the offspring “carries” the improvement
False; that’s the case for evolutionary algorithms in general. In RL, the robot improves itself continuously given a reward
Policy
explains to the robot which action to take in a certain state
Is the policy “ideal” before the robot starts exploring?
No, the robot updates the policy as it explores the environment
Two types of policies
- deterministic
- stochastic
Gaussian policy is an example of…
A. deterministic policy
B. stochastic policy
B. stochastic policy
In a deterministic policy…
the robot has n actions it can pick from, and it always picks the same action in a given state
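A deterministic policy can be sketched as a direct state-to-action lookup; the states and actions below are made up for illustration:

```python
# Deterministic policy: each state maps to exactly one action (toy grid example).
policy = {
    "start": "right",
    "corridor": "right",
    "junction": "up",
    "near_goal": "up",
}

def act(state):
    # The same state always yields the same action.
    return policy[state]
```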
In a stochastic policy…
each action has a probability of being executed: when the robot is in a given field there is, e.g., a 60 % chance that it will move right, a 20 % chance it will move left, a 10 % chance it will move up and a 10 % chance it will move down.
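A stochastic policy over a discrete action space can be sketched as a per-state probability distribution that is sampled at each step (state name and probabilities below are made up, and must sum to 1):

```python
import random

# Stochastic policy: each state maps to a probability distribution over actions.
policy = {
    "cell_a": {"right": 0.6, "left": 0.2, "up": 0.1, "down": 0.1},
}

def act(state):
    # Sample one action according to the state's action probabilities.
    actions = list(policy[state])
    weights = list(policy[state].values())
    return random.choices(actions, weights=weights)[0]

random.seed(0)
action = act("cell_a")
```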
T/F: A Gaussian policy is used when the action space is continuous
True
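As a sketch of why a Gaussian policy suits continuous actions: the policy computes a mean action from the state and samples around it. The linear state-to-mean mapping and all numbers here are assumptions for illustration:

```python
import random

def gaussian_policy(state_features, weights, sigma=0.1):
    # The mean of the action is a (here, linear) function of the state;
    # the continuous action is sampled from a Gaussian around that mean.
    mean = sum(f * w for f, w in zip(state_features, weights))
    return random.gauss(mean, sigma)

random.seed(0)
action = gaussian_policy([0.5, 1.0], [0.2, -0.3])
```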
Finite-horizon undiscounted return
summing all the rewards over a fixed number of steps and picking the policy which corresponds to the highest sum
Infinite-horizon discounted reward
summing all rewards, with each reward multiplied by a discount factor γ ∈ (0, 1) raised to the number of steps taken so far (γ^t). Rewards collected later count less, so a policy that gets a high total reward in few steps scores better than one that needs many steps, which is what we want
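Both returns can be sketched in a few lines; the reward sequence and discount factor are made-up examples:

```python
def finite_horizon_return(rewards):
    # Finite-horizon undiscounted return: plain sum of rewards over the episode.
    return sum(rewards)

def infinite_horizon_discounted_return(rewards, gamma=0.9):
    # Discounted return: reward at step t is weighted by gamma**t,
    # so rewards collected sooner count more.
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

rewards = [0, 0, 1]                              # reward arrives on the third step
u = finite_horizon_return(rewards)               # 1
g = infinite_horizon_discounted_return(rewards)  # 0.9**2 = 0.81
```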
Q-learning
- the robot explores the environment itself
- Q-learning is an off-policy reinforcement learning algorithm that seeks to find the best action to take given the current state. It’s considered off-policy because the Q-function can learn from actions outside the current policy, like random exploratory actions, so the behavior used to gather experience need not be the policy being learned.
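A toy sketch of tabular Q-learning on a made-up five-state corridor (all environment details and hyperparameters are assumptions): the robot explores with epsilon-greedy actions, while the update always bootstraps from the best next action, which is what makes it off-policy.

```python
import random

# Toy corridor: states 0..4, goal at state 4; actions move left (-1) or right (+1).
ACTIONS = [-1, +1]

def step(state, action):
    next_state = min(4, max(0, state + action))
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward

random.seed(0)
alpha, gamma, epsilon = 0.5, 0.9, 0.2
Q = {(s, a): 0.0 for s in range(5) for a in ACTIONS}

for episode in range(200):
    state = 0
    while state != 4:
        # Off-policy behavior: sometimes act randomly (epsilon-greedy exploration).
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward = step(state, action)
        # The update uses the best next action, not the one actually taken.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state
```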
Monte-carlo approach
class of computational algorithms that rely on repeated random sampling to obtain numerical results.
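A classic illustration of this idea (not from the lecture, just a standard example) is estimating π by repeated random sampling: draw random points in the unit square and count how many fall inside the quarter circle of radius 1.

```python
import random

# Monte Carlo estimate of pi: the fraction of random points inside the
# quarter circle approximates pi/4, so multiplying by 4 approximates pi.
random.seed(0)
n = 100_000
inside = sum(1 for _ in range(n)
             if random.random() ** 2 + random.random() ** 2 <= 1)
pi_estimate = 4 * inside / n
```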
T/F: the “learning by demonstration” adaptation method has 3 main phases.
If TRUE, name them
If FALSE, give the exact number
True;
demonstration phase; training phase; testing phase
In the context of learning by demonstration, in the demonstration phase…
…Match the following types of demonstrations with their description
A. kinesthetic teaching
B. tele-operated teaching
C. direct imitation of human behavior
- imitate a human’s behavior from observational human data
- teacher uses controller to show the robot the correct behavior
- teacher places the robot’s body in the correct position
C-1
A-3
B-2
In the context of learning by demonstration, in which phase (demonstration / training / testing) does the robot actually learn from the observations?
Training phase
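A minimal sketch of all three phases, where the training phase turns (state, action) demonstration pairs into a policy. Here a trivial nearest-neighbor lookup stands in for a real learned model, and the demonstration data is made up:

```python
def train(demonstrations):
    # Training phase: "learn" a policy from (state, action) demonstration pairs.
    def policy(state):
        # Pick the action whose demonstrated state is closest to the query state.
        nearest_state, action = min(
            demonstrations,
            key=lambda pair: abs(pair[0] - state),
        )
        return action
    return policy

# Demonstration phase produced these (state, action) pairs (made up).
demos = [(0.0, "left"), (0.5, "forward"), (1.0, "right")]
policy = train(demos)

# Testing phase: query the learned policy on states not seen in the demos.
```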