M04 - Adaptive & Learning in Autonomous Systems Flashcards
What learning methods do we have?
- reinforcement learning methods
- evolutionary methods
What are the traits of reinforcement learning methods?
- a form of trial and error learning
- operates on a single individual
- varies the parameters of the robot on the basis of a reward on each step
- adapts only the connection weights of the neural network
- sample efficient
- complex and more subject to instability
What are the traits of evolutionary methods?
- operates on a population of individuals
- varies the parameters of the robot on the basis of a reward (fitness) that measures the robot's performance over its entire evaluation
- maximizes cumulative reward
- slower than reinforcement learning
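A minimal sketch of an evolutionary method, assuming a hypothetical robot described by two parameters and a made-up fitness function. Each generation, the whole population is evaluated and ranked, the best individuals are kept, and mutated copies replace the rest. The names `TARGET`, `fitness` and `evolve` and all settings are illustrative assumptions, not a specific published algorithm.

```python
import random

# Hypothetical task (assumption): the ideal robot has parameters (1.0, -2.0)
TARGET = (1.0, -2.0)

def fitness(params):
    # One score for the whole evaluation of an individual (higher is better)
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

def evolve(pop_size=20, generations=100, n_parents=5, sigma=0.1, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-3.0, 3.0) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate and rank every individual in the population
        population.sort(key=fitness, reverse=True)
        parents = population[:n_parents]
        # Keep the best individuals (elitism) and refill with Gaussian-mutated copies
        offspring = [
            [p + rng.gauss(0.0, sigma) for p in rng.choice(parents)]
            for _ in range(pop_size - n_parents)
        ]
        population = parents + offspring
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

Unlike the per-step updates of reinforcement learning, the only feedback here is one fitness value per individual per generation.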
What is adaptivity?
The ability to develop the behavioral and cognitive skills required to perform a desired function
What are the methods to achieve adaptivity?
- evolutionary methods
- learning from demonstration
- reinforcement learning
Describe learning from demonstration.
- paradigm to enable robots to autonomously perform new tasks
- no need to analytically decompose and manually program desired behavior
- an appropriate robot controller can be derived from observations of a human’s own performance
What is the general approach of demonstration learning?
Demonstrate -> Train a model -> Evaluate
Requirements:
- a robot that can perform the desired task
- a task demonstrator who can effortlessly perform the task
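The Demonstrate -> Train a model -> Evaluate loop can be sketched as simple behavioral cloning, which is only one possible way to train the model and is assumed here: demonstrated (state, action) pairs are fitted with a least-squares linear controller, which is then checked on unseen states. The demonstrator `demonstrate` is a hypothetical, noise-free stand-in for a human, and `train`/`evaluate` are illustrative names.

```python
def demonstrate(states):
    # Hypothetical human demonstrator (assumption): action = -0.5 * state + 0.2, noise-free
    return [(s, -0.5 * s + 0.2) for s in states]

def train(demos):
    # Least-squares fit of a linear controller: action = k * state + b
    n = len(demos)
    mean_s = sum(s for s, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in demos)
    var = sum((s - mean_s) ** 2 for s, _ in demos)
    k = cov / var
    b = mean_a - k * mean_s
    return lambda s: k * s + b

def evaluate(controller, demos):
    # Mean squared error between the learned and the demonstrated actions
    return sum((controller(s) - a) ** 2 for s, a in demos) / len(demos)

controller = train(demonstrate([-2.0, -1.0, 0.0, 1.0, 2.0]))  # Demonstrate + Train
error = evaluate(controller, demonstrate([-1.5, 0.5, 1.5]))    # Evaluate on unseen states
print(error)
```

Nothing about the behavior is programmed by hand: the controller's parameters come entirely from the demonstrations.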
Why is learning from demonstration useful?
- formal descriptions of some tasks are hard to define
- daily life activities are apparently easy for humans but computationally expensive for robots
- we do not have enough robotics experts to program robots for every activity
What are the types of learning from demonstration?
- kinesthetic teaching
- teleoperation
- direct imitation of human behavior
What is kinesthetic teaching?
The actuators of the robot are set in a passive mode and the experimenter physically manipulates the robot so as to force it to produce the desired behavior
What is teleoperation?
The actuators of the robot are controlled by the experimenter through a joystick and/or a haptic device
What is direct imitation of human behavior?
The training set is generated from the observation of a human displaying the desired behavior
What are the categories of machine learning?
- supervised learning
- unsupervised learning
- reinforcement learning
Describe, give an example and application of supervised learning.
- labeled data
- feedback
e.g. Regression, SVM, Neural Network, etc.
App. Object recognition
Describe, give an example and application of unsupervised learning.
- no labeled data
- no feedback
e.g. K-means, self-organizing maps
App. Clustering
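A minimal k-means sketch for clustering one-dimensional points without labels or feedback. The data, the starting centroids and the name `kmeans_1d` are illustrative assumptions.

```python
def kmeans_1d(points, centroids, iterations=10):
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its cluster (keep it if empty)
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]  # unlabeled data
centroids, clusters = kmeans_1d(points, [0.0, 6.0])
print(centroids)  # approximately [1.0, 5.0]
```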
Describe, give an example and application of reinforcement learning.
- reward-based learning
- maximize cumulative discounted reward
e.g. Q-learning, SARSA, TD-learning, DQNs
App. Robot navigation
What are the problem and the goal of reinforcement learning?
Problem: easy to evaluate but hard to solve
Goal: learn the action that maximizes (cumulative discounted) reward
E.g. outdoor navigation
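The cumulative discounted reward (return) of an episode is G = r_0 + γ·r_1 + γ²·r_2 + ...; a small helper makes the effect of the discount factor γ concrete. The name `discounted_return` and the reward sequences are hypothetical.

```python
def discounted_return(rewards, gamma):
    # Accumulate backwards: G_t = r_t + gamma * G_{t+1}
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1.0, 1.0, 1.0], 0.5))   # 1.75 = 1 + 0.5 + 0.25
print(discounted_return([0.0, 0.0, 10.0], 0.9))  # about 8.1: a delayed reward is worth less
```

In a navigation task this means a goal reached sooner yields a higher return than the same goal reached later.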
What tuple describes a Markov decision process?
- S: set of possible states
- A: set of possible actions
- P: state transition probability P(s' | s, a), i.e. the probability of reaching state s' after taking action a in state s
- R: reward function for a (state, action) pair
- γ: discount factor determining how much future rewards are valued relative to immediate ones
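A toy MDP written out as plain Python data using the same symbols S, A, P, R and γ, with value iteration (one standard way to solve a known MDP, chosen here for illustration) computing the optimal state values and policy. The three-state chain, its probabilities and its rewards are made up.

```python
S = ["start", "middle", "goal"]
A = ["left", "right"]
# P[s][a] -> list of (probability, next_state); "right" succeeds with probability 0.8 (assumption)
P = {
    "start":  {"left": [(1.0, "start")], "right": [(0.8, "middle"), (0.2, "start")]},
    "middle": {"left": [(1.0, "start")], "right": [(0.8, "goal"), (0.2, "middle")]},
    "goal":   {"left": [(1.0, "goal")],  "right": [(1.0, "goal")]},
}
# R[s][a]: -1 per step until the goal; the goal is absorbing with reward 0
R = {
    "start":  {"left": -1.0, "right": -1.0},
    "middle": {"left": -1.0, "right": -1.0},
    "goal":   {"left": 0.0,  "right": 0.0},
}
gamma = 0.9

def value_iteration(S, A, P, R, gamma, tol=1e-10):
    def q(s, a, V):
        # Immediate reward plus discounted expected value of the next state
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            best = max(q(s, a, V) for a in A)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(A, key=lambda a: q(s, a, V)) for s in S}
    return V, policy

V, policy = value_iteration(S, A, P, R, gamma)
print(policy["start"], policy["middle"])  # right right
```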
What is the Markov (memoryless) property?
The current state of the agent contains all the necessary information about the world: the history of past states adds nothing when predicting the next state.
(The agent doesn’t need to remember previous states to make decisions about future states.)
What is the Q update function?
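Q(s, a) ← Q(s, a) + α · [r + γ · max_a' Q(s', a') − Q(s, a)], where r is the reward received and s' the next state
- the learning rate α determines to what extent newly acquired information overrides old information
- the discount factor γ determines the importance of future rewards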
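A minimal tabular Q-learning sketch on a hypothetical five-state corridor, applying the update Q(s, a) ← Q(s, a) + α[r + γ·max_a' Q(s', a') − Q(s, a)] after every step. Behavior is purely random here, which works because Q-learning is off-policy; the corridor, `step`, `q_update`, `q_learning` and the hyperparameters are assumptions for illustration.

```python
import random

ACTIONS = [-1, +1]        # hypothetical corridor: move left or right
N_STATES, GOAL = 5, 4     # states 0..4; an episode ends at state 4

def step(s, a):
    # Deterministic corridor dynamics; reward 1 only when the goal is reached
    s_next = min(max(s + a, 0), GOAL)
    reward = 1.0 if s_next == GOAL else 0.0
    return s_next, reward, s_next == GOAL

def q_update(Q, s, a, r, s_next, done, alpha, gamma):
    # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    best_next = 0.0 if done else max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS)  # random behavior policy (Q-learning is off-policy)
            s_next, r, done = step(s, a)
            q_update(Q, s, a, r, s_next, done, alpha, gamma)
            s = s_next
    return Q

Q = q_learning()
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(greedy)  # {0: 1, 1: 1, 2: 1, 3: 1}
```

A larger α makes each new experience override the old estimate more strongly; a larger γ makes the distant goal matter more in the earlier states.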
Where do rewards come from?
Reward is a signal from the environment
What are the challenges in reinforcement learning?
- Credit assignment problem [which action leads to more reward?]
- High number of states [the states may not fit into memory]
- Trade-off between exploration and exploitation [should the agent try new actions or exploit the “learned” ones?]
- Non-differentiability of the real world [noise in the environment, unexpected changes]
- The agent modifies the environment while learning [no batch mode for collecting data]
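One common way to handle the exploration/exploitation trade-off is ε-greedy action selection, an example technique not named in these notes: with probability ε the agent tries a random action, otherwise it exploits the action with the highest estimated value. `epsilon_greedy` and the Q-values are hypothetical.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    # Explore: with probability epsilon pick a uniformly random action
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Exploit: otherwise pick the action with the highest estimated value
    return max(range(len(q_values)), key=lambda i: q_values[i])

rng = random.Random(42)
q = [0.1, 0.5, 0.2]
choices = [epsilon_greedy(q, 0.1, rng) for _ in range(1000)]
print(choices.count(1) / len(choices))  # close to 0.9 + 0.1 / 3, i.e. about 0.93
```

Setting ε = 0 gives pure exploitation and ε = 1 gives pure exploration; in practice ε is often decreased as learning progresses.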
How can we answer the challenges in reinforcement learning?
Use neural networks (e.g. as function approximators for the value function, as in DQNs)