M04 - Adaptive & Learning in Autonomous Systems Flashcards

Question 1

Q

What learning methods do we have?

Answer

A

reinforcement learning methods
evolutionary methods

Question 2

Q

What are the traits of reinforcement learning methods?

Answer

A

a form of trial and error learning
operates on a single individual
varies the parameters of the robot on the basis of a reward on each step
adapts only the connection weights of the neural network
sample efficient
complex and more prone to instability

Question 3

Q

What are the traits of evolutionary methods?

Answer

A

operates on a population of individuals
varies the parameters of the robot on the basis of a reward that rates the performance of the individual over the entire evaluation
maximizes cumulative reward
slower than reinforcement learning
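
As a rough illustration of these traits, the sketch below evolves a population of parameter vectors and scores each one once per evaluation rather than once per step. The `fitness` function, target values, population size, and mutation scale are all hypothetical choices made for the example, not part of the course material.

```python
import random

def fitness(params):
    # Hypothetical episode-level score: higher is better, best at [1.0, -2.0].
    target = [1.0, -2.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def evolve(generations=50, pop_size=20, n_params=2, sigma=0.3, seed=0):
    rng = random.Random(seed)
    # Start from a random population of parameter vectors (the individuals).
    population = [[rng.uniform(-3, 3) for _ in range(n_params)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank individuals by the score obtained over their whole evaluation.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Keep the parents unchanged (elitism) and add one mutated copy of each.
        children = [[p + rng.gauss(0, sigma) for p in parent] for parent in parents]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

Because the parents are carried over unchanged, the best score in the population never decreases from one generation to the next.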

Question 4

Q

What is adaptivity?

Answer

A

To develop the behavioral and cognitive skills required to perform a desired function

Question 5

Q

What are the methods to achieve adaptivity?

Answer

A

evolutionary methods
learning from demonstration
reinforcement learning

Question 6

Q

Describe learning from demonstration.

Answer

A

paradigm to enable robots to autonomously perform new tasks
no need to analytically decompose and manually program desired behavior
appropriate robot controller can be derived from observations of a human’s own performance

Question 7

Q

What is the general approach of demonstration learning?

Answer

A

Demonstrate -> Train a model -> Evaluate
Requires:
- a robot that can perform the desired task
- a task demonstrator that can effortlessly perform the task
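
The Demonstrate -> Train a model -> Evaluate loop can be sketched as a tiny behavior-cloning example. Everything here is an assumption made for illustration: the `demonstrator` is a hypothetical proportional controller standing in for the human, and the model is a single gain fitted by least squares.

```python
import random

def demonstrator(state):
    # Hypothetical expert: a proportional controller steering the state toward 0.
    return -0.8 * state

def collect_demonstrations(n=100, seed=0):
    # Record (state, action) pairs while the demonstrator performs the task.
    rng = random.Random(seed)
    states = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(s, demonstrator(s)) for s in states]

def train(demos):
    # Fit action = gain * state by ordinary least squares (one parameter).
    num = sum(s * a for s, a in demos)
    den = sum(s * s for s, _ in demos)
    return num / den

def evaluate(gain, test_states):
    # Mean squared error between the learned policy and the demonstrator.
    errors = [(gain * s - demonstrator(s)) ** 2 for s in test_states]
    return sum(errors) / len(errors)

demos = collect_demonstrations()           # 1. Demonstrate
gain = train(demos)                        # 2. Train a model
mse = evaluate(gain, [-0.5, 0.0, 0.5])     # 3. Evaluate
print(gain, mse)
```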

Question 8

Q

Why is learning from demonstration useful?

Answer

A

formal descriptions of some tasks are hard to define
daily life activities are apparently easy for humans but computationally expensive for robots
we do not have enough roboticists to program all activities

Question 9

Q

What are the types of learning from demonstration?

Answer

A

kinesthetic teaching
teleoperation
direct imitation of human behavior

Question 10

Q

What is kinesthetic teaching?

Answer

A

The actuators of the robot are set in a passive mode and the experimenter physically manipulates the robot so as to force it to produce the desired behavior

Question 11

Q

What is teleoperation?

Answer

A

The actuators of the robot are controlled by the experimenter through a joystick and/or a haptic device

Question 12

Q

What is direct imitation of human behavior?

Answer

A

The training set is generated from the observation of a human displaying the desired behavior

Question 13

Q

What are the categories of machine learning?

Answer

A

supervised learning
unsupervised learning
reinforcement learning

Question 14

Q

Describe, give an example and application of supervised learning.

Answer

A

labeled data
feedback
e.g. Regression, SVM, Neural Network, etc.
App. Object recognition

Question 15

Q

Describe, give an example and application of unsupervised learning.

Answer

A

no labeled data
no feedback
e.g. K-means, self-organizing maps
App. Clustering
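
To make the K-means example concrete, here is a minimal one-dimensional sketch. The data points and the initial centers are made up for illustration; no labels are used at any point.

```python
def kmeans_1d(points, centers, iterations=10):
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = kmeans_1d(points, centers=[0.0, 6.0])
print(centers, clusters)
```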

Question 16

Q

Describe, give an example and application of reinforcement learning.

Answer

A

reward-based learning
increase cumulative discounted reward
e.g. Q-learning, SARSA, TD-learning, DQNs
App. Robot navigation

Question 17

Q

What is the problem task and goal of reinforcement learning?

Answer

A

Problem: easy to evaluate but hard to solve
Goal: learn the action that maximizes (cumulative discounted) reward
E.g. outdoor navigation
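
The cumulative discounted reward is the quantity being maximized. A minimal sketch of how it is computed for one episode, assuming a made-up reward sequence and discount factor:

```python
def discounted_return(rewards, gamma):
    # G = r_0 + gamma * r_1 + gamma^2 * r_2 + ...
    # Accumulate from the last reward backwards so each step is one multiply-add.
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# 1 + 0.5 * 1 + 0.25 * 1 = 1.75
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```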

Question 18

Q

What tuple does the Markov decision process describe?

Answer

A

S: set of possible states
A: set of possible actions
P: transition probability of reaching a next state from a (state, action) pair
R: reward function for a (state, action) pair
γ: discount factor that sets how much future rewards count relative to immediate ones
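
One way to hold this tuple in code is sketched below. P is taken here to be the state-transition probability of the standard MDP definition. The class name, field names, and the two-state toy problem are hypothetical, chosen only to show the shape of each component.

```python
from dataclasses import dataclass

@dataclass
class MDP:
    states: list        # S: set of possible states
    actions: list       # A: set of possible actions
    transitions: dict   # P: (state, action) -> {next_state: probability}
    rewards: dict       # R: (state, action) -> reward
    gamma: float        # discount factor

# Hypothetical two-state problem: move from "far" to "goal".
toy = MDP(
    states=["far", "goal"],
    actions=["stay", "move"],
    transitions={
        ("far", "stay"): {"far": 1.0},
        ("far", "move"): {"goal": 0.9, "far": 0.1},
        ("goal", "stay"): {"goal": 1.0},
        ("goal", "move"): {"goal": 1.0},
    },
    rewards={
        ("far", "stay"): 0.0,
        ("far", "move"): 1.0,
        ("goal", "stay"): 0.0,
        ("goal", "move"): 0.0,
    },
    gamma=0.9,
)

def is_valid(mdp):
    # Every (state, action) pair must define a probability distribution.
    return all(abs(sum(dist.values()) - 1.0) < 1e-9
               for dist in mdp.transitions.values())

print(is_valid(toy))
```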

Question 19

Q

What is the Markov (memoryless) property?

Answer

A

The current state of the agent contains all necessary information about the world. The history of states is not needed to decide the next state.

(doesn’t need to remember previous states to make decision on future states)

Question 20

Q

What is the Q update function?

Answer

A

Standard tabular Q-learning form: Q(s, a) <- Q(s, a) + α [r + γ max_a' Q(s', a') - Q(s, a)]
The learning rate α determines to what extent newly acquired information overrides old information.
The discount factor γ determines the importance of future rewards.
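
A minimal sketch of one tabular Q-learning update step, with the rule written out in the code comment. The function name `q_update`, the state and action labels, and the default values of `alpha` and `gamma` are illustrative assumptions.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
new_value = q_update(q, "s0", "right", reward=1.0,
                     next_state="s1", actions=["left", "right"])
print(new_value)
```

With `alpha` close to 0 the old estimate barely moves. With `alpha` equal to 1 it is replaced by the new target.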

Question 21

Q

Where do rewards come from?

Answer

A

Reward is a signal from the environment

Question 22

Q

What are the challenges in reinforcement learning?

Answer

A

Credit assignment problem
[which action leads to more reward]
Large number of states
[The states may not all fit into memory]
Trade-off between exploration and exploitation
[Should the agent try new actions or exploit the “learned” one?]
Non-differentiability of the real world
[Noise in the environment, unexpected changes]
The agent modifies the environment while learning
[No batch mode for collecting data]
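
The exploration and exploitation trade-off is often handled with an epsilon-greedy rule. This is one common choice, not necessarily the one used in the course. The function name and the example values are hypothetical.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    # q_values maps each action to its current estimated value.
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.choice(list(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(q_values, key=q_values.get)

q_values = {"left": 0.2, "right": 0.7}
print(epsilon_greedy(q_values, epsilon=0.0))  # always exploits
```

Setting `epsilon` to 0 gives pure exploitation and setting it to 1 gives pure exploration. Values in between mix the two.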

Question 23

Q

How can we address the challenges in reinforcement learning?

Answer

A

Utilize neural networks

(23 cards)