M04 - Adaptive & Learning in Autonomous Systems Flashcards

1
Q

What learning methods do we have?

A
  • reinforcement learning methods
  • evolutionary methods
2
Q

What are the traits of reinforcement learning methods?

A
  • a form of trial and error learning
  • operates on a single individual
  • varies the parameters of the robot on the basis of a reward on each step
  • adapts only the connection weights of the neural network
  • sample efficient
  • more complex and more prone to instability
3
Q

What are the traits of evolutionary methods?

A
  • operate on a population of individuals
  • vary the parameters of the robot on the basis of a reward that rates performance over the entire evaluation, not on each step
  • maximize cumulative reward
  • slower than reinforcement learning
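A minimal sketch of the traits above, assuming a toy two-parameter "robot". The names `episode_fitness` and `evolve`, the target values, and all settings are hypothetical and chosen only for illustration. It shows a population being ranked by one score for the whole episode, then mutated.

```python
import random

def episode_fitness(params):
    """Hypothetical whole-episode score; higher is better, best at [1.0, -2.0]."""
    target = [1.0, -2.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def evolve(pop_size=20, generations=50, sigma=0.3, seed=0):
    rng = random.Random(seed)
    # Start from a random population of parameter vectors.
    population = [[rng.uniform(-5, 5) for _ in range(2)] for _ in range(pop_size)]
    for _ in range(generations):
        # Rank individuals by the score of the whole episode, not per step.
        population.sort(key=episode_fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Parents survive unchanged; each also produces one mutated offspring.
        offspring = [[p + rng.gauss(0, sigma) for p in parent] for parent in parents]
        population = parents + offspring
    return max(population, key=episode_fitness)

best = evolve()
```

Because the parents survive each generation, the best score never decreases, but many evaluations are needed, which fits the "slower than reinforcement learning" trait.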
4
Q

What is adaptivity?

A

The ability to develop the behavioral and cognitive skills required to perform a desired function

5
Q

What are the methods to achieve adaptivity?

A
  • evolutionary methods
  • learning from demonstration
  • reinforcement learning
6
Q

Describe learning from demonstration.

A
  • paradigm to enable robots to autonomously perform new tasks
  • no need to analytically decompose and manually program the desired behavior
  • an appropriate robot controller can be derived from observations of a human’s own performance
7
Q

What is the general approach of demonstration learning?

A

Demonstrate -> Train a model -> Evaluate
Requires:
- a robot that can perform the desired task
- a task demonstrator that can effortlessly perform the task
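The three steps can be sketched as a tiny behavior-cloning loop. This is only an illustration under stated assumptions: the demonstrator is a hypothetical rule (action = 2 * state + 1), the model is a straight line fitted by least squares, and the function names are made up for this card.

```python
def demonstrate(n=20):
    """Hypothetical demonstrator: records (state, action) pairs."""
    states = [i / 10 for i in range(n)]
    actions = [2.0 * s + 1.0 for s in states]
    return states, actions

def train(states, actions):
    """Fit action = w * state + b by ordinary least squares."""
    n = len(states)
    mean_s = sum(states) / n
    mean_a = sum(actions) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(states, actions))
    var = sum((s - mean_s) ** 2 for s in states)
    w = cov / var
    b = mean_a - w * mean_s
    return w, b

def evaluate(policy, states, actions):
    """Mean squared error between the learned policy and the demonstrations."""
    w, b = policy
    return sum((w * s + b - a) ** 2 for s, a in zip(states, actions)) / len(states)

states, actions = demonstrate()          # Demonstrate
policy = train(states, actions)          # Train a model
error = evaluate(policy, states, actions)  # Evaluate
```

A real system would evaluate on the robot or on held-out demonstrations rather than on the training data.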

8
Q

Why is learning from demonstration useful?

A
  • formal descriptions of some tasks are hard to define
  • daily life activities are apparently easy for humans but computationally expensive for robots
  • we do not have enough roboticists to program robots for every activity
9
Q

What are the types of learning from demonstration?

A
  • kinesthetic teaching
  • teleoperation
  • direct imitation of human behavior
10
Q

What is kinesthetic teaching?

A

The actuators of the robot are set to a passive mode and the experimenter physically manipulates the robot so as to force it to produce the desired behavior

11
Q

What is teleoperation?

A

The actuators of the robot are controlled by the experimenter through a joystick and/or a haptic device

12
Q

What is direct imitation of human behavior?

A

The training set is generated from the observation of a human displaying the desired behavior

13
Q

What are the categories of machine learning?

A
  • supervised learning
  • unsupervised learning
  • reinforcement learning
14
Q

Describe, give an example and application of supervised learning.

A
  • labeled data
  • feedback
    e.g. Regression, SVM, Neural Network, etc.
    App. Object recognition
15
Q

Describe, give an example and application of unsupervised learning.

A
  • no labeled data
  • no feedback
    e.g. K-means, self-organizing maps
    App. Clustering
16
Q

Describe, give an example and application of reinforcement learning.

A
  • reward-based learning
  • increase cumulative discounted reward
    e.g. Q-learning, SARSA, TD-learning, DQNs
    App. Robot navigation
17
Q

What is the problem task and goal of reinforcement learning?

A

Problem: easy to evaluate but hard to solve
Goal: learn the action that maximizes (cumulative discounted) reward
E.g. outdoor navigation
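
As a small worked example of "cumulative discounted reward", the sketch below sums a reward sequence with each later reward weighted by a further factor of gamma. The function name `discounted_return` and the sample rewards are assumptions made for illustration.

```python
def discounted_return(rewards, gamma):
    """Return r0 + gamma * r1 + gamma**2 * r2 + ... for a finite reward list."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma.
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Three rewards of 1.0 with gamma = 0.5: 1 + 0.5 + 0.25
g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # → 1.75
```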

18
Q

What tuple does the Markov decision process describe?

A
  • S: set of possible states
  • A: set of possible actions
  • P: transition probabilities: probability of reaching each next state from a (state, action) pair
  • R: reward function for a (state, action) pair
  • γ: discount factor, which determines how much future rewards are worth to the agent relative to immediate ones
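The tuple can be written down as a plain data structure. This is a sketch under assumptions: the class name `MDP`, the field names, the two-state example and the helper `is_valid` are hypothetical, and P is taken here to mean the transition probabilities.

```python
from dataclasses import dataclass

@dataclass
class MDP:
    states: list       # S: set of possible states
    actions: list      # A: set of possible actions
    transitions: dict  # P: (state, action) -> {next_state: probability}
    rewards: dict      # R: (state, action) -> reward
    gamma: float       # discount factor

# Hypothetical two-state world, for illustration only.
mdp = MDP(
    states=["left", "right"],
    actions=["stay", "move"],
    transitions={
        ("left", "stay"): {"left": 1.0},
        ("left", "move"): {"right": 0.9, "left": 0.1},
        ("right", "stay"): {"right": 1.0},
        ("right", "move"): {"left": 0.9, "right": 0.1},
    },
    rewards={
        ("left", "stay"): 0.0,
        ("left", "move"): 1.0,
        ("right", "stay"): 0.0,
        ("right", "move"): 0.0,
    },
    gamma=0.9,
)

def is_valid(m):
    """Check that every transition distribution sums to 1."""
    return all(abs(sum(d.values()) - 1.0) < 1e-9 for d in m.transitions.values())
```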
19
Q

What is the Markov (memoryless) property?

A

The current state of the agent contains all necessary information about the world. The history of the state is not important to decide the next state.

(the agent doesn’t need to remember previous states to make decisions about future states)

20
Q

What is the Q update function?

A
  • The learning rate determines to what extent newly acquired information overrides old information.
  • The discount factor determines the importance of future rewards.
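The card does not show the formula itself. Assuming the standard tabular Q-learning update is meant, it reads Q(s, a) <- Q(s, a) + α [r + γ max_a' Q(s', a') - Q(s, a)], with learning rate α and discount factor γ. A minimal sketch follows; the function name `q_update`, the state and action labels and the numbers are hypothetical.

```python
from collections import defaultdict

def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Value of the best action available in the next state.
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # alpha controls how far the old estimate moves toward the new target.
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]

Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
actions = ["left", "right"]
Q[("s1", "right")] = 2.0
new_value = q_update(Q, "s0", "right", reward=1.0, next_state="s1",
                     actions=actions, alpha=0.5, gamma=0.5)  # → 1.0
```

Here the target is 1.0 + 0.5 * 2.0 = 2.0, and with α = 0.5 the old estimate 0.0 moves halfway to it.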
21
Q

Where do rewards come from?

A

Reward is a signal from the environment

22
Q

What are the challenges in reinforcement learning?

A
  • Credit assignment problem
    [which action leads to more reward]
  • Large number of states
    [the state space may not fit into memory]
  • Trade-off between exploration and exploitation
    [should the agent try new actions or exploit the learned ones]
  • Non-differentiability of the real world
    [noise in the environment, unexpected changes]
  • The agent modifies the environment while learning
    [no batch mode for collecting data]
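One common way to handle the exploration and exploitation trade-off is an epsilon-greedy rule. The cards do not name it, so treat this as an illustrative assumption: with probability epsilon the agent tries a random action, otherwise it picks the action with the highest learned value. The name `epsilon_greedy` and the sample values are hypothetical.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action from a dict mapping action -> estimated value."""
    if rng.random() < epsilon:
        # Explore: try any action, regardless of its current estimate.
        return rng.choice(list(q_values))
    # Exploit: take the action with the highest learned value.
    return max(q_values, key=q_values.get)

rng = random.Random(0)
q_values = {"left": 0.2, "right": 0.8}
greedy_choice = epsilon_greedy(q_values, epsilon=0.0, rng=rng)   # always exploits
explore_choice = epsilon_greedy(q_values, epsilon=1.0, rng=rng)  # always explores
```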
23
Q

How can we answer the challenges in reinforcement learning?

A

Utilize neural networks