Lecture 2 - Machine Learning Flashcards

1
Q

Learning

A

Acquiring new knowledge or skills and improving one's performance

2
Q

A robot can learn about … (2)

A
  1. Itself: sensor or actuator information that might vary over time
  2. Its environment: learning maps, how to achieve goals
3
Q

(3) benefits of learning

A
  1. Enabling the robot to perform its task better
  2. Adapting to changes in the environment (hard to preprogram)
  3. Simplifying the designer's programming work
4
Q

3 Forms of Learning

1. Supervised learning

A

With an external supervisor/teacher

Input-output pairs are presented and the function mapping inputs to outputs is learned
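
A minimal sketch of this idea, fitting hypothetical input-output pairs with scikit-learn (the data and model choice are illustrative, not from the lecture):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical input-output pairs provided by the teacher.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression().fit(X, y)     # learn the function between the pairs
print(model.predict(np.array([[4.0]])))  # ~9.0 for an unseen input
```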

5
Q

3 Forms of Learning

2. Unsupervised learning

A

All information must be taken from the inputs alone

-> it can be useful to preprocess the inputs, e.g. divide them into meaningful clusters
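
For instance, a minimal sketch of such preprocessing, clustering hypothetical sensor readings with k-means (the data and cluster count are made-up assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D sensor readings (e.g. two distance sensors).
readings = np.array([[0.10, 0.20], [0.15, 0.22], [5.0, 4.8], [5.1, 5.2]])

# Group the raw inputs into 2 clusters; no labels or teacher involved.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(readings)
print(kmeans.labels_)  # e.g. [0 0 1 1]: each input assigned to a cluster
```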

6
Q

3 Forms of Learning

3. Reinforcement learning

A

With an evaluation signal.

7
Q

What is feedback in supervised learning?

A

The target action or output for a specific input.

8
Q

Example of supervised learning + name

A

Neural network learning

-> the weights of the connections between nodes are learned (connectionist learning)

9
Q

How did ALVINN learn to drive?

A
ALVINN steers as it thinks it should
A human shows how it should steer
ALVINN computes the error
ALVINN uses this error to update its weights
-> REPEAT
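
A minimal sketch of this error-driven loop, assuming a single linear unit and a made-up learning rate (ALVINN's actual network was multi-layer; this only illustrates the update idea):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=3)  # hypothetical 3-feature input
lr = 0.1                      # assumed learning rate

# Hypothetical (features, human_steering) demonstrations.
demos = [(np.array([1.0, 0.2, -0.5]), 0.3),
         (np.array([0.4, -0.1, 0.9]), -0.2)]

for features, target in demos:        # REPEAT over demonstrations
    prediction = weights @ features   # steer as it thinks it should
    error = target - prediction       # human shows how it should steer
    weights += lr * error * features  # use the error to update the weights
```
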
10
Q

Disadvantages of supervised learning (3)

A
  1. A trainer is needed (less autonomous)
  2. Not online (first a training phase, then an operating phase)
  3. Not incremental (while it is operating, it is not learning)
11
Q

4 Characteristics of Reinforcement Learning

A
  1. Learning from interaction
  2. Goal-oriented learning
  3. Learning about, from, and while interacting with an external environment.
  4. Learning what to do (how to map situations to actions) so as to maximize a numerical reward signal
12
Q

6 Key features of Reinforcement Learning

A
  1. Learner is not told which actions to take
  2. Trial and Error search
  3. Possibility of delayed reward
  4. Sacrifice short-term gains for greater long-term gains
  5. The need to explore and exploit
  6. Considers the whole problem of a goal-directed agent interacting with an uncertain environment
13
Q

4 Characteristics of a complete agent

A
  1. It is temporally situated
  2. It learns and plans continually
  3. The objective is to affect the environment
  4. The environment is stochastic / uncertain
14
Q

4 Elements of Reinforcement Learning (inwards to outwards)

  1. Policy
  2. Reward
  3. Value
  4. Model
A
  1. Policy: What to do?
  2. Reward: What is good?
  3. Value: What is good because it predicts reward?
  4. Model: What follows what?
15
Q

Actuator space

A

Set of all possible actions

16
Q

When the robot knows/has learned which action to perform in each state, it has learned a …

A

Reactive controller

17
Q

Exploration (RL)

A

In order to learn the optimal action, the robot has to try everything (trial and error)

18
Q

Exploitation (RL)

A

Simultaneously with exploration, the robot should perform well and exploit what it has learned.

19
Q

Once the mapping between inputs and actions is learned, the robot can just exploit the learned knowledge and stop exploring … right?

A

No, not always. There might be (1) sensor errors (uncertainty) and (2) a changing environment.

20
Q

Exploitation/Exploration dilemma (trade-off between … (2))

A
  1. Constantly learning (exploration): possibly doing things less than perfectly
  2. Constantly using what it knows (exploitation): cannot improve its predictions of other actions
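
A common way to handle this dilemma is an epsilon-greedy rule: exploit most of the time, explore occasionally. A minimal sketch (the epsilon value and the Q-value lookup are illustrative assumptions):

```python
import random

def choose_action(q_values, actions, epsilon=0.1):
    """With probability epsilon explore (random action), else exploit."""
    if random.random() < epsilon:
        return random.choice(actions)               # exploration
    return max(actions, key=lambda a: q_values[a])  # exploitation
```
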
21
Q

What is learned in RL? Consider robot’s actuator and sensor space!

A

The robot learns a value function (possibly as a table) listing all possible state-action pairs along with their Q-values.
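
A minimal sketch of such a table for discrete states and actions (the states and actions below are hypothetical placeholders):

```python
# One Q-value per (state, action) pair, initialised to zero.
states = ["corridor", "junction", "goal"]
actions = ["forward", "left", "right"]
q_table = {(s, a): 0.0 for s in states for a in actions}

# The learned reactive controller: pick the best-valued action per state.
def policy(state):
    return max(actions, key=lambda a: q_table[(state, a)])
```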

22
Q

Q-value

A

Grows if good things happen and shrinks if bad things happen.
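
One standard way this is implemented is the Q-learning update; the learning rate and discount below are typical values, not from the lecture:

```python
alpha, gamma = 0.1, 0.9  # assumed learning rate and discount factor

def q_update(q_table, actions, state, action, reward, next_state):
    # Move Q(s, a) toward the reward plus the discounted value of the best
    # next action: it grows after good outcomes, shrinks after bad ones.
    best_next = max(q_table[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (target - q_table[(state, action)])
```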

23
Q

When is the RL table learning method efficient (2)?

A
  1. When the state space is not too big
  2. When states and actions are discrete

24
Q

What if table learning method alone is inefficient?

A

Combine RL with function approximators such as neural networks
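
A minimal sketch of the idea: a tiny network maps a state vector to one Q-value per action instead of a table row (the layer sizes are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(8, 4)), np.zeros(8)  # 4 state features
W2, b2 = rng.normal(scale=0.1, size=(3, 8)), np.zeros(3)  # 3 actions

def q_values(state):
    hidden = np.tanh(W1 @ state + b1)
    return W2 @ hidden + b2  # one estimated Q-value per action

print(q_values(np.array([0.5, -1.0, 0.2, 0.0])))
```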

25
Q

Temporal Credit Assignment (RL)

A

In a maze, the result of a tested state-action pair may come long after the action
-> rewards and punishments have to be propagated back and assigned to multiple previous state-action pairs.
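
A minimal sketch of propagating a delayed reward back over the pairs that led to it, using a discount factor (the episode and gamma are made up):

```python
gamma = 0.9  # assumed discount: earlier pairs receive less of the credit

# Hypothetical maze episode; the reward arrives only at the end.
episode = [("s0", "right"), ("s1", "forward"), ("s2", "forward")]
final_reward = 1.0

credit = {}
G = final_reward
for state, action in reversed(episode):
    credit[(state, action)] = G  # assign the (discounted) reward to this pair
    G *= gamma                   # earlier pairs get a smaller share
print(credit)
```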

26
Q

Notable RL Applications (2)

A
  1. TD-Gammon: world's best backgammon program
  2. Elevator control

27
Q

Learning by Imitation
What does it free from?
What does it involve?

A

It frees the robot from trial and error, but it is not trivial!

It involves careful decisions about internal representations

28
Q

Learning from Demonstration

A

Learning by experiencing a task directly (a human controls the robot to let it experience the result of good actions)

29
Q

What does the robot need to learn in imitation/demonstration learning (2)?

A
  1. What it experienced during trying
  2. How it can generate that behavior again

30
Q

Why is forgetting important (2)?

A
  1. Making room for new information
  2. Replacing old information that is no longer correct

31
Q

What determines which types of learning methods are possible for a particular learning problem?

A

The amount and type of information available (feedback, reward, punishment, error)

32
Q

Can a robot use multiple learning methods at the same time?

A

YES!