Class 6 - Guest Lecture - Deep Reinforcement Learning Flashcards

1
Q

Difference between reinforcement learning and deep learning

A

The difference between them is that deep learning learns from a fixed training set and then applies that learning to new data, while reinforcement learning learns dynamically, adjusting actions based on continuous feedback to maximize a reward.

2
Q

Traditional methods for robotics

A

work well for demos and narrow applications, but they don’t generalize well and require expensive, tedious adaptation to any new task or environment (e.g., Boston Dynamics).

3
Q

Deep Learning for Robotics

A

has proven effective at achieving (super)human-level performance on many tasks:

  1. object detection and face recognition
  2. speech recognition
  3. dexterous manipulation

For robotics, however, it is still in the very early stages.
4
Q

Options for applying DL to robotics:

  1. “easy” fix (what is it about)?
  2. “harder” fix (what is it about)?
A
  1. “easy” fix: replace some components of the classical pipeline with neural networks, BUT we still have to engineer the entire system and design (and train) the different components separately (issues that may arise: mistakes propagate through the pipeline, movements that don’t generalize well).
  2. “harder” fix: end-to-end learning, an automatic learning technique where the model learns all the steps between the initial input and the final output. It takes the input and returns a distribution over actions (sketched below).
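
To make “end-to-end” concrete, here is a minimal sketch (not from the lecture; layer sizes and dimensions are invented): a single network maps a raw observation directly to a distribution over discrete actions, with no hand-built pipeline in between.

```python
# Minimal end-to-end policy sketch (illustrative; sizes are invented).
import torch
import torch.nn as nn

class EndToEndPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs):
        logits = self.net(obs)
        # return a distribution over actions, not a single fixed action
        return torch.distributions.Categorical(logits=logits)

policy = EndToEndPolicy(obs_dim=8, n_actions=4)
obs = torch.randn(1, 8)            # stand-in for a raw observation
action = policy(obs).sample()      # sample from the action distribution
```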
5
Q

The reinforcement learning approach to learning a solution… (pick one):

A. uses simulations to train the agent
B. places the agent in an environment and lets it explore that environment by performing actions, each of which produces a new state and a reward for the agent.

A

B
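
A toy sketch of this agent-environment loop (the environment and “agent” here are invented stand-ins, not a real library API):

```python
# Toy RL loop: the agent acts, the environment returns a new state
# and a reward, and a learning agent would update itself from that.
import random

class ToyEnv:
    """Reach position 5 on a line by stepping left (-1) or right (+1)."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += action
        done = self.pos == 5
        reward = 1.0 if done else -0.1   # small cost per step, bonus at goal
        return self.pos, reward, done

env = ToyEnv()
state = env.reset()
for t in range(100):
    action = random.choice([-1, 1])      # a purely exploring "agent"
    state, reward, done = env.step(action)
    # a learning agent would update its policy here from (state, action, reward)
    if done:
        state = env.reset()
```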

6
Q

A solution to the fact that reinforcement learning generally requires a lot of time and a lot of repetitions is…

A

to use simulations to train thousands of agents in parallel
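
A toy sketch of the idea (all numbers invented): step many simulated environments as one batched array instead of one at a time, so thousands of agents generate experience at once.

```python
# Toy batched simulation: one NumPy array holds the state of every
# parallel environment, and every step advances all of them together.
import numpy as np

n_envs = 1000                                  # thousands of parallel sims
pos = np.zeros(n_envs)                         # one state per environment
for t in range(100):
    actions = np.random.choice([-1.0, 1.0], size=n_envs)  # batched actions
    pos += actions                                          # batched step
    rewards = -np.abs(pos - 5.0)               # toy reward: distance to 5
# all n_envs streams of experience would feed one shared learner
```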

7
Q

Although deep learning has proven to be more robust against perturbations during training / testing, it still has one main issue, namely…

A

even the best simulations are too different from reality

8
Q

reality gap (in the context of deep learning)

A

you might lose a lot of performance when moving from simulation to reality

9
Q

Fill in:

In the context of deep learning simulations, small errors compounding at each time step might result in very - similar / different - trajectories between simulation and the real world.

A

different

10
Q

One approach that tries to solve the reality gap issue is the…

A

Sim2Real approach

11
Q

The Sim2Real approach uses dynamics randomization to…

A

train robots in simulation under a wide range of physics (e.g., the strength of gravity, the size of each component of the robot, friction, visual appearance and lighting, etc.) to force the robot to work across many different environments, with the hope that the real world ends up included in that range.

Disadvantage: very computationally expensive
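
A sketch of what dynamics randomization can look like (the parameter names and ranges are invented for illustration; `make_simulation` is a hypothetical simulator factory, not a real API):

```python
# Dynamics randomization sketch: sample a fresh "world" every episode,
# so the policy must work under many different physics settings.
import random

def sample_physics():
    return {
        "gravity":    random.uniform(8.0, 12.0),   # around Earth's 9.81 m/s^2
        "friction":   random.uniform(0.5, 1.5),
        "mass_scale": random.uniform(0.8, 1.2),    # resize the robot's parts
        "latency":    random.uniform(0.0, 0.04),   # actuation delay (seconds)
    }

for episode in range(10_000):
    physics = sample_physics()
    # env = make_simulation(**physics)   # hypothetical simulator factory
    # ...run the episode and update the policy as usual...
```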

12
Q

One big problem in reinforcement learning is that (multiple picks are possible):

A. we have to design the reward function by hand
B. the reward function is always the same
C. the reward function we choose may not result in the behavior we want

A

A, C

13
Q

The idea behind imitation learning is to…

A

collect demonstrations from humans solving the target task (in the demonstration phase), and use them to train an agent (training phase + test phase).

14
Q

3 main approaches to imitation learning in the context of deep learning

A
  1. behavior cloning
  2. inverse reinforcement learning
  3. sequence modeling
15
Q

In the context of imitation learning in deep learning, behavior cloning…

A
  1. treats the problem as supervised learning;
  2. collects (state, action) pairs from many demonstration episodes;
  3. trains a neural network to produce the same actions in the same states (see the sketch below).
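
A minimal behavior-cloning sketch in PyTorch (the random data below is a placeholder standing in for real human demonstrations; dimensions are invented):

```python
# Behavior cloning as plain supervised learning: predict the human's
# action from the state it was taken in.
import torch
import torch.nn as nn

states = torch.randn(1000, 8)              # 1000 demo states, 8-dim each
actions = torch.randint(0, 4, (1000,))     # the action the human took

policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    logits = policy(states)                # predict actions from states
    loss = loss_fn(logits, actions)        # match the demonstrator
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```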
16
Q

Main issues with behavior cloning

A

Since the agent tends to overfit the trajectories of the demonstrator, if the agent’s trajectory deviates from the demonstrations, it can quickly diverge due to compounding errors.

17
Q

Solution to the behavior cloning problem

A

Do not train the agent to memorize a trajectory; instead, train it to assign a probability to a trajectory, where the probability tells how likely it is that a human took that trajectory. If the probability of an action is high, it was likely taken by a human, so the robot follows it.

18
Q

Compliance mode

A

the robot produces motor commands for its motors but also allows humans to physically move them.

19
Q

In the inverse reinforcement learning approach…

A

the agent infers the reward function that generated the behavior of another agent (from human demonstrations) and attempts to reproduce the same behavior using the inferred reward function.
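
A toy sketch of the idea (entirely illustrative; real inverse RL is far more involved): among candidate reward functions, pick the one under which the demonstrated behavior scores best relative to random behavior.

```python
# Toy inverse RL: which candidate reward best explains the expert?
import random

demo_states = [4.8, 5.1, 5.0, 4.9]                  # the expert hovers near 5
random_states = [random.uniform(0, 10) for _ in range(1000)]

candidates = {
    "prefer_high": lambda s: s,                     # reward = position
    "prefer_five": lambda s: -abs(s - 5.0),         # reward = closeness to 5
}

def advantage(reward_fn):
    demo = sum(map(reward_fn, demo_states)) / len(demo_states)
    rand = sum(map(reward_fn, random_states)) / len(random_states)
    return demo - rand               # how much better the expert looks

inferred = max(candidates, key=lambda name: advantage(candidates[name]))
print(inferred)   # "prefer_five": it best explains the expert's behavior
```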

20
Q

Main issue with the inverse reinforcement learning approach

A

There are many reward functions that can explain the same observed behavior, so how can we differentiate between them?

21
Q

In the sequence modeling approach…

A

the goal is to predict a full sequence of actions that leads to a sequence of high rewards. Specifically, we want to learn the probability distribution over the most successful sequences of actions.
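
A toy sketch of this idea (invented data; real systems use transformer-style sequence models): keep only the high-reward trajectories and estimate the probability of each action given the previous one.

```python
# Toy sequence modeling over successful trajectories.
from collections import Counter, defaultdict

trajectories = [                                 # (action sequence, reward)
    (["forward", "grasp", "lift"], 10.0),
    (["forward", "forward", "grasp"], 2.0),
    (["forward", "grasp", "lift"], 9.0),
]

good = [seq for seq, reward in trajectories if reward > 5.0]

counts = defaultdict(Counter)                    # count next-action transitions
for seq in good:
    for prev, nxt in zip(seq, seq[1:]):
        counts[prev][nxt] += 1

probs = {prev: {a: c / sum(ctr.values()) for a, c in ctr.items()}
         for prev, ctr in counts.items()}
print(probs["grasp"])   # {'lift': 1.0}: the likely successful continuation
```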

22
Q

Task specification

A
  1. The reward function determines what behavior is learnt.
  2. Different reward functions can make learning easier or harder on the same task.

23
Q

Reward hacking

A

occurs when an agent learns to exploit a poorly specified reward function to obtain high rewards by ‘cheating’.

24
Q

Example of reward hacking

A

a coffee-making robot is incentivized to learn all the steps to make a cup of coffee. One of the steps that is rewarded is ‘turn on the coffee machine’. A naive implementation of the reward function may lead to the robot repeatedly turning the machine on and off to keep collecting the same reward multiple times.
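
A sketch of this failure (entirely illustrative): a naive reward that pays every time the machine turns on can be farmed by toggling, while a reward that pays only the first time cannot.

```python
# The coffee-machine reward hack, in miniature.

def naive_reward(event, history):
    return 1.0 if event == "machine_on" else 0.0     # farmable

def fixed_reward(event, history):
    first_time = "machine_on" not in history
    return 1.0 if event == "machine_on" and first_time else 0.0

history = []
total_naive = total_fixed = 0.0
for event in ["machine_on", "machine_off"] * 5:      # the toggling exploit
    total_naive += naive_reward(event, history)
    total_fixed += fixed_reward(event, history)
    history.append(event)

print(total_naive)   # 5.0: toggling keeps paying under the naive reward
print(total_fixed)   # 1.0: the fixed reward cannot be farmed
```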

25
Q

A cooperation game shows NO Nash equilibrium when…

A

one of the teams can improve its strategy without waiting for the other team to change its strategy

26
Q

A competition game shows NO Nash equilibrium when…

A

one of the teams can improve its strategy without waiting for the other team to change its strategy