Class 6 - Guest Lecture - Deep Reinforcement Learning Flashcards

1
Q

Difference between reinforcement learning and deep learning

A

The difference between them is that deep learning learns from a fixed training set and then applies that learning to new data, while reinforcement learning learns dynamically, adjusting actions based on continuous feedback to maximize a reward.

2
Q

Traditional methods for robotics

A

work well for demos and narrow applications, but they don’t generalize well and require expensive, tedious adaptation to any new task or environment (e.g., Boston Dynamics).

3
Q

Deep Learning for Robotics

A

has proven effective at achieving (super)human-level performance on many tasks:

  1. object detection and face recognition
  2. speech recognition
  3. dexterous manipulation

For robotics, however, it is still in the very early stages.
4
Q

Options for applying DL to robotics:

  1. “easy” fix (what is it about)?
  2. “harder” fix (what is it about)?
A
  1. “easy” fix: replace some components of the classical pipeline with neural networks, BUT we still have to engineer the entire system and design (and train) the different components separately (issues that may arise: mistakes propagate through the pipeline, movements that don’t generalize well).
  2. “harder” fix: end-to-end learning, an automatic learning technique where the model learns all the steps between the initial input and the final output. It takes the input and returns a distribution over actions (sketched below).
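
To make “end-to-end” concrete, here is a minimal sketch (not from the lecture; layer sizes and dimensions are invented): a single network maps a raw observation directly to a distribution over discrete actions, with no hand-built pipeline in between.

```python
# Minimal end-to-end policy sketch (illustrative; sizes are invented).
import torch
import torch.nn as nn

class EndToEndPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs):
        logits = self.net(obs)
        # return a distribution over actions, not a single fixed action
        return torch.distributions.Categorical(logits=logits)

policy = EndToEndPolicy(obs_dim=8, n_actions=4)
obs = torch.randn(1, 8)            # stand-in for a raw observation
action = policy(obs).sample()      # sample from the action distribution
```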
5
Q

The reinforcement learning approach to learning a solution… (pick one):

A. uses simulations to train the agent
B. places the agent in an environment and lets it explore that environment by performing actions, each of which produces a new state and a reward for the agent.

A

B
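
A toy sketch of this agent-environment loop (the environment and “agent” here are invented stand-ins, not a real library API):

```python
# Toy RL loop: the agent acts, the environment returns a new state
# and a reward, and a learning agent would update itself from that.
import random

class ToyEnv:
    """Reach position 5 on a line by stepping left (-1) or right (+1)."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += action
        done = self.pos == 5
        reward = 1.0 if done else -0.1   # small cost per step, bonus at goal
        return self.pos, reward, done

env = ToyEnv()
state = env.reset()
for t in range(100):
    action = random.choice([-1, 1])      # a purely exploring "agent"
    state, reward, done = env.step(action)
    # a learning agent would update its policy here from (state, action, reward)
    if done:
        state = env.reset()
```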

6
Q

A solution to the fact that reinforcement learning generally requires a lot of time and a lot of repetitions is…

A

to use simulations to train thousands of agents in parallel
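
A toy sketch of the idea (all numbers invented): step many simulated environments as one batched array instead of one at a time, so thousands of agents generate experience at once.

```python
# Toy batched simulation: one NumPy array holds the state of every
# parallel environment, and every step advances all of them together.
import numpy as np

n_envs = 1000                                  # thousands of parallel sims
pos = np.zeros(n_envs)                         # one state per environment
for t in range(100):
    actions = np.random.choice([-1.0, 1.0], size=n_envs)  # batched actions
    pos += actions                                          # batched step
    rewards = -np.abs(pos - 5.0)               # toy reward: distance to 5
# all n_envs streams of experience would feed one shared learner
```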

7
Q

Although deep learning has proven to be more robust against perturbations during training / testing, it still has one main issue, namely…

A

even the best simulations are too different from reality

8
Q

reality gap (in the context of deep learning)

A

you might lose a lot of performance when moving from simulation to reality

9
Q

Fill in:

In the context of deep learning simulations, small errors compounding at each time step might result in very - similar / different - trajectories between simulation and the real world.

A

different

10
Q

One approach that tries to solve the reality gap issue is the…

A

Sim2Real approach

11
Q

The Sim2Real approach uses dynamics randomization to…

A

train robots in simulation under a wide range of physics (e.g., the strength of gravity, the size of each component of the robot, friction, visual appearance and lighting, etc.) to force the robot to work across many different environments, with the hope that the real world ends up included in that range.

Disadvantage: very computationally expensive
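
A sketch of what dynamics randomization can look like (the parameter names and ranges are invented for illustration; `make_simulation` is a hypothetical simulator factory, not a real API):

```python
# Dynamics randomization sketch: sample a fresh "world" every episode,
# so the policy must work under many different physics settings.
import random

def sample_physics():
    return {
        "gravity":    random.uniform(8.0, 12.0),   # around Earth's 9.81 m/s^2
        "friction":   random.uniform(0.5, 1.5),
        "mass_scale": random.uniform(0.8, 1.2),    # resize the robot's parts
        "latency":    random.uniform(0.0, 0.04),   # actuation delay (seconds)
    }

for episode in range(10_000):
    physics = sample_physics()
    # env = make_simulation(**physics)   # hypothetical simulator factory
    # ...run the episode and update the policy as usual...
```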

12
Q

One big problem in reinforcement learning is that (multiple picks are possible):

A. we have to design the reward function by hand
B. the reward function is always the same
C. the reward function we choose may not result in the behavior we want

A

A, C

13
Q

The idea behind imitation learning is to…

A

collect demonstrations from humans solving the target task (in the demonstration phase), and use them to train an agent (training phase + test phase).

14
Q

3 main approaches to imitation learning in the context of deep learning

A
  1. behavior cloning
  2. inverse reinforcement learning
  3. sequence modeling
15
Q

In the context of imitation learning in deep learning, behavior cloning…

A
  1. treats the problem as supervised learning;
  2. collects (state, action) pairs from many demonstration episodes;
  3. trains a neural network to produce the same actions in the same states (see the sketch below).
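
A minimal behavior-cloning sketch in PyTorch (the random data below is a placeholder standing in for real human demonstrations; dimensions are invented):

```python
# Behavior cloning as plain supervised learning: predict the human's
# action from the state it was taken in.
import torch
import torch.nn as nn

states = torch.randn(1000, 8)              # 1000 demo states, 8-dim each
actions = torch.randint(0, 4, (1000,))     # the action the human took

policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    logits = policy(states)                # predict actions from states
    loss = loss_fn(logits, actions)        # match the demonstrator
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```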
16
Q

Main issues with behavior cloning

A

Since the agent tends to overfit the trajectories of the demonstrator, if the agent’s trajectory deviates from the demonstrations, it can quickly diverge due to compounding errors.

17
Q

Solution to the behavior cloning problem

A

Do not train the agent to memorize a trajectory; instead, train it to assign a probability to a trajectory, where the probability tells how likely it is that a human took that trajectory. If the probability of an action is high, it was likely taken by a human, so the robot follows it.

18
Q

Compliance mode

A

the robot produces motor commands for its motors but also allows humans to physically move them.

19
Q

In the inverse reinforcement learning approach…

A

the agent infers the reward function that generated the behavior of another agent (from human demonstrations) and attempts to reproduce the same behavior using the inferred reward function.
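
A toy sketch of the idea (entirely illustrative; real inverse RL is far more involved): among candidate reward functions, pick the one under which the demonstrated behavior scores best relative to random behavior.

```python
# Toy inverse RL: which candidate reward best explains the expert?
import random

demo_states = [4.8, 5.1, 5.0, 4.9]                  # the expert hovers near 5
random_states = [random.uniform(0, 10) for _ in range(1000)]

candidates = {
    "prefer_high": lambda s: s,                     # reward = position
    "prefer_five": lambda s: -abs(s - 5.0),         # reward = closeness to 5
}

def advantage(reward_fn):
    demo = sum(map(reward_fn, demo_states)) / len(demo_states)
    rand = sum(map(reward_fn, random_states)) / len(random_states)
    return demo - rand               # how much better the expert looks

inferred = max(candidates, key=lambda name: advantage(candidates[name]))
print(inferred)   # "prefer_five": it best explains the expert's behavior
```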

20
Q

Main issue with the inverse reinforcement learning approach

A

There are many reward functions that can explain the same observed behavior, so how can we differentiate between them?

21
Q

In the sequence modeling approach…

A

the goal is to predict a full sequence of actions that leads to a sequence of high rewards. Specifically, we want to learn the probability distribution over the most successful sequences of actions.
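
A toy sketch of this idea (invented data; real systems use transformer-style sequence models): keep only the high-reward trajectories and estimate the probability of each action given the previous one.

```python
# Toy sequence modeling over successful trajectories.
from collections import Counter, defaultdict

trajectories = [                                 # (action sequence, reward)
    (["forward", "grasp", "lift"], 10.0),
    (["forward", "forward", "grasp"], 2.0),
    (["forward", "grasp", "lift"], 9.0),
]

good = [seq for seq, reward in trajectories if reward > 5.0]

counts = defaultdict(Counter)                    # count next-action transitions
for seq in good:
    for prev, nxt in zip(seq, seq[1:]):
        counts[prev][nxt] += 1

probs = {prev: {a: c / sum(ctr.values()) for a, c in ctr.items()}
         for prev, ctr in counts.items()}
print(probs["grasp"])   # {'lift': 1.0}: the likely successful continuation
```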

22
Q

Task specification

A
  1. The reward function determines what behavior is learnt.
  2. Different reward functions can make learning easier or harder on the same task.

23
Q

Reward hacking

A

occurs when an agent learns to exploit a poorly specified reward function to obtain high rewards by ‘cheating’.

24
Q

Example of reward hacking

A

a coffee-making robot is incentivized to learn all the steps to make a cup of coffee. One of the steps that is rewarded is ‘turn on the coffee machine’. A naive implementation of the reward function may lead to the robot repeatedly turning the machine on and off to keep collecting the same reward multiple times.
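
A sketch of this failure (entirely illustrative): a naive reward that pays every time the machine turns on can be farmed by toggling, while a reward that pays only the first time cannot.

```python
# The coffee-machine reward hack, in miniature.

def naive_reward(event, history):
    return 1.0 if event == "machine_on" else 0.0     # farmable

def fixed_reward(event, history):
    first_time = "machine_on" not in history
    return 1.0 if event == "machine_on" and first_time else 0.0

history = []
total_naive = total_fixed = 0.0
for event in ["machine_on", "machine_off"] * 5:      # the toggling exploit
    total_naive += naive_reward(event, history)
    total_fixed += fixed_reward(event, history)
    history.append(event)

print(total_naive)   # 5.0: toggling keeps paying under the naive reward
print(total_fixed)   # 1.0: the fixed reward cannot be farmed
```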

25
Q

A cooperation game shows NO Nash equilibrium when…

A

one of the teams can improve its strategy without waiting for the other team to change its strategy

26
Q

A competition game shows NO Nash equilibrium when…

A

one of the teams can improve its strategy without waiting for the other team to change its strategy