8 - Deep Reinforcement Learning Flashcards

1
Q

Reinforcement Learning

A

Learning to solve a task by trial and error: an agent interacts with an environment and learns to maximise its cumulative reward.

1
Q

Deep Q-learning

A

Use a neural network (e.g. a CNN) to predict the Q-value of each action in the current state, then select the action with the largest Q-value.

2
Q

Deep Q Learning Network Architecture

A

3 convolutional layers followed by 2 fully connected layers
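
As a rough check on what such a network looks like, the sketch below traces tensor shapes through three convolutional layers and two fully connected layers. The card gives only the layer counts. The 84x84x4 input, the kernel sizes, strides, channel counts and the 512-unit hidden layer are assumptions borrowed from the commonly cited Atari DQN setup, and `conv_out` and `dqn_shapes` are hypothetical helper names.

```python
def conv_out(size, kernel, stride):
    """Output width/height of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def dqn_shapes(input_size=84, in_channels=4, n_actions=4):
    # (out_channels, kernel, stride) per conv layer: assumed values, not from the card.
    conv_layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]
    shapes = [(in_channels, input_size, input_size)]
    size = input_size
    for out_channels, kernel, stride in conv_layers:
        size = conv_out(size, kernel, stride)
        shapes.append((out_channels, size, size))
    flat = shapes[-1][0] * size * size
    shapes.append((flat,))        # flattened convolutional features
    shapes.append((512,))         # fully connected layer 1 (assumed width)
    shapes.append((n_actions,))   # fully connected layer 2: one Q-value per action
    return shapes


for shape in dqn_shapes():
    print(shape)
```

The final layer has one output per action, which is what lets a single forward pass score every action at once.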

3
Q

How to fix: Consecutive samples might be correlated (Deep Q Learning)

A

Experience replay: store the agent’s experiences in a memory and train on mini-batches drawn at random from the pool of stored samples.

(Random sampling breaks up runs of consecutive samples and gives a variety of data.)
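
A minimal sketch of such an experience store, assuming a fixed-capacity buffer that discards the oldest samples first. The names `Transition` and `ReplayMemory`, the stored fields, and the capacity are illustrative choices, not something the cards specify.

```python
import random
from collections import deque, namedtuple

# One stored experience; the exact fields are an assumption.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class ReplayMemory:
    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item when full.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, so the batch is not a run
        # of consecutive (and therefore correlated) steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=1000)
for t in range(100):
    memory.store(state=t, action=t % 4, reward=1.0, next_state=t + 1, done=False)
batch = memory.sample(batch_size=8)
```

Here `store` corresponds to the "store transition" step and `sample` to the "sample random minibatch" step of the training loop.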

4
Q

How to fix: Small updates to Q value may significantly change the policy

A

Use a separate target network to compute the training targets, and copy the online network's weights into it only every 10,000 steps rather than at every individual step.
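
The periodic update can be sketched as a loop in which a frozen copy of the weights is refreshed every 10,000 steps. This is a toy illustration only: weights are a plain dict, and `train` and `toy_update` are hypothetical names standing in for a real network and optimiser step.

```python
import copy

SYNC_EVERY = 10_000  # interval given on the card


def train(online_weights, n_steps, update_fn, sync_every=SYNC_EVERY):
    # The target weights start as a frozen copy of the online weights.
    target_weights = copy.deepcopy(online_weights)
    n_syncs = 0
    for step in range(1, n_steps + 1):
        # The online weights change at every step; the targets used for that
        # update would be computed from the frozen target weights.
        update_fn(online_weights, target_weights)
        if step % sync_every == 0:
            target_weights = copy.deepcopy(online_weights)
            n_syncs += 1
    return target_weights, n_syncs


def toy_update(online, target):
    # Stand-in for a gradient step: just nudges a single weight.
    online["w"] += 1


online = {"w": 0}
target, n_syncs = train(online, n_steps=25_000, update_fn=toy_update)
```

Between syncs the targets stay fixed, so a small change to the online Q-values does not immediately shift the values the network is being trained towards.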

5
Q

Epsilon value

A

The probability of choosing to explore (take a random action) rather than exploit.

Starts at 1 and is typically reduced as training progresses.
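
One common way to schedule epsilon is a linear decay from its starting value of 1. The sketch below assumes a final value of 0.1 reached after one million steps; those two numbers and the name `epsilon_at` are assumptions for illustration, not values from the card.

```python
def epsilon_at(step, eps_start=1.0, eps_end=0.1, decay_steps=1_000_000):
    """Linearly interpolate epsilon from eps_start to eps_end, then hold it."""
    fraction = min(step / decay_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)


for step in (0, 250_000, 500_000, 1_000_000, 2_000_000):
    print(step, round(epsilon_at(step), 3))
```

Early in training almost every action is random; later the agent mostly follows its own estimates while still exploring occasionally.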

6
Q

Exploration

A

E.g. select a random action.

Allows the agent to improve its current knowledge of how good each action is.

7
Q

Exploitation

A

Behave according to what the agent has learnt so far.

Choose the greedy action to get the most reward by exploiting the agent’s current action-value estimates. May be sub-optimal, since those estimates can be inaccurate.

8
Q

Epsilon-Greedy Action Selection

A

With probability $\epsilon$, select a random action $a_t$;
otherwise select $a_t = \arg\max_a Q(\phi(s_t), a; \theta)$.

Balances exploration and exploitation.
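
The selection rule above can be sketched in a few lines. Here `q_values` is assumed to be a list holding the network's predicted Q-value for each action in the current state, and `epsilon_greedy` is a hypothetical helper name.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the largest predicted Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


q_values = [0.1, 0.9, 0.3]
greedy_action = epsilon_greedy(q_values, epsilon=0.0)   # always exploits
random_action = epsilon_greedy(q_values, epsilon=1.0)   # always explores
```

With `epsilon=0.0` the function reduces to a plain argmax, and with `epsilon=1.0` it ignores the Q-values entirely.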

9
Q

Store transition

A

Store each transition (state, action, reward, next state) in the replay memory, to address the problem of consecutive frames being correlated.

10
Q

Sample random minibatch of transitions

A

Use a random sampling function to draw a mini-batch of stored transitions from the replay memory.
