8 - Deep Reinforcement Learning Flashcards
Reinforcement Learning
An agent learns by trial and error, acting in an environment to maximise cumulative reward.
Deep Q-Learning
Use a neural network (a CNN) to predict the Q-value of each action, then select the action with the largest Q-value.
Deep Q-Learning Network Architecture
3 convolutional layers followed by 2 fully connected layers.
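A minimal PyTorch sketch of that layout, assuming the 84x84, 4-frame stacked input and the filter sizes from the Nature DQN paper (assumptions; the card only states the layer counts):

    import torch.nn as nn

    class DQN(nn.Module):
        # 3 convolutional layers followed by 2 fully connected layers.
        def __init__(self, n_actions, in_channels=4):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            )
            self.fc = nn.Sequential(
                nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
                nn.Linear(512, n_actions),  # one Q-value per action
            )

        def forward(self, x):                 # x: (N, 4, 84, 84)
            x = self.conv(x)                  # -> (N, 64, 7, 7)
            return self.fc(x.flatten(1))      # -> (N, n_actions)

The greedy choice from the previous card is then q_net(state).argmax(dim=1).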
How to fix: consecutive samples may be correlated (Deep Q-Learning)
Experience replay: store the agent's experiences in a replay memory and draw random mini-batches from the pool of stored samples.
(Gives a varied, decorrelated batch of data.)
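A minimal replay-memory sketch (class name and capacity are illustrative assumptions):

    import random
    from collections import deque

    class ReplayMemory:
        # Fixed-size pool of past transitions; uniform random sampling
        # breaks the correlation between consecutive samples.
        def __init__(self, capacity=100_000):
            self.buffer = deque(maxlen=capacity)  # oldest samples drop off

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size=32):
            return random.sample(self.buffer, batch_size)  # uniform, without replacement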
How to fix: small updates to the Q-values may significantly change the policy
Use a separate target network: copy the online network's weights into it every 10,000 steps rather than after every individual step, so the targets stay stable between updates.
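One common way to implement this, sketched with the DQN class from above (the function name is illustrative):

    import copy

    policy_net = DQN(n_actions=4)           # trained every step
    target_net = copy.deepcopy(policy_net)  # frozen copy used for targets

    TARGET_UPDATE = 10_000  # steps between syncs, per the card above

    def maybe_sync_target(step):
        # Copy the online weights into the target network every 10k steps,
        # so the bootstrapped targets stay fixed between syncs.
        if step % TARGET_UPDATE == 0:
            target_net.load_state_dict(policy_net.state_dict())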
Epsilon value
Starts at 1 and typically decays over training.
The probability of choosing to explore (i.e. select a random action).
Exploration
E.g. select a random action.
Allows the agent to improve its current knowledge of the environment.
Exploitation
Act on what the agent has learnt so far.
Choose the greedy action, exploiting the agent's current action-value estimates to get the most reward. May be sub-optimal if those estimates are inaccurate.
Epsilon-Greedy Action Selection
With probability ε, select a random action a_t;
otherwise select a_t = argmax_a Q(Φ(s_t), a; θ).
Balances exploration and exploitation.
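A sketch of the selection rule, assuming a PyTorch Q-network like the DQN sketch above (names are illustrative):

    import random
    import torch

    def epsilon_greedy(q_net, phi_s, n_actions, epsilon):
        if random.random() < epsilon:          # explore: random action
            return random.randrange(n_actions)
        with torch.no_grad():                  # exploit:
            # a_t = argmax_a Q(Φ(s_t), a; θ)
            return int(q_net(phi_s.unsqueeze(0)).argmax(dim=1).item())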
Store transition
Store the transition (Φ(s_t), a_t, r_t, Φ(s_{t+1})) in the replay memory, so training can draw on stored states rather than only the continuous stream of incoming frames.
Sample random minibatch of transitions
Use a random sampling function to select stored transitions uniformly from the replay memory.
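Putting the last few cards together, a sketch of one training step that samples a minibatch and regresses Q towards the target-network estimate (assumes the DQN, ReplayMemory and target_net sketches above; γ = 0.99 is an assumed discount factor not stated on the cards):

    import torch
    import torch.nn.functional as F

    GAMMA = 0.99  # assumed discount factor

    def train_step(policy_net, target_net, memory, optimizer, batch_size=32):
        batch = memory.sample(batch_size)  # random minibatch from replay memory
        states, actions, rewards, next_states, dones = (
            torch.stack([torch.as_tensor(x) for x in col]) for col in zip(*batch))
        # Q(Φ(s_t), a_t; θ) for the actions actually taken
        q = policy_net(states.float()).gather(1, actions.long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():  # TD target from the frozen target network
            best_next = target_net(next_states.float()).max(dim=1).values
            target = rewards.float() + GAMMA * best_next * (1 - dones.float())
        loss = F.smooth_l1_loss(q, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()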