Reinforcement Learning Flashcards

Question 1

Q

What does the agent do at each timestep t?

Answer

A

Take action a_t
Receive observation o_t
Receive reward r_t

Question 2

Q

What does the environment do at each timestep t?

Answer

A

Receive action a_t
Emit observation o_t
Emit reward r_t
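
The agent and environment steps from the first two cards can be sketched as one interaction loop. This is a minimal illustration only: `CoinFlipEnv`, `RandomAgent`, and `run_episode` are hypothetical names invented for this example, not part of any library.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses a hidden coin."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.coin = self.rng.randint(0, 1)  # hidden environment state

    def step(self, action):
        # Receive action a_t, then emit observation o_t and reward r_t.
        reward = 1.0 if action == self.coin else 0.0
        observation = self.coin  # reveal the coin only after the guess
        self.coin = self.rng.randint(0, 1)  # move to the next hidden state
        return observation, reward


class RandomAgent:
    """Hypothetical agent that acts at random and learns nothing."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)

    def act(self):
        return self.rng.randint(0, 1)

    def observe(self, observation, reward):
        pass  # a learning agent would update its agent state here


def run_episode(env, agent, num_steps):
    total_reward = 0.0
    for _ in range(num_steps):
        action = agent.act()                    # agent: take action a_t
        observation, reward = env.step(action)  # environment: emit o_t, r_t
        agent.observe(observation, reward)      # agent: receive o_t, r_t
        total_reward += reward
    return total_reward
```

Calling `run_episode(CoinFlipEnv(), RandomAgent(), 10)` returns the summed reward over ten timesteps.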

Question 3

Q

What is the environment state?

Answer

A

The environment state is the environment's internal representation; it is usually not visible to the agent.

Question 4

Q

What is the agent state?

Answer

A

The agent state is the agent's internal representation: the information the agent uses to make decisions.

Question 5

Q

What is a fully observable environment?

Answer

A

The agent directly observes the environment state, so agent state = environment state = information state. Formally, this is a Markov decision process (MDP).

Question 6

Q

What is the Markov assumption?

Answer

A

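The future is independent of the past given the present: the next state depends only on the current state s_t and action a_t, not on earlier states and actions.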
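
Written as a formula, one common way to state the assumption is the following (the transition-probability notation P is an assumption here; the card itself does not fix one):

```latex
\[
P(s_{t+1} \mid s_t, a_t) = P(s_{t+1} \mid s_1, a_1, \ldots, s_t, a_t)
\]
```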

Question 7

Q

What is the goal of reinforcement learning?

Answer

A

Find a policy that maximises the expected cumulative reward (the sum of rewards over time).
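
As a small sketch, the quantity being maximised can be computed for one recorded episode. The discount factor `gamma` is an assumption added for illustration; with the default `gamma=1.0` the function reduces to the plain sum of rewards named on the card.

```python
def discounted_return(rewards, gamma=1.0):
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for one episode."""
    g = 0.0
    # Work backwards so each step needs only one multiply and one add.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

A policy is better the higher this value is on average over the episodes it generates.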

Question 8

Q

What is the value function in reinforcement learning?

Answer

A

A prediction of the expected cumulative future reward from a given state (or state-action pair) when following a policy.
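
In symbols, a common textbook form is given below. The policy symbol \pi and the discount factor \gamma are assumptions not stated on the card; setting \gamma = 1 gives the undiscounted sum.

```latex
\[
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \,\middle|\, s_t = s \right],
\qquad
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \,\middle|\, s_t = s,\ a_t = a \right]
\]
```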

Question 9

Q

What is the idea behind DQN (deep Q-learning)?

Answer

A

Use a neural network to approximate the action-value function Q(s, a).
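
A rough sketch of the idea of learning Q(s, a) with a parametric model follows. To stay dependency-free it uses a linear model per action as a stand-in for the neural network, so it is not DQN itself. The names `q_values` and `td_update`, the learning rate `lr`, and the discount `gamma` are all hypothetical choices for this example.

```python
def q_values(weights, features):
    """Return Q(s, a) for every action; weights[a] is one weight vector per action."""
    return [sum(w * x for w, x in zip(w_a, features)) for w_a in weights]


def td_update(weights, features, action, reward, next_features, done,
              gamma=0.99, lr=0.1):
    """One semi-gradient Q-learning step; returns the TD error."""
    # Target: observed reward plus the discounted best next-state value.
    target = reward
    if not done:
        target += gamma * max(q_values(weights, next_features))
    prediction = q_values(weights, features)[action]
    td_error = target - prediction
    # For a linear model, the gradient of Q(s, a) with respect to
    # weights[action] is simply the feature vector.
    weights[action] = [w + lr * td_error * x
                       for w, x in zip(weights[action], features)]
    return td_error
```

In DQN the linear model is replaced by a deep network and the weight update by a gradient step on the squared TD error.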

Question 10

Q

What are some tricks for training the network in DQN?

Answer

A

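1) Experience replay: store past (state, action, reward, next state) transitions and sample from them at random for training.
2) Periodic updates of a separate target network, which is kept fixed in between.
3) Reward clipping to a fixed range such as [-1, 1].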
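
The three tricks can be sketched in plain Python as follows. This is only an illustration: `ReplayBuffer`, `clip_reward`, `maybe_sync_target`, the clipping range, and the `sync_every` interval are hypothetical names and values, and the network parameters are assumed to be a flat list of numbers.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up temporal correlation.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def clip_reward(reward, low=-1.0, high=1.0):
    """Clamp the reward into [low, high] so its scale is known in advance."""
    return max(low, min(high, reward))


def maybe_sync_target(step, online_params, target_params, sync_every=1000):
    """Copy online parameters into the target only every sync_every steps."""
    if step % sync_every == 0:
        target_params[:] = online_params  # in-place copy of a flat list
        return True
    return False
```

Between syncs the target parameters stay fixed, so the regression targets do not shift with every gradient step.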

Question 11

Q

What are some challenges of DQN?

Answer

A

1) Non-i.i.d. data (consecutive samples are correlated)
2) Rapid policy changes
3) Unknown reward range

Question 12

Q

What are two applications of reinforcement learning to imaging?

Answer

A

Anatomical landmark detection

Standard plane detection

Question 13

Q

What is a multi-scale agent?

Answer

A

An agent that can choose to change the resolution at which it operates.