Reinforcement learning Flashcards
What does the agent do at each timestep t?
Take action a_t
Receive observation o_t
Receive reward r_t
What does the environment do at each timestep t?
Receive action a_t
Emit observation o_t
Emit reward r_t
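The two cards above describe one interaction loop seen from both sides. A minimal sketch of that loop follows. `CoinFlipEnv`, `RandomAgent` and `run_episode` are hypothetical names invented for illustration, not part of any library.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses a hidden coin."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.coin = self.rng.randint(0, 1)  # environment state, hidden from the agent

    def step(self, action):
        # Receive action a_t, then emit observation o_t and reward r_t.
        reward = 1.0 if action == self.coin else 0.0
        observation = self.coin              # reveal the coin that was just guessed
        self.coin = self.rng.randint(0, 1)   # move on to the next hidden state
        return observation, reward


class RandomAgent:
    """Hypothetical agent that acts at random and does not learn."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)

    def act(self):
        return self.rng.randint(0, 1)

    def observe(self, observation, reward):
        pass  # a learning agent would update its agent state here


def run_episode(env, agent, num_steps):
    total_reward = 0.0
    for _ in range(num_steps):
        action = agent.act()                    # agent: take action a_t
        observation, reward = env.step(action)  # environment: receive a_t, emit o_t and r_t
        agent.observe(observation, reward)      # agent: receive o_t and r_t
        total_reward += reward
    return total_reward
```

Calling `run_episode(CoinFlipEnv(), RandomAgent(), 10)` returns the total reward collected over ten timesteps.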
What is the environment state?
The environment state is the internal representation of the environment and is usually not visible to the agent
What is the agent state?
The agent state is the agent's internal representation, the information the agent uses to make decisions
What is a fully observable environment?
Agent state = Environment state = Information state. This is an MDP (Markov decision process).
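What is the Markov assumption?
The future is independent of the past given the present: the next state s_{t+1} depends only on the current state s_t and action a_t, not on earlier states and actions.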
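Written as a formula, the Markov assumption is usually stated as below. The notation P for the transition probability is an assumption, since the cards do not define one.

```latex
P(s_{t+1} \mid s_t, a_t) = P(s_{t+1} \mid s_1, a_1, \dots, s_t, a_t)
```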
What is the goal of reinforcement learning?
Find a policy that maximises the expected cumulative reward.
What is the value function in reinforcement learning?
The expected cumulative future reward from a state (or state-action pair) when following a given policy.
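The quantity whose expectation the value function predicts can be sketched as a return over one observed reward sequence. The discount factor `gamma` is an assumption here, since the cards only speak of a sum of rewards; setting `gamma=1.0` gives the plain sum. `discounted_return` is a hypothetical helper name.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum over k of gamma**k * rewards[k] for one reward sequence."""
    g = 0.0
    # Work backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

Averaging this return over many episodes that start in the same state would give a Monte Carlo estimate of that state's value.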
What is the idea behind DQN (deep Q-learning)?
Use a neural network to estimate the Q(s,a) function
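As a rough illustration of this idea, the sketch below uses a tiny one-hidden-layer network in pure Python to map a state to one Q-value per action, and builds the usual Q-learning target from it. The layer sizes, the discount `gamma`, and the names `init_q_network`, `q_values` and `td_target` are all assumptions made for the example. Real DQN implementations use a deep learning framework and a much larger network.

```python
import random


def init_q_network(state_dim, num_actions, hidden=8, seed=0):
    """Create small random weights for a one-hidden-layer Q-network."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": matrix(state_dim, hidden),
        "b1": [0.0] * hidden,
        "W2": matrix(hidden, num_actions),
        "b2": [0.0] * num_actions,
    }


def q_values(params, state):
    """Forward pass: return a list with one estimated Q(s, a) per action."""
    hidden = [
        max(0.0, sum(s * params["W1"][i][j] for i, s in enumerate(state)) + b)  # ReLU
        for j, b in enumerate(params["b1"])
    ]
    return [
        sum(h * params["W2"][j][k] for j, h in enumerate(hidden)) + b
        for k, b in enumerate(params["b2"])
    ]


def td_target(params, reward, next_state, done, gamma=0.99):
    """Q-learning target: r + gamma * max over a' of Q(s', a'); no bootstrap at episode end."""
    if done:
        return reward
    return reward + gamma * max(q_values(params, next_state))
```

Training would then adjust the weights so that `q_values(params, state)[action]` moves towards `td_target(...)` for each sampled transition.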
What are some tricks for training the network in DQN?
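1) Experience replay: store past (state, action, reward, next state) transitions and sample from them for training.
2) Periodic updates: keep a separate target network fixed and synchronise it with the trained network only every few steps.
3) Clip rewards to a fixed range, e.g. [-1, 1].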
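The three tricks above can be sketched in a few lines. This assumes that "periodic updates" refers to copying the trained network's weights into a separate target network every `sync_every` steps, and that rewards are clipped to [-1, 1]; both are common choices rather than something the cards specify. `ReplayBuffer`, `clip_reward` and `maybe_sync_target` are hypothetical names.

```python
import copy
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly for training."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.buffer)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks up the correlation between consecutive steps.
        return self.rng.sample(list(self.buffer), batch_size)


def clip_reward(reward, low=-1.0, high=1.0):
    """Clamp the reward so its scale is bounded regardless of the task."""
    return max(low, min(high, reward))


def maybe_sync_target(step, online_params, target_params, sync_every=1000):
    """Return a fresh copy of the online weights every sync_every steps, else the old target."""
    if step % sync_every == 0:
        return copy.deepcopy(online_params)
    return target_params
```

Each trick answers one of the usual DQN difficulties: replay decorrelates the training data, the slowly updated target keeps the regression target stable, and clipping bounds the reward scale.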
What are some challenges of DQN?
1) Non-i.i.d. data
2) Rapid policy changes
3) Unknown reward range
What are two applications of reinforcement learning to imaging?
Anatomical landmark detection
Standard plane detection
What is a multi scale agent?
An agent that can choose to change the resolution