Reinforcement Learning Flashcards
What is a one-line summary of Reinforcement Learning?
Reinforcement Learning trains an agent to take actions in an environment, which responds to each action by sending back a reward and the next state.
What are some of the main aspects of Reinforcement Learning?
It employs Trial and Error to find a solution
It makes a sequence of decisions
It does not require labelled input/output pairs, only rules for reward and penalty
It aims to choose actions that maximise the cumulative reward
What are some of the main applications for Reinforcement Learning?
Game Playing
Robotics
Logistics
Autonomous Driving
What are the key elements of Reinforcement Learning that describe the overall process?
Environment - The world (physical or simulated) in which the Agent operates
Agent - Learns to act in a way that maximises the cumulative reward
State - Current situation of the Agent
Reward - Feedback the Agent gets from the environment after taking an action
Policy - The method to map the Agent’s state to actions
Value - Expected future (long-term) reward that an Agent would receive by taking an action in a particular state
What are the elements of the Markov Decision Process?
Construct a set of environment states (S)
Define a set of possible actions (A)
Define a real valued reward function (R)
Build a transition model (P(s' | s, a)), the probability of reaching state s' after taking action a in state s
Choose a Discount Factor (γ, gamma), a hyperparameter between 0 and 1 that controls how much future rewards count
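As a rough illustration, the five elements above can be written down as plain Python data for a tiny problem. The two-state "battery" world, its names, and all of its numbers are assumptions made up for this sketch.

```python
# A hypothetical two-state MDP, written as plain Python data.
states = ["low", "high"]          # S: set of environment states
actions = ["wait", "recharge"]    # A: set of possible actions

# P: transition model; transitions[(s, a)] maps each next state s' to P(s' | s, a)
transitions = {
    ("low", "wait"): {"low": 1.0},
    ("low", "recharge"): {"high": 0.9, "low": 0.1},
    ("high", "wait"): {"high": 0.7, "low": 0.3},
    ("high", "recharge"): {"high": 1.0},
}

# R: real-valued reward for taking action a in state s
rewards = {
    ("low", "wait"): -1.0,
    ("low", "recharge"): 0.0,
    ("high", "wait"): 2.0,
    ("high", "recharge"): 0.0,
}

gamma = 0.9  # discount factor


def is_valid_transition_model(model):
    """Check that every (state, action) row is a probability distribution."""
    return all(abs(sum(row.values()) - 1.0) < 1e-9 for row in model.values())
```

The helper only checks that each row of probabilities sums to 1, which is a quick sanity test worth running on any hand-written transition model.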
What is the primary goal of the Markov Decision Process?
Find a good policy that tells the Agent how to act in its current state so as to maximise the cumulative reward
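"Cumulative reward" is commonly taken to mean the discounted sum of rewards, where the reward received t steps in the future is multiplied by the discount factor raised to the power t. A minimal sketch of that calculation, assuming the rewards of one episode are already collected in a list (the function name is made up for this example):

```python
def discounted_return(rewards, gamma):
    """Sum rewards, weighting the reward at step t by gamma ** t."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future total.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Three rewards of 1 with gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1.0, 1.0, 1.0], 0.5))  # prints 1.75
```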
What is the step-by-step process of the Markov Decision Process?
At the beginning, the environment samples the initial state of the Agent
Until the program is terminated/finished:
- Agent selects an action
- Environment samples the reward
- Environment samples the next state
- Agent receives the reward and next state
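The loop above can be sketched in a few lines of Python. The corridor environment, its reward values, and the names `CorridorEnv` and `run_episode` are hypothetical, chosen only to make the four steps visible in code.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk along a corridor to reach the right end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # The environment samples the initial state (always the left end here).
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position.
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1
        return self.position, reward, done


def run_episode(env, choose_action, max_steps=100):
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = choose_action(state)            # Agent selects an action
        state, reward, done = env.step(action)   # Environment samples reward and next state
        total_reward += reward                   # Agent receives the reward and next state
        if done:
            break
    return total_reward


random.seed(0)
random_total = run_episode(CorridorEnv(), lambda state: random.choice([-1, 1]))
always_right_total = run_episode(CorridorEnv(), lambda state: 1)
```

Here `choose_action` stands in for the Agent: any function from a state to an action can be plugged in, which is why a random walker and an "always go right" rule both work without changing the loop.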
What is a Policy in the context of the Markov Decision Process?
A policy is a function from S to A that specifies what action to take in each state.
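For a small, discrete problem, such a function can be as simple as a lookup table. The state and action names below are hypothetical:

```python
# A hypothetical deterministic policy: one action for every state.
policy = {"start": "right", "middle": "right", "goal": "stay"}


def act(policy, state):
    """Return the action the policy prescribes for the given state."""
    return policy[state]
```

Calling `act(policy, "start")` returns `"right"`; a state missing from the table raises a `KeyError`, which is a reminder that a policy must cover every state in S.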
What is Q-Learning designed for?
Q-Learning is a method designed to find the best next action for a given current state, with the aim of maximising the cumulative reward
What is the Bellman Equation?
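In Q-Learning, the Bellman Equation is used as an update rule for the Q-value of a State/Action pair:
New Q(s, a) = Q(s, a) + α × [R + γ × max Q(s', a') - Q(s, a)]
Here Q(s, a) is the old value for the State/Action pair, α is the learning rate, R is the reward, γ is the discount rate, and max Q(s', a') is the maximum expected future reward over the actions a' available in the new state s'.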
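A single update can be worked through numerically. The sketch below assumes made-up values (old value 0, reward 1, best future value 2, learning rate and discount rate both 0.5); the function name is hypothetical.

```python
def q_update(old_q, reward, max_next_q, learning_rate, discount_rate):
    """Apply one update to the value of a single State/Action pair."""
    target = reward + discount_rate * max_next_q      # reward plus discounted future value
    return old_q + learning_rate * (target - old_q)   # move part of the way towards the target


new_q = q_update(old_q=0.0, reward=1.0, max_next_q=2.0,
                 learning_rate=0.5, discount_rate=0.5)
print(new_q)  # prints 1.0
```

The target is 1 + 0.5 × 2 = 2, and a learning rate of 0.5 moves the old value halfway from 0 to 2.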
What does a Q-value define?
A Q-value is a representation of the quality of a State/Action pair
What is the learning process for Q-Learning?
Initialise all Q-values in a Q-table to 0
Choose the action with the best Q-value for the current state
Perform the action, which results in a new state
Measure the reward for taking that action from that state
Update the corresponding Q-value using the Bellman Equation
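The five steps above can be sketched as a short training loop. Everything specific here is an assumption for illustration: the five-cell corridor, the reward values, the hyperparameters, and the small amount of random exploration (`epsilon`) added so the Agent does not rely on greedy choices alone.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = [-1, 1]   # move left or move right


def step(state, action):
    """Hypothetical environment: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else -0.1), done


def train(episodes=200, learning_rate=0.5, discount_rate=0.9,
          epsilon=0.1, max_steps=1000, seed=0):
    rng = random.Random(seed)
    # Step 1: initialise all Q-values in the Q-table to 0.
    q_table = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Step 2: choose the best-known action (random with probability epsilon).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q_table[(state, a)])
            # Steps 3 and 4: perform the action, observe new state and reward.
            next_state, reward, done = step(state, action)
            # Step 5: update the Q-value for this State/Action pair.
            best_next = 0.0 if done else max(q_table[(next_state, a)] for a in ACTIONS)
            target = reward + discount_rate * best_next
            q_table[(state, action)] += learning_rate * (target - q_table[(state, action)])
            state = next_state
            if done:
                break
    return q_table


q_table = train()
# Read off the greedy action in every non-goal cell.
greedy_policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)])
                 for s in range(N_STATES - 1)}
```

After training, `greedy_policy` is expected to choose `1` (move right) in every cell, since that is the shortest route to the goal in this toy setup.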
What is the drawback of Q-Learning?
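The Q-table needs one entry per State/Action pair, so for large or continuous state spaces it becomes computationally expensive (in memory and time) in both the training and inference stages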
Why do we use the Epsilon-Greedy Exploration Strategy in Q-Learning?
The aim is to introduce some randomness when selecting actions, which encourages the Agent to explore other courses of action rather than constantly picking the option that looks best in the short term.
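A minimal sketch of the strategy, assuming the Q-values for the current state are held in a dictionary keyed by action (the function and parameter names are hypothetical):

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise the best-known one.

    q_values maps each available action to its current Q-value.
    """
    if rng.random() < epsilon:
        return rng.choice(list(q_values))     # explore: any action, chosen uniformly
    return max(q_values, key=q_values.get)    # exploit: highest Q-value


action = epsilon_greedy({"left": 0.1, "right": 0.9}, epsilon=0.2)
```

With `epsilon=0` the choice is always greedy, and with `epsilon=1` it is always random; values in between trade exploration against exploitation.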
What is the difference between Q-Learning and Deep Q-Learning?
Q-Learning stores a Q-value for every State/Action pair in a table, called a Q-table. Deep Q-Learning replaces the Q-table with a Neural Network, which takes a state as input and outputs a Q-value for each possible action.
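To make the contrast concrete, the sketch below shows only the forward pass of a very small network that turns a state vector into one Q-value per action. The layer sizes, the four state features, and the two actions are assumptions, the weights are random, and no training is shown; a real Deep Q-Learning setup would normally use a deep learning library.

```python
import random


def make_layer(n_inputs, n_outputs, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
               for _ in range(n_outputs)]
    biases = [0.0] * n_outputs
    return weights, biases


def dense(inputs, layer, relu=False):
    """Apply one fully connected layer, optionally followed by ReLU."""
    weights, biases = layer
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in outputs] if relu else outputs


def q_network(state, hidden_layer, output_layer):
    """Map a state vector to one estimated Q-value per action."""
    return dense(dense(state, hidden_layer, relu=True), output_layer)


rng = random.Random(0)
hidden_layer = make_layer(n_inputs=4, n_outputs=8, rng=rng)   # 4 state features (assumed)
output_layer = make_layer(n_inputs=8, n_outputs=2, rng=rng)   # 2 actions (assumed)

state = [0.1, -0.2, 0.05, 0.3]
q_values = q_network(state, hidden_layer, output_layer)
best_action = max(range(len(q_values)), key=lambda a: q_values[a])
```

Where a Q-table would need one row per distinct state, the network accepts any state vector of the right length, which is what makes the approach usable for large or continuous state spaces.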