7 - Reinforcement Learning Flashcards
Reinforcement Learning
Learning to take actions in an environment to maximise rewards
State
Information from the environment
Policy (pi symbol)
A map from state space (s) to action space (a)
Reward Function (R) meaning
Maps each state (or state-action pair) to a reward number
Value Function (Q pi) meaning
The value of a state or state-action pair:
the total expected reward from that point onward when following the policy.
Q learning method
For each episode:
- Select a random initial state
- While the goal state has not been reached:
  - Select one action for the current state
  - Update Q(s, a) using the Bellman equation
  - Set the next state as the current state
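The steps above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the four-room corridor, the goal room, the reward values, the discount factor of 0.8, and the purely random action choice are all assumptions made for the example.

```python
import random

GAMMA = 0.8   # discount factor (assumed value)
GOAL = 3      # hypothetical goal room

# Hypothetical reward matrix: rows are states, columns are actions
# (action j means "move to room j"). None marks an impossible move.
R = [
    [None, 0,    None, None],
    [0,    None, 0,    None],
    [None, 0,    None, 100],
    [None, None, 0,    100],
]

def valid_actions(state):
    """Actions that are possible from the given state."""
    return [a for a in range(len(R)) if R[state][a] is not None]

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    n = len(R)
    Q = [[0.0] * n for _ in range(n)]        # Q matrix starts at 0
    for _ in range(episodes):
        state = rng.randrange(n)             # select a random initial state
        while state != GOAL:                 # while the goal is not reached
            action = rng.choice(valid_actions(state))   # select one action
            next_state = action              # taking action j lands in room j
            best_next = max(Q[next_state][a] for a in valid_actions(next_state))
            # Bellman update for the chosen state-action pair
            Q[state][action] = R[state][action] + GAMMA * best_next
            state = next_state               # next state becomes current state
    return Q

Q = train()
```

After training, moves that lead toward the goal carry higher Q values than moves that lead away from it, so following the largest value in each row traces a path to the goal.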
Equation involved in Q Learning
Q(s, a) = R(s,a) + Gamma * Max[Q(next state, all actions)]
s - current state
a - action taken
R - reward function
Gamma - discount factor between 0 and 1; it sets how much future rewards count relative to the immediate reward
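A single application of the equation, worked through in Python. The numbers are hypothetical and the discount factor of 0.8 is an assumed value; `q_update` is a helper name introduced only for this sketch.

```python
GAMMA = 0.8  # assumed discount factor

def q_update(reward, next_q_values, gamma=GAMMA):
    """Return R(s, a) + gamma * max over Q(next state, all actions)."""
    return reward + gamma * max(next_q_values)

# Hypothetical case: the move earns 0 now, and the best action available
# from the next state is currently valued at 100.
new_q = q_update(reward=0, next_q_values=[0, 64, 100])

# With gamma = 0 only the immediate reward counts.
myopic_q = q_update(reward=10, next_q_values=[5, 50], gamma=0)
```

Here `new_q` works out to 0 + 0.8 * 100, and `myopic_q` is just the immediate reward of 10, which shows how Gamma trades off future against immediate reward.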
Reward Matrix
Links states with reward values.
E.g. if room 5 is the goal, every move that leads into room 5 could be given a reward of 100.
The matrix is State (rows) by Action (columns)
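One way to build such a matrix, as a sketch. The door layout, the use of -1 for "no door", and the helper name `build_reward_matrix` are assumptions for illustration; only the goal room 5 and the reward of 100 come from the card above.

```python
def build_reward_matrix(n_rooms, doors, goal, goal_reward=100):
    """Rows are states (current room), columns are actions (room moved to)."""
    # -1 marks a move that is not possible (assumed convention).
    R = [[-1] * n_rooms for _ in range(n_rooms)]
    for a, b in doors:
        # A door works in both directions; moves into the goal earn the reward.
        R[a][b] = goal_reward if b == goal else 0
        R[b][a] = goal_reward if a == goal else 0
    R[goal][goal] = goal_reward   # staying in the goal room also counts
    return R

# Hypothetical layout: each pair is a door between two rooms.
doors = [(0, 4), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5)]
R = build_reward_matrix(6, doors, goal=5)

for row in R:
    print(row)
```

Reading across a row shows which moves are available from that room and what each one pays.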
Q Matrix - rows are ? columns are ? start values are ?
Rows are states, columns are actions
All values start at 0
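A short sketch of setting up a Q matrix and reading an action out of it. The sizes and the value written into the matrix are hypothetical, and `init_q` and `greedy_action` are names introduced only for this example; picking the highest-valued action in a row is one common way to turn a Q matrix into a policy.

```python
def init_q(n_states, n_actions):
    """Q matrix with one row per state, one column per action, all zeros."""
    # Build each row separately so the rows do not share storage.
    return [[0.0] * n_actions for _ in range(n_states)]

def greedy_action(Q, state):
    """Index of the highest-valued action in this state's row (first on ties)."""
    row = Q[state]
    return row.index(max(row))

Q = init_q(n_states=3, n_actions=2)
Q[1][1] = 80.0   # hypothetical value learned for state 1, action 1

best_for_state_1 = greedy_action(Q, 1)
best_for_state_0 = greedy_action(Q, 0)   # all zeros, so the first action wins
```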