7 - Reinforcement Learning Flashcards

Question 1

Q

Reinforcement Learning

Answer

A

Learning to take actions in an environment to maximise rewards

Question 2

Q

State

Answer

A

Information from the environment that describes the agent's current situation

Question 3

Q

Policy (pi symbol)

Answer

A

A mapping from the state space (S) to the action space (A): it tells the agent which action to take in each state
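
As a minimal sketch, a deterministic policy for a small discrete problem can be stored as a lookup table. The room numbers and the names `policy` and `act` are hypothetical, chosen only for illustration.

```python
# A deterministic policy as a plain lookup table: state -> action.
# The states and actions are made-up room numbers (an assumption).
policy = {0: 4, 4: 5, 3: 1, 1: 5, 2: 3}


def act(state):
    """Return the action the policy prescribes for `state`."""
    return policy[state]


print(act(0))  # prints 4
```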

Question 4

Q

Reward Function (R) meaning

Answer

A

Maps each state (or state-action pair) to a reward number

Question 5

Q

Value Function (Q pi) meaning

Answer

A

The value of a state or state-action pair.

The total expected reward from that state (or from taking that action in that state) when following policy pi

Question 6

Q

Q learning method

Answer

A

For each episode:
- Select a random initial state

While (goal state not reached):
- Select one of the actions available in the current state
- Update Q(state, action) using the Bellman equation
- Set the next state as the current state.
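
The steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the six-room reward matrix `R`, the goal room `GOAL = 5`, the discount `GAMMA = 0.8`, the uniformly random action choice, and the helper names `train` and `greedy_path` are all hypothetical. In this toy layout an action is simply "move to room a", so the next state equals the action.

```python
import random

# Hypothetical six-room layout: R[s][a] is the reward for moving from
# room s to room a; -1 marks a move that is not possible.
R = [
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
]
GOAL = 5
GAMMA = 0.8  # assumed discount factor


def train(episodes=500, seed=0):
    """Run the episode loop and return the learned Q matrix."""
    rng = random.Random(seed)
    n = len(R)
    Q = [[0.0] * n for _ in range(n)]  # all values start at 0
    for _ in range(episodes):
        state = rng.randrange(n)  # random initial state
        while state != GOAL:
            # Pick one of the possible actions for the current state.
            actions = [a for a in range(n) if R[state][a] >= 0]
            action = rng.choice(actions)
            next_state = action  # moving to room `action` puts us there
            # Bellman update: reward now plus discounted best future value.
            Q[state][action] = R[state][action] + GAMMA * max(Q[next_state])
            state = next_state
    return Q


def greedy_path(Q, start):
    """Follow the highest-valued action from `start`, at most len(Q) steps."""
    path = [start]
    for _ in range(len(Q)):
        if path[-1] == GOAL:
            break
        row = Q[path[-1]]
        path.append(row.index(max(row)))
    return path


Q = train()
print(greedy_path(Q, 2))
```

Choosing actions uniformly at random is the simplest form of exploration; other selection rules (for example epsilon-greedy) would fit the same loop.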

Question 7

Q

Equation involved in Q Learning

Answer

A

Q(s, a) = R(s,a) + Gamma * Max[Q(next state, all actions)]

s - current state
a - action taken in that state
R - reward function
Gamma - discount factor (between 0 and 1) that controls how much future rewards count
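
A worked example of one application of this equation, as a hedged sketch: the three-state chain, the reward values, `GAMMA = 0.8` and the helper name `q_update` are assumptions made for illustration.

```python
GAMMA = 0.8  # assumed discount factor


def q_update(Q, R, s, a, next_state, gamma=GAMMA):
    """Apply Q(s, a) = R(s, a) + gamma * max(Q(next_state, all actions))."""
    Q[s][a] = R[s][a] + gamma * max(Q[next_state])
    return Q[s][a]


# Hypothetical chain 0 -> 1 -> 2, where state 2 is the goal.
# R[s][a] is the reward for moving from state s to state a.
R = [
    [0, 0,   0],
    [0, 0, 100],
    [0, 0,   0],
]
Q = [[0.0] * 3 for _ in range(3)]  # all values start at 0

# Step into the goal: 100 + 0.8 * max(0, 0, 0) = 100
print(q_update(Q, R, s=1, a=2, next_state=2))  # prints 100.0

# One step earlier: 0 + 0.8 * max(Q[1]) = 0.8 * 100 = 80
print(q_update(Q, R, s=0, a=1, next_state=1))  # prints 80.0
```

The second update shows how the discount factor passes a reduced share of the goal reward back to states that are further from the goal.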

Question 8

Q

Reward Matrix

Answer

A

Links states with reward values.

E.g. if room 5 is the goal, every move into room 5 might be given a reward of 100.

The matrix is State (rows) by Action (columns)
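
One possible way to build such a matrix, as a sketch only: the door list, the use of -1 for impossible moves, the reward of 0 for ordinary moves, and the name `build_reward_matrix` are all assumptions. Here an action means "move to room a", so the matrix is square.

```python
def build_reward_matrix(n_rooms, doors, goal, goal_reward=100):
    """Return R where R[s][a] is the reward for moving from room s to room a.

    -1 marks a move that is not possible, 0 an ordinary move, and
    `goal_reward` any move that ends in the goal room.
    """
    R = [[-1] * n_rooms for _ in range(n_rooms)]
    for a, b in doors:  # each door can be used in both directions
        R[a][b] = goal_reward if b == goal else 0
        R[b][a] = goal_reward if a == goal else 0
    R[goal][goal] = goal_reward  # staying in the goal room is rewarded too
    return R


# Hypothetical layout: pairs of rooms connected by a door.
doors = [(0, 4), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5)]
R = build_reward_matrix(6, doors, goal=5)

for row in R:
    print(row)
```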

Question 9

Q

Q Matrix - what are the rows, the columns and the start values?

Answer

A

A matrix with states as rows and actions as columns

All values start at 0
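
A minimal sketch of setting up such a matrix and reading the best action from one row. The sizes and the name `make_q_matrix` are hypothetical.

```python
def make_q_matrix(n_states, n_actions):
    """Return an n_states x n_actions matrix filled with zeros."""
    # Build each row separately so the rows do not share one list object.
    return [[0.0] * n_actions for _ in range(n_states)]


Q = make_q_matrix(6, 6)

# Pretend a single update has already happened (made-up value).
Q[1][5] = 100.0

# The best known action in state 1 is the column with the largest value.
best_action = max(range(len(Q[1])), key=lambda a: Q[1][a])
print(best_action)  # prints 5
```

Writing `[[0.0] * n_actions] * n_states` instead would make every row the same list, so an update to one state would show up in all of them.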

(9 cards)