7- Reinforcement Learning - SARSA/Q Flashcards
SARSA
State Action Reward State Action
Estimates the action values q_π(s,a) for all s and a under the current policy π (on-policy)
SARSA pseudocode
Params: step size α in (0,1], small ε > 0
Initialise Q(s,a) arbitrarily for all s and a, except Q(terminal, ·) = 0
For each episode:
- Init S
- Choose A from S using the policy derived from Q (e.g. ε-greedy)
- Loop for each step of episode:
- - Take action A, observe R, S'
- - Choose A' from S' using the policy derived from Q
- - Q(S,A) <- Q(S,A) + α[R + γQ(S',A') - Q(S,A)]
- - S <- S'; A <- A'
- until S is terminal
Probably don’t have to remember all this
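A minimal Python sketch of tabular SARSA, assuming a generic environment where `env.reset()` returns a discrete state and `env.step(a)` returns `(next_state, reward, done)` (these names and the hyperparameters are placeholders, not from the cards):

```python
import numpy as np

def sarsa(env, n_states, n_actions, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    Q = np.zeros((n_states, n_actions))      # Q(terminal, .) stays 0

    def eps_greedy(s):
        # policy derived from Q: explore with prob eps, otherwise act greedily
        if np.random.rand() < eps:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        s = env.reset()                      # init S
        a = eps_greedy(s)                    # choose A from S
        done = False
        while not done:
            s_next, r, done = env.step(a)    # take A, observe R, S'
            a_next = eps_greedy(s_next)      # choose A' from S' (on-policy)
            # SARSA update: the target uses the action actually chosen next
            Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
            s, a = s_next, a_next            # S <- S'; A <- A'
    return Q
```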
Q learning
Approximates the optimal action-value function q*(s,a) (that is q subscript star, the optimal q, not q times a)
Q(St,At) <- Q(St,At) + α[Rt+1 + γ max_a Q(St+1,a) - Q(St,At)], learned independently of the behaviour policy (off-policy)
Can be a lookup table with all states and actions
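For a small problem the lookup table can literally be an array; a minimal sketch (the 48-state / 4-action sizes are just an illustrative example, not from the cards):

```python
import numpy as np
from collections import defaultdict

n_states, n_actions = 48, 4                  # e.g. a 4x12 grid world

# Dense lookup table: one row per state, one column per action
Q = np.zeros((n_states, n_actions))

# Sparse alternative when most states are never visited
Q_sparse = defaultdict(lambda: np.zeros(n_actions))
```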
Q learning Pseudocode
Params: step size α in (0,1], small ε > 0
Initialise Q(s,a) for all states and actions except Q(terminal, ·) = 0
For each episode:
- Init S
- Loop for each step of episode:
- - Choose A from S using the policy derived from Q (e.g. ε-greedy)
- - Take action A, observe R, S'
- - Q(S,A) <- Q(S,A) + α[R + γ max_a Q(S',a) - Q(S,A)]
- - S <- S' (make it the current state)
- until S is terminal
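A matching Python sketch of tabular Q-learning, with the same assumed environment interface as the SARSA sketch above; the only substantive change is the TD target, and A is now chosen inside the step loop:

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    Q = np.zeros((n_states, n_actions))      # Q(terminal, .) stays 0
    for _ in range(episodes):
        s = env.reset()                      # init S
        done = False
        while not done:
            # epsilon-greedy behaviour policy derived from Q
            if np.random.rand() < eps:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)    # take A, observe R, S'
            # Q-learning update: target uses the greedy (max) next action,
            # regardless of what the behaviour policy actually does (off-policy)
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next                       # S <- S'
    return Q
```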
SARSA vs Q Learning: The Cliff
Each step gives R = -1, but falling off the cliff gives R = -100
Q-learning finds the shortest path right along the cliff edge, but its ε-greedy exploration occasionally steps off and falls.
SARSA takes the exploration into account (it's on-policy), so it learns a longer but safer path away from the edge.
Q-learning's values may look more optimistic because they describe the greedy policy rather than the exploring policy it actually follows.
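The difference shows up in the TD target for the same transition; a toy illustration with made-up numbers (not from the cards), using the expected target under ε-greedy for SARSA:

```python
import numpy as np

gamma, r, eps = 1.0, -1.0, 0.1

# Hypothetical action values in the next state, which sits beside the cliff:
# action 0 = step along the edge (fine), action 1 = step off the cliff (disaster)
Q_next = np.array([-10.0, -100.0])

# Q-learning target: assumes the greedy action will be taken next
q_learning_target = r + gamma * Q_next.max()                       # -11.0

# SARSA samples the actual next action A'; on average under eps-greedy the
# occasional cliff fall drags the target down
expected_next = (1 - eps) * Q_next.max() + eps * Q_next.mean()
sarsa_expected_target = r + gamma * expected_next                  # -15.5

print(q_learning_target, sarsa_expected_target)
```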