10 Reinforcement Learning Flashcards
what is Q-learning
create a table storing a Q-value for each state-action pair
update the table from observed rewards after each step the agent takes
what is the pseudocode for Q-learning
initialise Q(s, a) arbitrarily for all s, a
initialise Q(terminal, ·) to 0
repeat (for each episode):
    initialise state S
    repeat (for each step of episode):
        A <- select action from S using policy derived from Q (e.g. epsilon-greedy)
        take action A, then observe reward R and next state S'
        Q(S, A) <- Q(S, A) + α[R + γ max_a Q(S', a) - Q(S, A)]
        S <- S'
    until S is terminal
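A minimal runnable sketch of tabular Q-learning in Python, following the pseudocode above. The env object and its reset()/step() interface are assumed to be Gym-style with discrete states and actions; hyperparameter values are illustrative.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning; assumes a Gym-style env with discrete
    states and actions (env.reset(), env.step(), env.action_space.n)."""
    Q = defaultdict(float)  # Q[(state, action)]; unseen pairs default to 0

    def select_action(s):
        # epsilon-greedy behaviour policy derived from Q
        if random.random() < epsilon:
            return random.randrange(env.action_space.n)
        return max(range(env.action_space.n), key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = select_action(s)
            s_next, r, done, _ = env.step(a)
            # off-policy target: greedy value of the next state
            # (terminal states are never updated, so their value stays 0)
            best_next = max(Q[(s_next, a2)] for a2 in range(env.action_space.n))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```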
how is Q-learning different
off-policy: the update target uses the action with the max Q-value in the next state, regardless of which action the agent actually takes next
how is SARSA different
on-policy: the update target uses the Q-value of the action actually chosen by the current policy in the next state
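In code, the two algorithms differ only in the bootstrap target of the TD update (a sketch; Q is assumed to be a dict keyed by (state, action)):

```python
# Q-learning (off-policy): bootstrap from the greedy next action
target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)

# SARSA (on-policy): bootstrap from the action a_next the policy actually selected
target = r + gamma * Q[(s_next, a_next)]

# both then apply the same TD update
Q[(s, a)] += alpha * (target - Q[(s, a)])
```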
what is the pseudocode for SARSA
initialise Q arbitrarily
repeat (for each episode):
    initialise s
    choose a from s using policy derived from Q (e.g. epsilon-greedy)
    repeat (for each step of episode):
        take action a, observe r, s'
        choose a' from s' using policy derived from Q
        Q(s, a) <- Q(s, a) + α[r + γ Q(s', a') - Q(s, a)]
        s <- s'; a <- a'
    until s is terminal
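The same sketch adapted to SARSA (same assumed Gym-style env as the Q-learning example); note that the action for s' is chosen before the update and then reused on the next step:

```python
import random
from collections import defaultdict

def sarsa(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular SARSA; assumes a Gym-style env with discrete states and actions."""
    Q = defaultdict(float)

    def select_action(s):
        # epsilon-greedy policy derived from Q
        if random.random() < epsilon:
            return random.randrange(env.action_space.n)
        return max(range(env.action_space.n), key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s = env.reset()
        a = select_action(s)                # choose a from s
        done = False
        while not done:
            s_next, r, done, _ = env.step(a)
            a_next = select_action(s_next)  # choose a' from s'
            # on-policy target: Q-value of the action actually taken next
            Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
            s, a = s_next, a_next
    return Q
```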
how does RL solve a task
- model-based: learn a model of the environment and plan with it
- value-based: learn values that say how good it is to reach a certain state or take a specific action (value learning)
- policy-based: directly derive a policy that maximizes rewards (policy gradient)
value iteration
- starts with a random value function
- simpler algorithm
- guaranteed to converge
- more expensive overall
- requires more iterations to converge
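A minimal value iteration sketch, assuming the MDP is given explicitly; the layout P[s][a] = list of (prob, next_state, reward) triples is hypothetical, chosen for illustration:

```python
def value_iteration(states, actions, P, gamma=0.99, theta=1e-6):
    """P[s][a] is assumed to be a list of (prob, next_state, reward) triples."""
    V = {s: 0.0 for s in states}  # arbitrary initial value function
    while True:
        delta = 0.0
        for s in states:
            # Bellman optimality backup
            v_new = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                        for a in actions)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:  # value function has converged
            break
    # extract the greedy policy from the converged values
    policy = {s: max(actions,
                     key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
              for s in states}
    return V, policy
```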
policy iteration
- starts with a random policy
- more complex algorithm (alternates policy evaluation and policy improvement)
- guaranteed to converge
- cheaper to compute overall
- requires fewer iterations to converge
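And a policy iteration sketch over the same assumed MDP layout, alternating evaluation and improvement until the policy stops changing:

```python
import random

def policy_iteration(states, actions, P, gamma=0.99, theta=1e-6):
    """Same assumed layout: P[s][a] is a list of (prob, next_state, reward) triples."""
    policy = {s: random.choice(actions) for s in states}  # random initial policy
    V = {s: 0.0 for s in states}
    while True:
        # policy evaluation: iterate until V approximates V^pi
        while True:
            delta = 0.0
            for s in states:
                v_new = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < theta:
                break
        # policy improvement: act greedily with respect to V
        stable = True
        for s in states:
            best = max(actions,
                       key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:  # no action changed, so the policy is optimal
            return V, policy
```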
epsilon greedy
- exploration: with probability epsilon, choose an action at random
- exploitation: with probability 1 - epsilon, choose the action with the highest estimated reward
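A small sketch of the rule (Q and the argument names are assumptions for illustration):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """With probability epsilon explore; otherwise exploit."""
    if random.random() < epsilon:
        return random.choice(actions)                 # exploration: random action
    return max(actions, key=lambda a: Q[(state, a)])  # exploitation: best-known action
```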
2 other models
- actor-critic
- imitation learning