RL2 Flashcards
1
Q
Backward-view TD(lambda) - pseudocode
A
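A minimal sketch of one episode of backward-view TD(lambda), assuming tabular values and accumulating eligibility traces. The function name, the `(state, reward, next_state, done)` transition format, and the default step sizes are illustrative assumptions, not part of the card.

```python
def td_lambda_episode(V, transitions, alpha=0.1, gamma=1.0, lam=0.8):
    """Backward-view TD(lambda) over one recorded episode (tabular sketch).

    V: dict mapping each non-terminal state to its value (updated in place).
    transitions: list of (state, reward, next_state, done) tuples.
    """
    E = {s: 0.0 for s in V}  # eligibility traces, reset at episode start
    for s, r, s_next, done in transitions:
        # TD error; the value of a terminal state is taken to be 0
        target = r if done else r + gamma * V[s_next]
        delta = target - V[s]
        E[s] += 1.0  # accumulating trace for the visited state
        for x in V:
            V[x] += alpha * delta * E[x]  # every state learns in proportion to its trace
            E[x] *= gamma * lam           # then its trace decays
    return V
```

With lam=0 this reduces to TD(0): only the state just visited has a non-zero trace when the update is applied.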
2
Q
Sarsa(lambda) - pseudocode
A
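A minimal sketch of one episode of Sarsa(lambda), assuming a tabular Q and accumulating traces. The `step(s, a) -> (reward, next_state, done)` environment callback and the `make_epsilon_greedy` helper are hypothetical names introduced for illustration.

```python
import random


def make_epsilon_greedy(actions, eps, rng):
    """Return a policy(Q, s) that explores with probability eps, else acts greedily."""
    def policy(Q, s):
        if rng.random() < eps:
            return rng.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])
    return policy


def sarsa_lambda_episode(Q, step, policy, start, alpha=0.1, gamma=1.0, lam=0.9):
    """Run one episode of tabular Sarsa(lambda); Q maps (state, action) to a value."""
    E = {k: 0.0 for k in Q}  # one eligibility trace per state-action pair
    s = start
    a = policy(Q, s)
    done = False
    while not done:
        r, s_next, done = step(s, a)
        if done:
            a_next = None
            delta = r - Q[(s, a)]  # terminal state-action values are 0
        else:
            a_next = policy(Q, s_next)  # on-policy: the action actually taken next
            delta = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
        E[(s, a)] += 1.0
        for k in Q:
            Q[k] += alpha * delta * E[k]
            E[k] *= gamma * lam
        s, a = s_next, a_next
    return Q
```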
3
Q
Gradient MC for estimating v_hat
A
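A minimal sketch of gradient Monte Carlo prediction, assuming a linear approximator v_hat(s, w) = w . x(s), so the gradient is just the feature vector x(s). The `features` callback and the `(state, reward)` episode format are illustrative assumptions.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def gradient_mc_episode(w, episode, features, alpha=0.1, gamma=1.0):
    """Every-visit gradient MC update of weights w from one finished episode.

    episode: list of (state, reward), where reward is the one received
    on leaving that state.
    """
    # Compute the return G_t for every time step by sweeping backwards
    G = 0.0
    returns = []
    for _, r in reversed(episode):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()

    # w <- w + alpha * (G_t - v_hat(S_t, w)) * grad v_hat(S_t, w)
    for (s, _), G in zip(episode, returns):
        x = features(s)
        error = G - dot(w, x)
        for i, xi in enumerate(x):
            w[i] += alpha * error * xi
    return w
```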
4
Q
Semi-gradient TD(0) for estimating v_hat
A
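A minimal sketch of semi-gradient TD(0), again assuming a linear v_hat(s, w) = w . x(s). It is "semi-gradient" because the bootstrapped target is treated as a constant and only v_hat(S, w) is differentiated. The transition format and `features` callback are illustrative assumptions.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def semi_gradient_td0_episode(w, transitions, features, alpha=0.1, gamma=1.0):
    """Apply semi-gradient TD(0) updates for one episode.

    transitions: list of (state, reward, next_state, done) tuples.
    """
    for s, r, s_next, done in transitions:
        x = features(s)
        v_next = 0.0 if done else dot(w, features(s_next))  # terminal value is 0
        delta = r + gamma * v_next - dot(w, x)
        # Gradient of a linear v_hat with respect to w is x(s)
        for i, xi in enumerate(x):
            w[i] += alpha * delta * xi
    return w
```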
5
Q
Semi-gradient n-step for estimating v_hat
A
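A minimal sketch of semi-gradient n-step TD, assuming a linear v_hat and a recorded episode that is replayed in time order (the updates occur in the same order as in the online algorithm). The argument layout, with `rewards[t]` holding the reward received after `states[t]`, is an assumption made for illustration.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def semi_gradient_nstep_episode(w, states, rewards, features, n, alpha=0.1, gamma=1.0):
    """Semi-gradient n-step TD updates for one episode.

    states: non-terminal states S_0 .. S_{T-1}.
    rewards: rewards[t] is the reward received on leaving states[t].
    """
    T = len(states)
    for tau in range(T):
        # Discounted sum of up to n rewards, truncated at the end of the episode
        end = min(tau + n, T)
        G = sum(gamma ** (i - tau) * rewards[i] for i in range(tau, end))
        # Bootstrap from v_hat only if the n-step horizon stops before termination
        if tau + n < T:
            G += gamma ** n * dot(w, features(states[tau + n]))
        x = features(states[tau])
        error = G - dot(w, x)
        for i, xi in enumerate(x):
            w[i] += alpha * error * xi
    return w
```

With n=1 this matches semi-gradient TD(0); with n at least the episode length it matches gradient MC.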
6
Q
Episodic semi-gradient Sarsa for estimating q_hat
A
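A minimal sketch of one episode of episodic semi-gradient Sarsa, assuming a linear q_hat(s, a, w) = w . x(s, a). The `step`, `policy`, and `features` callbacks are hypothetical interfaces; in practice the policy would usually be epsilon-greedy with respect to q_hat.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def semi_gradient_sarsa_episode(w, step, policy, features, start, alpha=0.1, gamma=1.0):
    """Run one episode, updating weights w in place.

    step(s, a) -> (reward, next_state, done)
    policy(w, s) -> action
    features(s, a) -> feature vector x(s, a)
    """
    s = start
    a = policy(w, s)
    done = False
    while not done:
        r, s_next, done = step(s, a)
        x = features(s, a)
        q = dot(w, x)
        if done:
            target = r  # no bootstrapping from a terminal state
        else:
            a_next = policy(w, s_next)
            target = r + gamma * dot(w, features(s_next, a_next))
        # w <- w + alpha * (target - q_hat(S, A, w)) * grad q_hat(S, A, w)
        for i, xi in enumerate(x):
            w[i] += alpha * (target - q) * xi
        if not done:
            s, a = s_next, a_next
    return w
```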
7
Q
MC policy gradient method for estimating pi_theta
A
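A minimal sketch of Monte Carlo policy gradient (REINFORCE), assuming a tabular softmax policy whose parameters theta are per-state action preferences. This sketch weights each step by the return G_t only; some presentations also multiply by gamma**t. The episode format and names are illustrative assumptions.

```python
import math


def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_episode(theta, episode, alpha=0.1, gamma=1.0):
    """Update theta in place from one finished episode.

    theta: dict mapping state to a list of action preferences.
    episode: list of (state, action_index, reward) tuples.
    """
    # Returns G_t for every step, computed backwards
    G = 0.0
    returns = []
    for _, _, r in reversed(episode):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()

    for (s, a, _), G in zip(episode, returns):
        pi = softmax(theta[s])
        for b in range(len(pi)):
            # d/d theta[s][b] of log pi(a|s) for a softmax policy
            grad_log_pi = (1.0 if b == a else 0.0) - pi[b]
            theta[s][b] += alpha * G * grad_log_pi
    return theta
```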
8
Q
QAC (action-value actor-critic)
A
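A minimal sketch of a single QAC step, assuming a tabular critic Q updated by the Sarsa TD error and a tabular softmax actor. The function signature and the separate step sizes alpha (actor) and beta (critic) are illustrative assumptions.

```python
import math


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def qac_step(theta, Q, s, a, r, s_next, a_next, done, alpha=0.1, beta=0.1, gamma=1.0):
    """One actor-critic update for the transition (s, a, r, s_next, a_next).

    theta: dict state -> list of action preferences (actor).
    Q: dict (state, action_index) -> value (critic).
    """
    q_sa = Q[(s, a)]
    q_next = 0.0 if done else Q[(s_next, a_next)]
    delta = r + gamma * q_next - q_sa  # critic's TD error

    # Actor: step along grad log pi(a|s), scaled by the critic's estimate Q(s, a)
    pi = softmax(theta[s])
    for b in range(len(pi)):
        grad_log_pi = (1.0 if b == a else 0.0) - pi[b]
        theta[s][b] += alpha * q_sa * grad_log_pi

    # Critic: move Q(s, a) toward the Sarsa target
    Q[(s, a)] += beta * delta
    return theta, Q
```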
9
Q
QAC with advantage function
A
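A minimal sketch of one actor-critic step using the advantage A(s, a) = Q(s, a) - V(s) in place of Q(s, a), which subtracts a state-dependent baseline to reduce variance. It assumes two tabular critics (Q and V), each updated by its own TD error, and a tabular softmax actor; all names and the signature are illustrative.

```python
import math


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def advantage_ac_step(theta, Q, V, s, a, r, s_next, a_next, done,
                      alpha=0.1, beta=0.1, gamma=1.0):
    """One advantage actor-critic update for a single transition.

    theta: dict state -> list of action preferences (actor).
    Q: dict (state, action_index) -> value; V: dict state -> value (critics).
    """
    q_next = 0.0 if done else Q[(s_next, a_next)]
    v_next = 0.0 if done else V[s_next]
    delta_q = r + gamma * q_next - Q[(s, a)]  # Sarsa TD error for Q
    delta_v = r + gamma * v_next - V[s]       # TD(0) error for V
    advantage = Q[(s, a)] - V[s]

    # Actor: step along grad log pi(a|s), scaled by the advantage
    pi = softmax(theta[s])
    for b in range(len(pi)):
        grad_log_pi = (1.0 if b == a else 0.0) - pi[b]
        theta[s][b] += alpha * advantage * grad_log_pi

    # Critics: both value estimates move toward their TD targets
    Q[(s, a)] += beta * delta_q
    V[s] += beta * delta_v
    return theta, Q, V
```

A common simplification is to keep only V and use delta_v itself as the advantage estimate.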