Introduction Flashcards
1
Q
States, actions, and rewards - formal definition
A
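A common way to write this, assuming standard MDP notation (the symbols for the state space, action space, and reward function are assumptions, not taken from the card):

```latex
s_t \in \mathcal{S}, \qquad a_t \in \mathcal{A}, \qquad r_t = \mathcal{R}(s_t, a_t, s_{t+1})
```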
2
Q
General transition function
A
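One plausible form, assuming "general" means the next state may depend on the full history of states and actions rather than only the most recent pair:

```latex
s_{t+1} \sim P\big(s_{t+1} \mid (s_0, a_0), (s_1, a_1), \ldots, (s_t, a_t)\big)
```

Under the Markov assumption this reduces to conditioning on (s_t, a_t) only.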
3
Q
Total Return
A
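The total return is usually defined as the discounted sum of rewards over a trajectory, R(τ) = Σ_t γ^t r_t. A minimal sketch, assuming a finite list of rewards and a discount factor gamma (the function name and sample values are illustrative only):

```python
def total_return(rewards, gamma=0.99):
    """Discounted sum of rewards: sum over t of gamma**t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical three-step episode.
rewards = [1.0, 0.0, 2.0]
print(total_return(rewards, gamma=0.5))  # 1.0 + 0.5*0.0 + 0.25*2.0 = 1.5
```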
4
Q
The objective function is the expected return
A
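In the usual notation (assumed here), the objective is the expectation of the total return over trajectories generated by the policy:

```latex
J(\tau) = \mathbb{E}_{\tau \sim \pi}\big[ R(\tau) \big]
        = \mathbb{E}_{\tau}\Big[ \sum_{t=0}^{T} \gamma^{t} r_t \Big]
```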
5
Q
An action sampled from a policy is written as
A
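Assuming a stochastic policy π that maps a state to a distribution over actions, this is typically written as:

```latex
a \sim \pi(s)
```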
6
Q
Trajectory notation
A
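A trajectory is commonly written as the sequence of (state, action, reward) tuples from the start of an episode to its final step T (notation assumed):

```latex
\tau = (s_0, a_0, r_0), (s_1, a_1, r_1), \ldots, (s_T, a_T, r_T)
```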
7
Q
Return of a trajectory
A
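The return from time step t onward is usually written R_t(τ) = Σ_{t'=t}^{T} γ^{t'-t} r_{t'}. A minimal sketch that computes it for every t in one backward pass, assuming a finite reward list (the function name is hypothetical):

```python
def returns_to_go(rewards, gamma=0.99):
    """Return [R_0, R_1, ..., R_T], where R_t = sum_{t'>=t} gamma**(t'-t) * r_t'."""
    returns = [0.0] * len(rewards)
    future = 0.0
    # Walk backwards so each step reuses the discounted return of the next one.
    for t in reversed(range(len(rewards))):
        future = rewards[t] + gamma * future
        returns[t] = future
    return returns


print(returns_to_go([1.0, 0.0, 2.0], gamma=0.5))  # [1.5, 1.0, 2.0]
```

Note that R_0(τ) coincides with the total return R(τ).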
8
Q
Expected return over all complete trajectories
A
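For a policy π_θ with parameters θ, the objective is usually written as follows (notation assumed):

```latex
J(\pi_\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\big[ R(\tau) \big]
              = \mathbb{E}_{\tau \sim \pi_\theta}\Big[ \sum_{t=0}^{T} \gamma^{t} r_t \Big]
```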
9
Q
The policy gradient solves for
A
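Assuming the objective J(π_θ) is the expected return, the problem being solved is typically stated as:

```latex
\max_{\theta} J(\pi_\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\big[ R(\tau) \big]
```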
10
Q
Maximization with gradient
A
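Maximization is usually done by gradient ascent, θ ← θ + α ∇_θ J(π_θ), where α is a learning rate. A toy sketch of that update rule, using a made-up one-dimensional objective J(θ) = -(θ - 3)² in place of the expected return (the objective and all names are assumptions for illustration):

```python
def grad_J(theta):
    """Gradient of the toy objective J(theta) = -(theta - 3)**2."""
    return -2.0 * (theta - 3.0)


def gradient_ascent(theta, alpha=0.1, steps=100):
    """Repeatedly apply theta <- theta + alpha * grad_J(theta)."""
    for _ in range(steps):
        theta = theta + alpha * grad_J(theta)
    return theta


theta = gradient_ascent(theta=0.0)
print(theta)  # approaches the maximizer theta = 3
```

In a real policy gradient method the gradient is not available in closed form and has to be estimated from sampled trajectories.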
11
Q
The term π_θ(a_t | s_t) is the probability of the action taken by the agent at time step t. The action is sampled from the policy, a_t ∼ π_θ(s_t). The right-hand side of the equation states that the gradient of the log probability of the action with respect to θ is multiplied by the return R_t(τ).
A
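The equation described here is usually written ∇_θ J(π_θ) = E_τ[ Σ_t R_t(τ) ∇_θ log π_θ(a_t | s_t) ]. A minimal single-trajectory estimate of that sum, assuming a softmax policy whose logits are the parameters θ and, as a simplification, do not depend on the state (all function names and sample values are hypothetical):

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_prob(logits, action):
    """Gradient of log softmax(logits)[action] w.r.t. the logits: one_hot - probs."""
    probs = softmax(logits)
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def returns_to_go(rewards, gamma):
    """R_t = sum_{t'>=t} gamma**(t'-t) * r_t' for every t."""
    out, future = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        future = rewards[t] + gamma * future
        out[t] = future
    return out


def policy_gradient_estimate(logits, actions, rewards, gamma=0.99):
    """Single-trajectory estimate: sum_t R_t * grad log pi(a_t)."""
    grad = [0.0] * len(logits)
    for action, ret in zip(actions, returns_to_go(rewards, gamma)):
        g = grad_log_prob(logits, action)
        grad = [acc + ret * gk for acc, gk in zip(grad, g)]
    return grad


# Hypothetical trajectory: two actions, uniform initial policy.
print(policy_gradient_estimate([0.0, 0.0], actions=[0, 1, 0],
                               rewards=[1.0, 0.0, 2.0], gamma=0.5))
```

Averaging this quantity over many sampled trajectories approximates the expectation; actions followed by higher returns have their log probability pushed up more strongly.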
12
Q
Probability of the entire trajectory
A
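Assuming Markov dynamics and a policy π_θ, the probability of a trajectory is commonly factored into environment transitions and action probabilities (the initial-state distribution is sometimes included as an extra factor):

```latex
p(\tau \mid \theta) = \prod_{t \ge 0} p(s_{t+1} \mid s_t, a_t)\, \pi_\theta(a_t \mid s_t)
```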