Introduction Flashcards

1
Q

States, actions, and rewards - formal definition

A

st ∈ S is the state at time step t, at ∈ A is the action taken in that state, and rt = R(st, at, st+1) is the reward given by the reward function R.
2
Q

General transition function

A

st+1 ∼ P(st+1 | st, at): the next state is sampled from the transition function P, conditioned on the current state and action.
3
Q

Total Return

A

R(τ) = r0 + γr1 + γ²r2 + … + γ^T rT = Σ_{t=0}^{T} γ^t rt, where γ ∈ [0, 1] is the discount factor.
4
Q

The objective function is the expected return

A

J(πθ) = E_{τ∼πθ}[R(τ)]: the expected return over the trajectories generated by acting with the policy πθ.
5
Q

An action sampled from a policy is written as

A

at ∼ πθ(st)
6
Q

Trajectory notation

A

τ = s0, a0, r0, s1, a1, r1, …, sT, aT, rT: the sequence of states, actions, and rewards experienced over one episode.
7
Q

Return of a trajectory

A

R(τ) = Σ_{t=0}^{T} γ^t rt. More generally, the return from time step t to the end of the trajectory is Rt(τ) = Σ_{t'=t}^{T} γ^(t'−t) rt'.
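The per-time-step return Rt(τ) can be computed for every t in a single backward pass over the rewards. A minimal Python sketch; the function name and the example rewards are illustrative, not from the source:

```python
def discounted_returns(rewards, gamma=0.99):
    """Rt(τ) = Σ_{t'=t}^{T} γ^(t'-t) · rt' for every t, via one backward pass."""
    returns = [0.0] * len(rewards)
    future = 0.0  # running return accumulated from the end of the episode
    for t in reversed(range(len(rewards))):
        future = rewards[t] + gamma * future
        returns[t] = future
    return returns

# With rewards [1, 1, 1] and γ = 0.5: [1.75, 1.5, 1.0]
print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
```

Iterating backward avoids recomputing the tail sum for each t, so the whole trajectory is processed in O(T).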
8
Q

Expected return over all completed trajectories

A

J(πθ) = E_{τ∼πθ}[R(τ)] = E_{τ∼πθ}[Σ_{t=0}^{T} γ^t rt]
9
Q

The policy gradient solves for

A

The parameters θ that maximize the objective: max_θ J(πθ) = E_{τ∼πθ}[R(τ)].
10
Q

Maximization with gradient

A

Gradient ascent on the objective: θ ← θ + α ∇θ J(πθ), where α is the learning rate.
11
Q

The term πθ(at | st) is the probability of the action taken by the agent at time step t. The action is sampled from the policy, at ∼ πθ(st). The right-hand side of the equation states that the gradient of the log probability of the action with respect to θ is multiplied by the return Rt(τ).

A

This is the per-time-step term of the policy gradient: ∇θ J(πθ) = E_{τ∼πθ}[Σ_{t=0}^{T} Rt(τ) ∇θ log πθ(at | st)].
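To make the term concrete, here is a hedged sketch of one such update for a tabular softmax policy (the two-action setup and all names are illustrative assumptions, not from the source). For a softmax policy, ∇θ log πθ(a) is one-hot(a) minus the action probabilities, and the update scales that gradient by the return Rt(τ):

```python
import math

def softmax(logits):
    """Convert logits θ into action probabilities πθ."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def grad_log_pi(theta, action):
    """∇θ log πθ(a) for a softmax policy: one-hot(action) minus πθ."""
    probs = softmax(theta)
    return [(1.0 if a == action else 0.0) - p for a, p in enumerate(probs)]

def policy_gradient_step(theta, action, ret, alpha=0.1):
    """θ ← θ + α · Rt(τ) · ∇θ log πθ(at | st)."""
    grad = grad_log_pi(theta, action)
    return [th + alpha * ret * g for th, g in zip(theta, grad)]

# A positive return makes the sampled action more probable.
theta = policy_gradient_step([0.0, 0.0], action=0, ret=1.0)
```

Note the sign of the return controls the direction of the update: a positive Rt(τ) pushes probability mass toward the sampled action, a negative one pushes it away.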
12
Q

Probability of the entire trajectory

A

p(τ | θ) = p(s0) Π_{t=0}^{T−1} πθ(at | st) p(st+1 | st, at): the product of the initial-state probability, the policy's action probabilities, and the environment's transition probabilities along the trajectory.
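The product form of p(τ | θ) translates directly into code. A minimal sketch; the inputs below (initial-state probability, per-step policy probabilities, per-step transition probabilities) are illustrative:

```python
def trajectory_prob(p_s0, policy_probs, transition_probs):
    """p(τ | θ) = p(s0) · Π_t πθ(at | st) · p(st+1 | st, at)."""
    prob = p_s0
    for pi_a, p_next in zip(policy_probs, transition_probs):
        prob *= pi_a * p_next
    return prob

# Two steps where the policy picks each action with probability 0.5
# in a deterministic environment: 1.0 · (0.5·1.0) · (0.5·1.0) = 0.25
print(trajectory_prob(1.0, [0.5, 0.5], [1.0, 1.0]))
```

Only the πθ factors depend on θ, which is why the transition probabilities drop out when the gradient of log p(τ | θ) is taken.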