Reinforcement Learning Flashcards

1
Q

What is a trajectory?

A

A trajectory τ is a sequence of states and actions in an environment.

τ = (s0, a0, s1, a1, …)

2
Q

What are other names for a trajectory?

A

An episode or a rollout.

3
Q

What is a reward function?

A

The reward function R of an environment measures how good a state-action pair is. The reward at time step t is:

r_t = R(s_t, a_t)

4
Q

What is the return of a trajectory?

A

The return R(τ) of a trajectory τ is the cumulative reward collected along it.

5
Q

What is the finite horizon undiscounted sum method of calculating return?

A

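The finite-horizon undiscounted return is the plain sum of the rewards collected over a fixed window of steps t = 0, …, T:

R(τ) = Σ_{t=0}^{T} r_t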

6
Q

What is the infinite horizon discounted sum method of calculating return?

A

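The infinite-horizon discounted return is the sum of all rewards ever collected, with each reward weighted by a discount factor γ ∈ (0, 1) raised to the power of its time step, so later rewards count for less:

R(τ) = Σ_{t=0}^{∞} γ^t r_t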
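
As a rough illustration of both ways of computing return, here is a minimal Python sketch. The function name `discounted_return` and the sample rewards are made up for this card. An infinite sum cannot be evaluated directly, so the sketch assumes the trajectory has been truncated to a finite list of rewards. Setting `gamma=1.0` gives the undiscounted sum.

```python
def discounted_return(rewards, gamma=1.0):
    """Return sum(gamma**t * r_t) over a finite list of rewards.

    gamma=1.0 gives the undiscounted sum; 0 < gamma < 1 discounts
    later rewards. (Hypothetical helper, not from any library.)
    """
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        total = r + gamma * total
    return total


rewards = [1.0, 0.0, 2.0]  # made-up rewards r_0, r_1, r_2
undiscounted = discounted_return(rewards)            # 1 + 0 + 2
discounted = discounted_return(rewards, gamma=0.5)   # 1 + 0.5*0 + 0.25*2
```

With these sample rewards the undiscounted return is 3.0 and the discounted return with γ = 0.5 is 1.5.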

7
Q

What is a policy?

A

A policy π is a rule for selecting actions. It can be stochastic or deterministic.

8
Q

What is a stochastic policy?

A

A stochastic policy gives a probability distribution over actions, and actions are selected randomly based on that distribution.
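
A minimal Python sketch of the idea, assuming a small discrete action set. The states, actions, and probabilities below are invented for illustration, and `stochastic_policy` and `sample_action` are hypothetical names, not part of any library.

```python
import random


def stochastic_policy(state):
    """Toy policy: map a state to a probability distribution over actions."""
    if state == "low_battery":
        return {"recharge": 0.9, "explore": 0.1}
    return {"recharge": 0.2, "explore": 0.8}


def sample_action(action_probs, rng=random):
    """Draw one action at random according to the given probabilities."""
    actions = list(action_probs)
    weights = [action_probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


# The same state can yield different actions on different calls.
action = sample_action(stochastic_policy("low_battery"))
```

A deterministic policy would instead return a single action for each state, with no sampling step.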

9
Q

What is a deterministic policy?

A

A deterministic policy π maps each state directly to a single action: a_t = π(s_t).

10
Q

What is the goal of reinforcement learning?

A

To learn a policy that maximizes expected return.

11
Q

What is the optimal policy π*?

A

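The optimal policy π* is the policy that maximizes the expected return J(π) over trajectories generated by following π:

π* = arg max_π J(π), where J(π) = E_{τ∼π}[R(τ)]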

12
Q

What are the two main approaches for solving the optimal policy problem?

A
  1. Policy Optimization
  2. Q-Learning