Reinforcement Learning Flashcards

Question 1

Q

What is a trajectory?

Answer

A

A trajectory τ is a sequence of states and actions in an environment.

τ = (s₀, a₀, s₁, a₁, …)
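
As an illustration of the definition, the sketch below collects a short trajectory in a toy setting. The environment (a walk along a number line), the `step` and `rollout` helpers, and the `always_right` policy are all hypothetical names chosen for this example.

```python
def step(state, action):
    """Hypothetical toy dynamics: the action (-1 or +1) is added to the state."""
    return state + action

def rollout(policy, s0=0, horizon=3):
    """Collect a trajectory as a flat list [s0, a0, s1, a1, ...]."""
    trajectory = [s0]
    state = s0
    for _ in range(horizon):
        action = policy(state)          # choose an action in the current state
        state = step(state, action)     # the environment moves to the next state
        trajectory += [action, state]
    return trajectory

def always_right(state):
    """Hypothetical policy that always moves one step to the right."""
    return 1

tau = rollout(always_right)
print(tau)  # [0, 1, 1, 1, 2, 1, 3]
```

The printed list alternates states and actions, matching the ordering s₀, a₀, s₁, a₁, … used above.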

Question 2

Q

What are other names for a trajectory?

Answer

A

An episode or a rollout.

Question 3

Q

What is a reward function?

Answer

A

The reward function R of an environment measures how good a state-action pair is:

r_t = R(s_t, a_t)

Question 4

Q

What is the return of a trajectory?

Answer

A

The return R(τ) of a trajectory τ is a measure of the cumulative reward collected along it.

Question 5

Q

What is the finite horizon undiscounted sum method of calculating return?

Answer

A

The finite-horizon undiscounted return is the sum of the rewards collected over a fixed window of T + 1 steps:

R(τ) = Σ_{t=0}^{T} r_t
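
A minimal sketch of this calculation, assuming the rewards r_0, r_1, … of a trajectory are already stored in a Python list and that T is the index of the last step counted. The function name `finite_horizon_return` is an assumption made for this example.

```python
def finite_horizon_return(rewards, T):
    """Sum the rewards r_0 .. r_T with no discounting."""
    return sum(rewards[: T + 1])

# The reward at t = 3 falls outside the horizon T = 2 and is ignored.
print(finite_horizon_return([1.0, 0.0, 2.0, 5.0], T=2))  # 3.0
```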

Question 6

Q

What is the infinite horizon discounted sum method of calculating return?

Answer

A

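The infinite-horizon discounted return is the sum of all rewards ever collected, each weighted by a discount factor γ ∈ (0, 1) raised to the number of steps elapsed:

R(τ) = Σ_{t=0}^{∞} γ^t r_t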
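
A minimal sketch of discounting, assuming a discount factor γ between 0 and 1. Code cannot sum an infinite sequence, so this example truncates to the rewards that were actually recorded. The function name `discounted_return` and the default value of `gamma` are assumptions made for this example.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum gamma**t * r_t over the recorded rewards (a truncation of the infinite sum)."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# With gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```

Rewards that arrive later are weighted less, which is also what keeps the infinite sum finite when rewards are bounded.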

Question 7

Q

What is a policy?

Answer

A

A policy π is a rule for selecting actions. It can be stochastic or deterministic.

Question 8

Q

What is a stochastic policy?

Answer

A

A stochastic policy gives a probability distribution over actions, and actions are selected randomly based on that distribution.
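
A minimal sketch of sampling from a stochastic policy, assuming a discrete action set. The probabilities in `action_probs` and the helper `sample_action` are hypothetical and stand in for whatever distribution the policy outputs in a given state.

```python
import random

# Hypothetical action probabilities for a single state.
action_probs = {"left": 0.8, "right": 0.2}

def sample_action(probs, rng=random):
    """Draw one action at random, weighted by its probability."""
    actions = list(probs)
    weights = list(probs.values())
    return rng.choices(actions, weights=weights, k=1)[0]

print(sample_action(action_probs))  # "left" about 80% of the time
```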

Question 9

Q

What is a deterministic policy?

Answer

A

A deterministic policy maps each state directly to an action: a_t = π(s_t).
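
A minimal sketch of a deterministic policy, assuming a small discrete state space so the policy can be stored as a lookup table. The state names, the actions, and the names `policy_table` and `pi` are hypothetical.

```python
# Hypothetical lookup table: exactly one fixed action per state.
policy_table = {"s0": "right", "s1": "right", "s2": "left"}

def pi(state):
    """Return the single action this policy always takes in `state`."""
    return policy_table[state]

print(pi("s2"))  # left
```

Calling `pi` twice on the same state always gives the same action, with no randomness involved.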

Question 10

Q

What is the goal of reinforcement learning?

Answer

A

To learn a policy which maximizes expected return.

Question 11

Q

What is the optimal policy π*?

Answer

A

The optimal policy π* is the policy that maximizes expected return:

π* = arg max_π E_{τ∼π}[R(τ)]
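
To make the idea of a return-maximizing policy concrete, the sketch below estimates the expected return of a few candidate policies by averaging sampled returns, then keeps the best one. Everything here is a hypothetical toy: each policy is just a probability of moving right, the task pays +1 for every step to the right, and `estimate_J` is a name assumed for the expected-return estimate. Practical methods search far larger policy spaces than a three-element list.

```python
import random

def sample_return(p_right, rng, horizon=10):
    """Return of one rollout: +1 for each of `horizon` steps that goes right."""
    return sum(1.0 for _ in range(horizon) if rng.random() < p_right)

def estimate_J(p_right, n=2000, seed=0):
    """Monte Carlo estimate of expected return: the mean of n sampled returns."""
    rng = random.Random(seed)
    return sum(sample_return(p_right, rng) for _ in range(n)) / n

candidates = [0.1, 0.5, 0.9]              # hypothetical policies
best = max(candidates, key=estimate_J)    # arg max over the candidate set
print(best)  # 0.9
```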

Question 12

Q

What are the two main approaches for solving the optimal policy problem?

Answer

A

Policy Optimization
Q-Learning
