Rewards Flashcards
Why would we want to change the reward function for an MDP?
To make the MDP easier to solve (faster, less memory, more tractable) while still learning something close to what would have been learned with the original rewards
How can we change the reward function without changing the optimal policy?
- Multiplying by a (positive) scalar
- Shifting by a scalar (adding)
- Potential-based reward shaping (a non-linear transformation)
What is the new Q function equal to if we multiply the reward function by a positive constant c?
Q'(s,a) = c * Q(s,a)
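A one-line derivation sketch, using the standard discounted-return definition of Q (the constant must satisfy c > 0 so the argmax over actions is preserved):

```latex
Q'(s,a) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c\, r_t \,\middle|\, s_0 = s,\ a_0 = a\right]
        = c\,\mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_t \,\middle|\, s_0 = s,\ a_0 = a\right]
        = c\, Q(s,a)
```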
What is the new Q function equal to if we add a constant c to the reward function?
Q'(s,a) = Q(s,a) + c/(1-gamma)
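Derivation sketch: adding c to every reward adds a geometric series of discounted constants, which is the same for every state-action pair and therefore does not change which action is best:

```latex
Q'(s,a) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\,(r_t + c)\right]
        = Q(s,a) + c\sum_{t=0}^{\infty} \gamma^{t}
        = Q(s,a) + \frac{c}{1-\gamma}
```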
What is potential based reward shaping? What is the purpose?
Adding a bonus for entering a state and subtracting it (discounted) when that state is exited, i.e., shaping the reward by F(s, s') = gamma*Phi(s') - Phi(s) for some potential function Phi. It is intended to encourage specific behavior (e.g., moving toward a goal) and speed up learning without changing the optimal policy or creating an infinite reward pump.
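A minimal sketch of potential-based shaping in a tabular setting; the corridor example and the potential function `phi` are illustrative assumptions, not part of the cards:

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99, done=False):
    """Potential-based shaping: F(s, s') = gamma * phi(s') - phi(s).

    The shaped reward r + F preserves the optimal policy because the
    potentials telescope along any trajectory, so no cycle of states can
    be exploited as an infinite reward pump.
    """
    # Common convention: terminal states get zero potential so the shaping
    # bonus cannot change the value of actually reaching the goal.
    phi_next = 0.0 if done else phi(s_next)
    return r + gamma * phi_next - phi(s)


# Example: 1-D corridor with the goal at state 10; "negative distance to
# goal" as the potential gives a positive bonus for moving toward the goal.
phi = lambda s: -abs(10 - s)
print(shaped_reward(r=0.0, s=3, s_next=4, phi=phi, gamma=0.99))  # ~1.06
```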
What is doing Q-learning with potential-based shaping equivalent to?
Q-learning with the Q-table initialized to the potential function, i.e., Q0(s,a) = Phi(s)
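A sketch of that initialization; the function name and interface are hypothetical, and the equivalence assumes a tabular learner seeing the same experience as the shaped one:

```python
import numpy as np

def init_q_with_potential(n_states, n_actions, phi):
    """Initialize a tabular Q function to the potential: Q0(s, a) = phi(s).

    Under the same experience, tabular Q-learning started from this table
    behaves like Q-learning on the potential-shaped reward, so shaping can
    be replaced by an informed initialization.
    """
    q = np.zeros((n_states, n_actions))
    for s in range(n_states):
        q[s, :] = phi(s)
    return q


q_table = init_q_with_potential(n_states=11, n_actions=2, phi=lambda s: -abs(10 - s))
```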