CS7642_Week4 Flashcards by Daniel Barker

What three ways can we modify reward functions?

Scale: multiply by a positive scalar constant c.
Shift: add a scalar constant c (positive or negative).
Potential-based: add change-in-state bonuses defined by a potential function psi; these leave the optimal policy unchanged.
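
A minimal sketch of why these three modifications are safe, assuming a tiny made-up deterministic MDP with no terminal states and discount 0.9. The transition table, rewards, potential values, and the helper names `q_iteration` and `greedy` are all hypothetical, and the shaping bonus is written in the usual form gamma * psi(s') - psi(s). It checks that the greedy policy is the same under each modified reward.

```python
GAMMA = 0.9
N_STATES, N_ACTIONS = 3, 2

# Hypothetical deterministic MDP: next_state[s][a] and reward[s][a].
next_state = [[1, 2], [2, 0], [0, 1]]
reward = [[1.0, 0.0], [0.0, 2.0], [0.5, -1.0]]

# Hypothetical potential function psi(s), chosen arbitrarily.
psi = [4.0, -3.0, 7.0]


def q_iteration(reward_fn, sweeps=500):
    """Run synchronous Q-value iteration for a reward function r(s, a, s')."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(sweeps):
        q = [
            [
                reward_fn(s, a, next_state[s][a]) + GAMMA * max(q[next_state[s][a]])
                for a in range(N_ACTIONS)
            ]
            for s in range(N_STATES)
        ]
    return q


def greedy(q):
    """Return the greedy action for each state."""
    return [max(range(N_ACTIONS), key=lambda a: row[a]) for row in q]


def base(s, a, s_next):
    return reward[s][a]


def scaled(s, a, s_next):
    return 3.0 * reward[s][a]  # scale by a positive constant


def shifted(s, a, s_next):
    return reward[s][a] + 2.0  # shift by a constant


def shaped(s, a, s_next):
    # Potential-based bonus: gamma * psi(s') - psi(s).
    return reward[s][a] + GAMMA * psi[s_next] - psi[s]


q_base = q_iteration(base)
q_shaped = q_iteration(shaped)
policies = [greedy(q_iteration(fn)) for fn in (base, scaled, shifted, shaped)]
print(policies)
```

In this sketch the shaped Q-values differ from the originals only by psi(s), which is the same for every action in a state, so the argmax is untouched. The shift result relies on the no-terminal-state assumption; in episodic tasks a constant shift can change which policy is best.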

What is the danger of reward shaping?

Positive feedback loops. If

Initializing Q-values randomly is okay. (True/False)

Dr. Isbell says False; in his opinion, zero is a better choice. Where we initialize the Q-values does matter: the agent will initially want to be in whichever states happen to start with high values, which may slow down learning.
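
A small sketch of how initial values leak into learning, assuming a tabular Q-learning update with learning rate 0.5 and discount 0.9. The helpers `init_q` and `q_update`, the table size, and the single zero-reward transition are hypothetical choices for illustration.

```python
import random


def init_q(n_states, n_actions, mode, seed=0):
    """Return a Q-table (list of rows); mode is 'zero' or 'random'."""
    rng = random.Random(seed)
    if mode == "zero":
        return [[0.0] * n_actions for _ in range(n_states)]
    return [
        [rng.uniform(-1.0, 1.0) for _ in range(n_actions)]
        for _ in range(n_states)
    ]


def q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


q_zero = init_q(3, 2, "zero")
q_rand = init_q(3, 2, "random")

# The same uninformative experience (reward 0) applied to both tables.
q_update(q_zero, 0, 1, 0.0, 1)
q_update(q_rand, 0, 1, 0.0, 1)

print(q_zero[0], q_rand[0])
```

After this zero-reward step the zero-initialized table is still all zeros, while the random table's updated entry is a blend of its own initial noise and the noise bootstrapped from the next state, so the agent's early preferences reflect the initialization and not experience.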
