CS7642_Week4 Flashcards
1
Q
What three ways can we modify reward functions?
A
- Scale: multiply by a positive scalar constant c
- Shift: add a scalar constant c (positive or negative)
- Potential-based: add change-in-state bonuses defined by a potential function psi, which leave the optimal policy unchanged
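A minimal sketch of the three modifications, assuming scalar rewards and a discount factor gamma. The function names and the example potential `psi` are hypothetical, chosen only for illustration; the potential-based form used here is the standard one, R' = R + gamma * psi(s') - psi(s).

```python
def scale(reward, c):
    """Scale: multiply the reward by a positive constant c."""
    if c <= 0:
        raise ValueError("c must be positive to preserve the ordering of rewards")
    return c * reward


def shift(reward, c):
    """Shift: add a constant c, which may be positive or negative."""
    return reward + c


def potential_shaped(reward, s, s_next, psi, gamma):
    """Potential-based shaping: add the discounted change in potential."""
    return reward + gamma * psi(s_next) - psi(s)


# Hypothetical potential for illustration: the state index itself.
def psi(s):
    return float(s)


# With gamma = 1 the shaping bonuses telescope along a path, so the total
# bonus depends only on the start and end states, not on the route taken.
path = [0, 3, 1, 4]
total_bonus = sum(
    potential_shaped(0.0, s, s_next, psi, gamma=1.0)
    for s, s_next in zip(path, path[1:])
)
```

The telescoping sum is the intuition behind why this kind of bonus cannot reward the agent for going around in circles: any loop that returns to its starting state collects a net bonus of zero when gamma is 1.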
2
Q
What is the danger of reward shaping?
A
Positive feedback loops. If
3
Q
Is it okay to initialize Q-values randomly? (True/False)
A
Dr. Isbell says False (zero is a better choice in his opinion). Where we initialize the Q-values does matter: the agent will initially want to be in whichever states happen to start with high values, which may slow down learning.
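A small sketch of the point above, assuming a tabular Q-function stored as a list of rows (one row per state, one entry per action). The helper names are hypothetical. With zero initialization every action ties under greedy selection, while with random initialization the preferred action is decided by noise before any reward has been seen.

```python
import random


def init_q_zeros(n_states, n_actions):
    """Q-table with every entry set to zero."""
    return [[0.0] * n_actions for _ in range(n_states)]


def init_q_random(n_states, n_actions, rng):
    """Q-table with entries drawn uniformly from [0, 1)."""
    return [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]


def greedy_actions(q_row):
    """All actions that attain the maximum Q-value in one state."""
    best = max(q_row)
    return [a for a, q in enumerate(q_row) if q == best]


rng = random.Random(0)  # fixed seed so the run is repeatable
q_zero = init_q_zeros(3, 4)
q_rand = init_q_random(3, 4, rng)

# Zero init: no action is favored before learning starts.
unbiased = greedy_actions(q_zero[0])
# Random init: the favored action reflects the random draw, not the rewards.
biased = greedy_actions(q_rand[0])
```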