CS7642_Week4 Flashcards

1
Q

What three ways can we modify reward functions?

A
  1. Scale: multiply by a positive scalar constant c
  2. Shift: add a scalar constant c (positive or negative)
  3. Potential-based: add change-in-state bonuses, defined by a potential function psi, that leave the optimal policy unchanged
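
The three modifications above can be sketched as small functions. This is a minimal illustration, not the course's code: the function names, the dictionary psi, and the sample values are hypothetical, and the shaped form r - psi(s) + gamma * psi(s') is the usual potential-based shaping formula, assumed here because the card does not spell it out.

```python
def scale_reward(r, c):
    """Scale: multiply the reward by a positive constant c."""
    if c <= 0:
        raise ValueError("c must be positive")
    return c * r


def shift_reward(r, c):
    """Shift: add a constant c, which may be positive or negative."""
    return r + c


def potential_shaped_reward(r, s, s_next, psi, gamma):
    """Potential-based: subtract psi at the current state and add the
    discounted psi at the next state (assumed standard form)."""
    return r - psi[s] + gamma * psi[s_next]


# Hypothetical potential over two states, used only for illustration.
psi = {0: 2.0, 1: 6.0}
r = 1.0

scaled = scale_reward(r, 3.0)
shifted = shift_reward(r, -0.5)
shaped = potential_shaped_reward(r, s=0, s_next=1, psi=psi, gamma=0.5)
```

With these sample values, the shaped reward works out to 1.0 - 2.0 + 0.5 * 6.0 = 2.0.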
2
Q

What is the danger of reward shaping?

A

Positive feedback loops. If

3
Q

Is it okay to initialize Q-values randomly? (True/False)

A

Dr. Isbell says False (zero is a better choice, in his opinion). Where we initialize the Q-values does matter: the agent will initially want to be in states that happen to start with high values, which may slow down learning.
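
A rough sketch of the difference, assuming a tabular Q-function stored as a list of lists. The helper names (init_q, q_update), the table size, and the learning parameters are hypothetical and chosen only for illustration.

```python
import random


def init_q(n_states, n_actions, mode="zeros", seed=0):
    """Build a Q-table that is either all zeros or uniform random in [0, 1)."""
    if mode == "zeros":
        return [[0.0] * n_actions for _ in range(n_states)]
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]


def q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One tabular Q-learning update, applied in place."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


q_zero = init_q(3, 2, mode="zeros")
q_rand = init_q(3, 2, mode="random", seed=0)

# A transition with zero reward: the zero table stays at zero, while the
# random table moves toward a target built purely from its initial noise.
q_update(q_zero, s=0, a=0, r=0.0, s_next=1)
q_update(q_rand, s=0, a=0, r=0.0, s_next=1)
```

In the zero-initialized table, every value the agent later prefers comes from observed rewards. In the random table, early preferences reflect the initial noise until enough updates wash it out.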
