Reinforcement Learning Flashcards
Define reward in RL
The numerical signal (a scalar value) that implicitly expresses the agent's goal by encouraging goal-directed state transitions and punishing unwanted ones. (2) C06 S 25
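As a minimal sketch of such a signal, the function below assigns a scalar to each transition in a hypothetical gridworld. The cell coordinates, the names `GOAL` and `PIT`, and the reward magnitudes are illustrative assumptions, not part of the original card.

```python
# Hypothetical gridworld cells; coordinates and magnitudes are assumptions.
GOAL = (3, 3)
PIT = (1, 2)


def reward(state, action, next_state):
    """Return the scalar reward for the transition state -> next_state."""
    if next_state == GOAL:
        return 1.0    # encourage the goal-directed transition
    if next_state == PIT:
        return -1.0   # punish the unwanted transition
    return -0.01      # small step cost, favours shorter paths
```

The goal is never stated to the agent directly; it is only implied by which transitions receive positive or negative values.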
Define action-value function in RL
The action-value function q gives the expected cumulative discounted reward obtained by selecting a specific action in a particular state and following a specific policy thereafter. (2) C06 S 37
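The quantity inside that expectation is the discounted return of a single trajectory. A short sketch of how it could be computed from a list of observed rewards follows; the function name `discounted_return` and the discount factor `gamma` are assumptions introduced for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum rewards[k] * gamma**k, accumulated backwards for numerical simplicity."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For rewards `[1.0, 1.0, 1.0]` and `gamma = 0.5` this gives 1 + 0.5 + 0.25 = 1.75. Averaging such returns over many trajectories that start with the same state and action would give a Monte Carlo estimate of q.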
Define approximate RL
The agent predicts values (or actions, in the case of policy gradient methods) with the help of function approximators, typically non-linear ones such as neural networks, that generalize across states. (2) C06 S 61
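A rough sketch of what such an approximator might look like is a tiny one-hidden-layer network that maps a state vector to one value per action. The layer sizes, the `tanh` activation, and the helper names `init_params`, `q_values`, and `greedy_action` are all assumptions for illustration; no training step is shown.

```python
import math
import random


def init_params(n_inputs, n_hidden, n_actions, seed=0):
    """Create small random weights for a one-hidden-layer network."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": matrix(n_hidden, n_inputs), "b1": [0.0] * n_hidden,
        "W2": matrix(n_actions, n_hidden), "b2": [0.0] * n_actions,
    }


def q_values(params, state):
    """Forward pass: state vector -> one estimated action value per action."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, state)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(params["W2"], params["b2"])
    ]


def greedy_action(params, state):
    """Index of the action with the highest estimated value."""
    q = q_values(params, state)
    return max(range(len(q)), key=q.__getitem__)
```

Because the same weights are used for every input, nearby states receive similar estimates, which is the generalization the card refers to.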
Define the state-value function in RL
The state-value function v is the expected return when a specific policy π is followed starting from a particular state s:
$v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s]$, where $G_t$ is the return from time step t.
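For a small MDP whose dynamics are known, this expectation can be computed by iterative policy evaluation. The sketch below is one way to do that; the `transitions` format, the discount factor `gamma`, and the fixed number of sweeps are assumptions chosen for illustration.

```python
def evaluate_policy(transitions, gamma, sweeps=100):
    """Iterative policy evaluation for a fixed policy.

    transitions[s] is a list of (probability, next_state, reward) tuples
    describing what happens in state s under the policy. States that do
    not appear as keys are treated as terminal with value 0.
    """
    values = {s: 0.0 for s in transitions}
    for _ in range(sweeps):
        for s, outcomes in transitions.items():
            values[s] = sum(
                p * (r + gamma * values.get(s2, 0.0)) for p, s2, r in outcomes
            )
    return values
```

On a deterministic chain 0 -> 1 -> 2 (terminal) with reward 1 per step and `gamma = 0.5`, the values are v(1) = 1 and v(0) = 1 + 0.5 * 1 = 1.5.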
Define the action-value function in RL
The action-value function q is the expected return when a specific policy is followed after choosing an action in a particular state.
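The two value functions are linked: the value of a state is the policy-weighted average of its action values, v(s) = Σ_a π(a|s) q(s, a). A minimal sketch, where the dictionary layout and the name `state_value_from_q` are assumptions:

```python
def state_value_from_q(policy_probs, action_values):
    """Average the action values of one state under the policy's action probabilities.

    policy_probs[a] is pi(a|s) and action_values[a] is q(s, a) for the same state s.
    """
    return sum(policy_probs[a] * action_values[a] for a in policy_probs)
```

With π(left|s) = 0.25, π(right|s) = 0.75, q(s, left) = 0 and q(s, right) = 4, the state value is 0.25 * 0 + 0.75 * 4 = 3.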
What is the reward hypothesis in RL?
That all of what we mean by goals and purposes can be well thought of as the maximization
of the expected value of the cumulative sum of a received scalar signal (called reward).