MDP/RL Flashcards
1
Q
maximum expected utility (MEU)
A
the highest expected utility over all available actions; the agent picks the action whose expected utility is largest
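A minimal sketch of the idea, assuming each action has a known list of (probability, utility) outcomes. The function name, action names, and numbers are made up for illustration.

```python
def max_expected_utility(outcomes):
    """outcomes: dict mapping action -> list of (probability, utility) pairs."""
    # Expected utility of each action: sum of probability * utility.
    expected = {a: sum(p * u for p, u in pairs) for a, pairs in outcomes.items()}
    # MEU is the largest of those expected utilities.
    best_action = max(expected, key=expected.get)
    return best_action, expected[best_action]


# Hypothetical example: a certain payoff versus a 50/50 gamble.
outcomes = {
    "safe": [(1.0, 5.0)],
    "risky": [(0.5, 12.0), (0.5, 0.0)],
}
best_action, meu = max_expected_utility(outcomes)
```

Here "safe" has expected utility 5.0 and "risky" has 6.0, so the MEU action is "risky".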
2
Q
what is reinforcement learning
A
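an MDP in which the transition model P and the reward function R are not given; the agent has to learn how to act from experience gained by interacting with the environment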
3
Q
what is exploration
A
- allows an agent to improve its current knowledge about each action, hopefully leading to long-term benefit
- improving the accuracy of the estimated action-values enables an agent to make more informed decisions in the future
4
Q
what is exploitation
A
- chooses the greedy action to get the most reward by exploiting the agent’s current action-value estimates
- because those estimates may be inaccurate, the greedy action may not actually yield the most reward, which can lead to sub-optimal behaviour
5
Q
what is epsilon
A
- probability of choosing to explore (take a random action) instead of exploiting
- low value: mostly exploit the action with the highest estimated reward
- high value: often explore other actions, even ones with low estimated reward
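A minimal sketch of epsilon-greedy action selection, assuming the action-value estimates are kept in a plain list indexed by action. The function name and the sample values are illustrative, not taken from the cards.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of action-value estimates."""
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest current estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])


q = [0.2, 1.5, 0.7]  # hypothetical action-value estimates
greedy_choice = epsilon_greedy(q, epsilon=0.0)  # epsilon = 0 never explores
```

With epsilon = 0 the call always returns the greedy action (index 1 here). With epsilon = 1 every choice is uniformly random.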
6
Q
upper confidence-bound action selection
A
- optimism in the face of uncertainty
- a*t