MDP/RL Flashcards

1
Q

What is maximum expected utility (MEU)?

A

The highest expected utility over all available actions; an MEU agent chooses the action that achieves it.
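
As a minimal sketch of the idea, assuming each action has a known list of (probability, utility) outcomes, the MEU action can be found by computing each action's expected utility and taking the maximum. The function name `meu_action` and the example numbers are hypothetical, chosen only for illustration.

```python
def meu_action(outcomes):
    """Return (best_action, expected_utility) for the action with the
    highest expected utility.

    outcomes maps each action to a list of (probability, utility) pairs.
    """
    # Expected utility of an action: sum of probability * utility.
    expected = {
        action: sum(p * u for p, u in pairs)
        for action, pairs in outcomes.items()
    }
    best = max(expected, key=expected.get)
    return best, expected[best]


# Illustrative (made-up) numbers.
outcomes = {
    "left": [(0.5, 10.0), (0.5, 0.0)],
    "right": [(0.8, 4.0), (0.2, 20.0)],
}
best, value = meu_action(outcomes)
```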

2
Q

What is reinforcement learning?

A

Solving an MDP in which the transition model P and the reward function R are not given, so the agent has to learn from experience.

3
Q

What is exploration?

A
  • Lets an agent improve its current knowledge about each action, hopefully leading to long-term benefit.
  • Improving the accuracy of the estimated action-values enables the agent to make more informed decisions in the future.
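
One common way the estimates become more accurate is a sample-average update: each time an action is tried, its estimate moves toward the observed reward. The sketch below assumes that update rule (the card does not name one), and `update_estimate` is a hypothetical helper.

```python
def update_estimate(q, n, reward):
    """Return the sample-average estimate after the n-th reward (n >= 1).

    q is the estimate built from the previous n - 1 rewards.
    """
    # Move the old estimate toward the new reward by a step of 1/n.
    return q + (reward - q) / n


# Trying the same action four times with made-up rewards.
q = 0.0
for n, reward in enumerate([1.0, 0.0, 1.0, 0.0], start=1):
    q = update_estimate(q, n, reward)
```

After the loop, `q` equals the plain average of the four rewards.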
4
Q

What is exploitation?

A
  • Chooses the greedy action, the one with the highest current action-value estimate, to get the most reward.
  • Because the estimates may be inaccurate, the greedy action may not actually give the most reward, which can lead to sub-optimal behaviour.
5
Q

What is epsilon (in epsilon-greedy action selection)?

A
  • The probability of choosing to explore instead of exploit.
  • Low value: mostly exploit the action with the highest estimated reward.
  • High value: often explore other actions, even ones with low estimated reward.
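
The role of epsilon can be sketched as an epsilon-greedy choice over a list of action-value estimates. This is a minimal illustration, not a definitive implementation; `epsilon_greedy` and `q_values` are hypothetical names, and exploration here is assumed to pick uniformly among all actions.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of action-value estimates."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest current estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])


q_values = [0.1, 0.9, 0.3]
greedy_choice = epsilon_greedy(q_values, epsilon=0.0)   # never explores
random_choice = epsilon_greedy(q_values, epsilon=1.0)   # always explores
```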
6
Q

What is upper-confidence-bound (UCB) action selection?

A
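  • Optimism in the face of uncertainty: favour actions whose value estimates are still uncertain.
  • a*_t = argmax_a [ Q_t(a) + c · sqrt(ln t / N_t(a)) ], where N_t(a) is the number of times action a has been chosen so far and c > 0 controls the amount of exploration.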
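
A minimal sketch of UCB selection, assuming the standard form in which each action's score is its estimate plus an exploration bonus c · sqrt(ln t / N(a)) that shrinks as the action is tried more often. The names `ucb_action`, `q_values`, `counts`, and the default `c=2.0` are illustrative assumptions, as is the choice to try untried actions first.

```python
import math


def ucb_action(q_values, counts, t, c=2.0):
    """Pick an action index by upper-confidence-bound selection.

    q_values: current action-value estimates
    counts:   how many times each action has been chosen
    t:        current time step (t >= 1)
    c:        exploration strength (assumed default)
    """
    # An action that has never been tried is maximally uncertain.
    for action, n in enumerate(counts):
        if n == 0:
            return action
    # Score = estimate + bonus that is large for rarely tried actions.
    scores = [
        q + c * math.sqrt(math.log(t) / n)
        for q, n in zip(q_values, counts)
    ]
    return max(range(len(scores)), key=lambda a: scores[a])


# Action 1 has a lower estimate but was tried only once,
# so its uncertainty bonus makes it the UCB choice.
choice = ucb_action([0.5, 0.4], [100, 1], t=101)
```

With `c=0` the bonus vanishes and the rule reduces to the greedy choice.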