MDP/RL Flashcards

Question 1

Q

What is maximum expected utility (MEU)?

Answer

A

The highest expected utility over all available actions. An agent following the MEU principle picks the action that achieves it.
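
As a rough illustration, the sketch below computes the expected utility of each action and takes the maximum. The action names, probabilities, and utilities are made-up assumptions, and `outcomes`, `expected_utility`, and `meu` are hypothetical names chosen for this example.

```python
# Hypothetical example: each action maps to a list of (probability, utility) outcomes.
outcomes = {
    "left": [(0.5, 10.0), (0.5, 0.0)],
    "right": [(0.9, 4.0), (0.1, 20.0)],
}


def expected_utility(action):
    """Probability-weighted sum of the utilities for one action."""
    return sum(p * u for p, u in outcomes[action])


def meu():
    """Return (best_action, maximum expected utility) over all actions."""
    best = max(outcomes, key=expected_utility)
    return best, expected_utility(best)


best_action, best_value = meu()
```

Here "right" wins even though "left" has the larger single payoff, because the comparison is between expected values rather than best cases.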

Question 2

Q

What is reinforcement learning?

Answer

A

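An MDP in which the transition function P and the reward function R are not given, so the agent has to learn how to act from experience.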
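
To make "not given" concrete, here is a small sketch in which the agent can only call a `step` function and count what it observes. The two-state environment, its hidden probabilities, and the names `make_env`, `counts`, and `estimate_p` are all assumptions invented for this example.

```python
import random


def make_env(seed=0):
    """Hypothetical two-state environment; P and R are hidden inside step()."""
    rng = random.Random(seed)

    def step(state, action):
        # Hidden dynamics: the agent never reads these numbers directly.
        p_stay = 0.8 if action == "stay" else 0.1
        next_state = state if rng.random() < p_stay else 1 - state
        reward = 1.0 if next_state == 1 else 0.0
        return next_state, reward

    return step


step = make_env()
agent_rng = random.Random(1)
counts = {}  # (state, action, next_state) -> number of times observed
state = 0
for _ in range(1000):
    action = agent_rng.choice(["stay", "move"])
    next_state, reward = step(state, action)
    key = (state, action, next_state)
    counts[key] = counts.get(key, 0) + 1
    state = next_state


def estimate_p(state, action, next_state):
    """Estimate P(next_state | state, action) from the observed counts."""
    total = sum(c for (s, a, _), c in counts.items() if s == state and a == action)
    return counts.get((state, action, next_state), 0) / total
```

After enough samples, `estimate_p(0, "stay", 0)` should land near the hidden value of 0.8, even though the agent was never told that number.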

Question 3

Q

What is exploration?

Answer

A

Exploration lets an agent improve its current knowledge about each action, hopefully leading to long-term benefit.
Improving the accuracy of the estimated action-values enables the agent to make more informed decisions in the future.
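
One common way to picture "improving the accuracy of the estimated action-values" is the incremental sample-average update, sketched here. The function name `update_estimate` and the reward sequence are assumptions for illustration, not something the cards specify.

```python
def update_estimate(q, n, reward):
    """Incremental sample-average update for a single action.

    q is the current action-value estimate and n is how many times the
    action has been tried so far. Returns the new estimate and new count.
    """
    n += 1
    q += (reward - q) / n
    return q, n


# Hypothetical rewards observed by trying (exploring) the same action four times.
q, n = 0.0, 0
for r in [1.0, 0.0, 1.0, 1.0]:
    q, n = update_estimate(q, n, r)
# q now equals the mean of the rewards seen so far.
```

Each extra try moves the estimate toward the action's true average reward, which is the benefit exploration is buying.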

Question 4

Q

What is exploitation?

Answer

A

Exploitation chooses the greedy action, trying to get the most reward by relying on the agent’s current action-value estimates.
Because those estimates may be inaccurate, the greedy action may not actually yield the most reward and can lead to sub-optimal behaviour.
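
A small sketch of greedy selection, and of how it can go wrong when the estimates are off. The action names and all numbers are made up, and `true_values` is included only to show the gap; a real agent would not have access to it.

```python
def greedy_action(q_estimates):
    """Pick the action with the highest current estimate."""
    return max(q_estimates, key=q_estimates.get)


# Hypothetical numbers: the estimate for "b" is too low because it was barely tried.
q_estimates = {"a": 0.6, "b": 0.2}
true_values = {"a": 0.6, "b": 0.9}

chosen = greedy_action(q_estimates)           # exploits the current estimates
best = max(true_values, key=true_values.get)  # what an oracle would pick
```

With these numbers the greedy choice is "a" while the truly best action is "b", which is the sub-optimal behaviour the card warns about.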

Question 5

Q

What is epsilon?

Answer

A

In epsilon-greedy action selection, epsilon is the probability of exploring: with probability epsilon the agent picks an action at random, and otherwise it picks the greedy action.

Question 6

Q

What is upper-confidence-bound (UCB) action selection?

Answer

A

Chooses the action with the highest upper confidence bound on its value: the current action-value estimate plus an uncertainty bonus that shrinks the more often the action has been tried.
Rarely tried actions get a larger bonus, so exploration is directed toward the actions whose values are most uncertain.