Markov Decision Problems and Dynamic Programming Flashcards
A transition has ___
The current state, the current action, the reward received, and the next state: the tuple (s, a, r, s')
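A minimal sketch of how such a transition can be represented in Python (the field names here are illustrative, not from any particular library):

```python
from collections import namedtuple

# A transition bundles everything that happened in one interaction step.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])

t = Transition(state="s0", action="move", reward=1.0, next_state="s1")
```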
Should we always keep all possible and impossible actions for every state?
Yes; if an action is impossible in a state, its effect is simply to do nothing (the state stays the same)
If we know the present, then the future is ___ of the past. This means that we don’t need the past to decide what will happen next
independent (this is the Markov property)
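Formally, the Markov property says that conditioning on the full history gives the same transition distribution as conditioning on the current state and action alone:

```latex
P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots, s_0, a_0) = P(s_{t+1} \mid s_t, a_t)
```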
To describe our problem we need ___
State space
Action space
Reward function
Transition probabilities
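As a concrete sketch, the four components of a tiny two-state MDP could be written out in Python like this (all state names, rewards, and probabilities are purely illustrative):

```python
# A tiny two-state MDP, written out explicitly.
states = ["s0", "s1"]       # state space
actions = ["stay", "move"]  # action space

# Reward function R(s, a): reward received for taking action a in state s.
rewards = {
    ("s0", "stay"): 0.0, ("s0", "move"): 1.0,
    ("s1", "stay"): 0.0, ("s1", "move"): -1.0,
}

# Transition probabilities P(s' | s, a); each row sums to 1.
transitions = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "move"): {"s0": 0.1, "s1": 0.9},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 0.9, "s1": 0.1},
}
```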
It’s better to design the reward in an ___
abstract way
In practice, when the agent is learning, we should give ___ in the reward to serve as a ___ to the agent
hints
guide
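One common way to build such hints into the reward is reward shaping, where a small bonus term is added on top of the environment’s own reward. A minimal sketch, assuming a hypothetical distance-to-goal signal is available:

```python
def shaped_reward(env_reward, old_distance, new_distance, bonus_weight=0.1):
    # Hint: a small bonus for making progress toward the goal,
    # added on top of the environment's own reward to guide learning.
    return env_reward + bonus_weight * (old_distance - new_distance)
```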
The optimality criterion defines how to select between ___
policies
The discount factor acts as an ___ rate
It makes the agent prefer rewards ___ rather than ___
inflation
sooner
later
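Concretely, with discount factor gamma, a reward received k steps in the future counts for gamma^k of its face value, so the same reward is worth more the sooner it arrives. A small illustrative check in Python:

```python
def discounted_return(rewards, gamma=0.9):
    # Each reward k steps ahead is scaled by gamma**k.
    return sum(gamma**k * r for k, r in enumerate(rewards))

print(discounted_return([1.0, 0.0, 0.0]))  # 1.0  (reward now)
print(discounted_return([0.0, 0.0, 1.0]))  # 0.81 (same reward, two steps later)
```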
A policy is a mapping from everything the agent has ___ (its ___) to distributions over ___
seen so far
history
actions
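A minimal sketch of a stochastic policy in Python; here the Markov property lets the relevant history collapse to the current state (state names and probabilities are illustrative):

```python
import random

# Policy: each state maps to a distribution over actions.
policy = {
    "s0": {"stay": 0.2, "move": 0.8},
    "s1": {"stay": 0.7, "move": 0.3},
}

def sample_action(state):
    # Draw an action according to the policy's distribution for this state.
    dist = policy[state]
    actions, probs = zip(*dist.items())
    return random.choices(actions, weights=probs)[0]
```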