Markov Decision Problems and Dynamic Programming Flashcards
A transition has ___
The current state, the current action, the reward received, and the next state: the tuple (s, a, r, s')
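A minimal sketch of how such a transition can be represented in Python (the field names here are illustrative, not from any particular library):

```python
from collections import namedtuple

# A transition bundles everything that happened in one interaction step.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])

t = Transition(state="s0", action="move", reward=1.0, next_state="s1")
```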
Should we always keep all possible and impossible actions for every state?
Yes; if an action is impossible in a state, its effect is simply to do nothing (the state stays the same)
If we know the present, then the future is ___ of the past. This means that we don’t need the past to decide what will happen next
independent (this is the Markov property)
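Formally, the Markov property says that conditioning on the full history gives the same transition distribution as conditioning on the current state and action alone:

```latex
P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots, s_0, a_0) = P(s_{t+1} \mid s_t, a_t)
```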
To describe our problem we need ___
State space
Action space
Reward function
Transition probabilities
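As a concrete sketch, the four components of a tiny two-state MDP could be written out in Python like this (all state names, rewards, and probabilities are purely illustrative):

```python
# A tiny two-state MDP, written out explicitly.
states = ["s0", "s1"]       # state space
actions = ["stay", "move"]  # action space

# Reward function R(s, a): reward received for taking action a in state s.
rewards = {
    ("s0", "stay"): 0.0, ("s0", "move"): 1.0,
    ("s1", "stay"): 0.0, ("s1", "move"): -1.0,
}

# Transition probabilities P(s' | s, a); each row sums to 1.
transitions = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "move"): {"s0": 0.1, "s1": 0.9},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 0.9, "s1": 0.1},
}
```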
It’s better to design the reward in an ___
abstract way
In practice, when the agent is learning, we should give ___ in the reward to serve as a ___ to the agent
hints
guide
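One common way to build such hints into the reward is reward shaping, where a small bonus term is added on top of the environment’s own reward. A minimal sketch, assuming a hypothetical distance-to-goal signal is available:

```python
def shaped_reward(env_reward, old_distance, new_distance, bonus_weight=0.1):
    # Hint: a small bonus for making progress toward the goal,
    # added on top of the environment's own reward to guide learning.
    return env_reward + bonus_weight * (old_distance - new_distance)
```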
The optimality criterion defines how to select between ___
policies
The discount factor acts as an ___ rate
It makes the agent prefer rewards ___ rather than ___
inflation
sooner
later
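Concretely, with discount factor gamma, a reward received k steps in the future counts for gamma^k of its face value, so the same reward is worth more the sooner it arrives. A small illustrative check in Python:

```python
def discounted_return(rewards, gamma=0.9):
    # Each reward k steps ahead is scaled by gamma**k.
    return sum(gamma**k * r for k, r in enumerate(rewards))

print(discounted_return([1.0, 0.0, 0.0]))  # 1.0  (reward now)
print(discounted_return([0.0, 0.0, 1.0]))  # 0.81 (same reward, two steps later)
```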
A policy is a mapping from everything the agent has ___ (its ___) to distributions over ___
seen so far
history
actions
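A minimal sketch of a stochastic policy in Python; here the Markov property lets the relevant history collapse to the current state (state names and probabilities are illustrative):

```python
import random

# Policy: each state maps to a distribution over actions.
policy = {
    "s0": {"stay": 0.2, "move": 0.8},
    "s1": {"stay": 0.7, "move": 0.3},
}

def sample_action(state):
    # Draw an action according to the policy's distribution for this state.
    dist = policy[state]
    actions, probs = zip(*dist.items())
    return random.choices(actions, weights=probs)[0]
```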