Lecture 3 Flashcards
How do you calculate the average cost per week for a Markov decision process?
What is a stationary policy?
Every time you are in state i, you choose the same action a.
What are the Markov property and time homogeneity?
What is the definition of discounted costs or discounted rewards?
When is a stationary policy optimal (in case of minimization)?
What follows for any stationary policy if the Markov property is satisfied? V.. = c(i,…)
How does it follow that the minimal discounted costs are finite?
What are the optimality equations?
What does solving the optimality equations look like? (Do not solve)
When is a stationary policy just as good? When is a stationary policy better?
What is the definition of a long-term average cost per time unit problem?
What is the definition of an optimal long-term policy?
When is a policy optimal for discounted and long-term costs?