Lecture 3 Flashcards
How do you calculate the average cost per week for a Markov decision process?

What is a stationary policy?
Every time you are in state i, you choose the same action a, regardless of the time step or of how you reached state i.
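A minimal sketch of this definition, assuming a hypothetical two-state inventory example (the state names, action names, and the helper `choose_action` are illustrative and not from the lecture). It only shows that the chosen action depends on the current state and not on the time step:

```python
# Hypothetical example: a stationary policy is a fixed mapping state -> action.
# The state and action names below are assumptions made for illustration.
stationary_policy = {"low_stock": "order", "high_stock": "wait"}


def choose_action(state, time_step):
    """Return the policy's action for `state`; `time_step` is deliberately ignored."""
    return stationary_policy[state]


# The same state yields the same action at every decision epoch.
visited = [("low_stock", 0), ("high_stock", 1), ("low_stock", 7)]
actions = [choose_action(state, t) for state, t in visited]
print(actions)
```

A non-stationary policy would instead need the time step (or the history) as part of the lookup key.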
What are the Markov property and time homogeneity?

What is the definition of discounted costs or discounted rewards?

When is a stationary policy optimal (in case of minimization)?

What follows for any stationary policy if the Markov property is satisfied? V.. = c(i,…)

How does it follow that the minimal discounted costs are finite?

What are the optimality equations?

What does solving the optimality equations look like? (Do not solve)

When is a stationary policy just as good? When is a stationary policy better?

What is the definition of a long-term average cost per time unit problem?

What is the definition of an optimal long-term policy?

When is a policy optimal for both discounted and long-term costs?
