Chapter 7: Dynamic Choice Flashcards
What are the components of a dynamic programming problem?
Actions, histories, strategies, the law of motion, the objective function, and the constraint.
Define a strategy
A strategy maps each history into an action or a distribution over actions. Each strategy and history pair then generates a probability distribution over future histories.
Bellman Equation
The optimal value function at today's history equals the supremum, over today's actions, of the expected optimal value function at tomorrow's history.
Note the emphasis on the optimal value function, not the value function given a particular strategy.
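As a concrete illustration, the Bellman equation for a small discounted Markov decision problem can be solved by iterating the Bellman operator. This is only a sketch: the states, rewards, transition probabilities, and discount factor below are all made up.

```python
import numpy as np

# Hypothetical 2-state, 2-action discounted problem.
# Bellman equation: v*(s) = max_a [ r(s,a) + delta * sum_s' P(s,a,s') v*(s') ].
delta = 0.9
r = np.array([[1.0, 0.0],                  # r[s, a]
              [0.0, 2.0]])
P = np.array([[[0.8, 0.2], [0.1, 0.9]],    # P[s, a, s']
              [[0.5, 0.5], [0.3, 0.7]]])

v = np.zeros(2)
for _ in range(2000):
    v = (r + delta * P @ v).max(axis=1)    # apply the Bellman operator

# At the fixed point, today's optimal value is the max over actions of
# reward plus expected optimal value tomorrow.
assert np.allclose(v, (r + delta * P @ v).max(axis=1))
```

Because the operator is a contraction when delta < 1, the iteration converges to the unique fixed point regardless of the starting guess.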
Conserving strategy
A strategy is conserving if the optimal value function today equals the expected optimal value function next period when the strategy is followed today, i.e., the strategy attains the supremum in the Bellman equation.
Unimprovable strategy
A strategy is unimprovable if its value function today equals the supremum, over today's actions, of its expected value function tomorrow, i.e., no one-shot deviation followed by a return to the strategy does better.
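In symbols (writing $v^*$ for the optimal value function, $v_\sigma$ for the value function of strategy $\sigma$, and $h_t$ for the date-$t$ history), the two definitions read:

```latex
\text{conserving:}\quad v^*(h_t) = \mathbb{E}^{\sigma}\!\left[v^*(h_{t+1}) \mid h_t\right],
\qquad
\text{unimprovable:}\quad v_\sigma(h_t) = \sup_{a}\, \mathbb{E}\!\left[v_\sigma(h_{t+1}) \mid h_t, a\right].
```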
Conserving vs Unimprovable
A conserving strategy never gives up value: following it for one period still leaves the full optimal value attainable tomorrow.
An unimprovable strategy cannot be beaten by any one-shot deviation: changing today's action and then reverting to the strategy does no better.
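Both properties can be checked numerically in a toy discounted Markov problem (all numbers hypothetical): conserving means the strategy attains the Bellman supremum at the optimal value function, while unimprovable means it attains the supremum at its own value function.

```python
import numpy as np

# Hypothetical 2-state, 2-action discounted problem.
delta = 0.9
r = np.array([[1.0, 0.0], [0.0, 2.0]])
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
states = np.arange(2)

def policy_value(pi):
    # v_pi solves the linear system v = r_pi + delta * P_pi v.
    return np.linalg.solve(np.eye(2) - delta * P[states, pi], r[states, pi])

v_star = np.zeros(2)
for _ in range(2000):
    v_star = (r + delta * P @ v_star).max(axis=1)   # optimal value function

pi = np.array([1, 1])                               # a candidate stationary strategy
v_pi = policy_value(pi)

# Conserving: pi attains the sup in the Bellman equation at v*.
conserving = np.allclose((r + delta * P @ v_star)[states, pi], v_star)
# Unimprovable: no one-shot deviation from pi beats v_pi.
unimprovable = np.allclose((r + delta * P @ v_pi).max(axis=1), v_pi)
print(conserving, unimprovable)
```

In this bounded, discounted example the candidate strategy turns out to be optimal, so both checks pass; swapping in a suboptimal strategy makes both fail, consistent with the third additive-rewards property below.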
If a strategy is optimal, then it is
both conserving and unimprovable.
In any finite horizon problem,
every conserving strategy and every unimprovable strategy is optimal.
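In the finite-horizon case the claim can be seen constructively: backward induction computes the optimal value date by date, and a strategy that attains each date's supremum (a conserving one) is optimal. A sketch with hypothetical payoffs and transitions:

```python
import numpy as np

# Finite-horizon backward induction (payoffs and transitions hypothetical).
T = 5
r = np.array([[1.0, 0.0], [0.0, 2.0]])
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])

v = np.zeros(2)                       # value after the final date
policy = [None] * T
for t in reversed(range(T)):
    q = r + P @ v                     # action values at date t
    policy[t] = q.argmax(axis=1)      # attain the sup: the conserving choice
    v = q.max(axis=1)                 # optimal value at date t

# policy[t][s] is the optimal action at date t in state s;
# v is the date-0 optimal value function.
```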
Upper convergent
The limit of the upper bar utility function equals u(h) as t goes to infinity, for every history h.
Upper bar u at time t is the supremum of u over all histories that agree with h through time t.
Lower convergent
The limit of the lower bar utility function equals u(h) as t goes to infinity, for every history h.
Lower bar u at time t is the infimum of u over all histories that agree with h through time t.
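Compactly, the two conditions above are:

```latex
\bar{u}_t(h) = \sup\{\, u(h') : h'_s = h_s \ \text{for all } s \le t \,\},
\qquad
\underline{u}_t(h) = \inf\{\, u(h') : h'_s = h_s \ \text{for all } s \le t \,\},
```

and u is upper (resp. lower) convergent if $\lim_{t\to\infty}\bar{u}_t(h) = u(h)$ (resp. $\lim_{t\to\infty}\underline{u}_t(h) = u(h)$) for every history h.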
When is a conserving strategy optimal?
When the utility function is upper convergent.
When is an unimprovable strategy optimal?
When the utility function is lower convergent.
Example of conserving strategy
In a stopping problem, always waiting and never taking an offer: after any finite history the optimal payoff is still attainable, so the strategy is conserving, yet its realized payoff is zero, so it is not optimal.
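A minimal sketch of such a stopping problem, with hypothetical payoffs: each period an offer worth 1 can be accepted (ending the game), and never accepting pays 0. "Always wait" is conserving, since the payoff of 1 remains attainable after any finite history, but it is not optimal.

```python
# Hypothetical undiscounted stopping problem: accepting ever pays 1,
# never accepting pays 0 (a finite horizon stands in for "forever").
def payoff(strategy, horizon=100):
    for t in range(horizon):
        if strategy(t) == "accept":
            return 1.0
    return 0.0            # never accepted

always_wait = lambda t: "wait"      # conserving, not optimal
accept_now  = lambda t: "accept"    # optimal
print(payoff(always_wait), payoff(accept_now))   # 0.0 1.0
```

The failure is exactly a failure of upper convergence: the supremum over continuations stays at 1 along the waiting history, but the realized utility is 0.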
Example of unimprovable strategy
"Kicking the can down the road" a large number M of times: no single deviation from the plan of postponing improves matters, so the strategy is unimprovable, yet perpetual postponement is not optimal.
Properties of additive rewards (3)
- If all rewards are positive, utility is lower convergent, so every unimprovable strategy is optimal.
- If all rewards are negative, utility is upper convergent, so every conserving strategy is optimal.
- If rewards are uniformly bounded and delta < 1, utility is both upper and lower convergent, so every conserving strategy and every unimprovable strategy is optimal.
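The third property can be illustrated numerically: with rewards uniformly bounded (here by 1) and delta < 1, the truncated sums pinch total utility from below and above, so utility is both lower and upper convergent. The reward stream below is hypothetical.

```python
# Discounted additive rewards, uniformly bounded by 1 (hypothetical stream).
delta = 0.9
rewards = [1.0] * 200
total = sum(delta**t * x for t, x in enumerate(rewards))

lower = 0.0
for t, x in enumerate(rewards):
    lower += delta**t * x                         # partial sum: lower bar u_t
    upper = lower + delta**(t + 1) / (1 - delta)  # add the largest possible tail

# Both bounds converge to the total as t grows; by date 200 the gap
# from the geometric tail is negligible.
print(abs(total - lower), upper - total)
```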