Chapter 7: Dynamic Choice Flashcards

1
Q

What are the components of a dynamic programming problem?

A

Actions, histories, strategies, law of motion, the objective function, and the constraint.

2
Q

Define a strategy

A

A strategy maps a history into an action or a distribution over actions. Each strategy-history pair then generates a probability measure over future histories.

3
Q

Bellman Equation

A

The optimal value function today is the supremum, over today's choices, of the expected optimal value function tomorrow (evaluated at tomorrow's history).
Note the emphasis on the optimal value function, not the value function given a particular strategy.
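A hedged sketch of the equation in history notation (the symbols f^*, h_t, and a_t are assumptions here, reading f^* as the optimal value function, as in the later cards):

\[
f^*(h_t) \;=\; \sup_{a_t} \, \mathbb{E}\bigl[\, f^*(h_{t+1}) \,\big|\, h_t, a_t \,\bigr]
\]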

4
Q

Conserving strategy

A

The optimal value function today equals the expected optimal value function next period, where the expectation is taken over next period's history generated by the strategy.
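In the same assumed notation, a sketch of the conserving condition for a strategy sigma (the expectation is over next period's history induced by playing sigma today):

\[
f^*(h_t) \;=\; \mathbb{E}_{\sigma}\bigl[\, f^*(h_{t+1}) \,\big|\, h_t \,\bigr]
\]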

5
Q

Unimprovable strategy

A

The value function of the strategy today equals the supremum, over today's choices, of the expected value function of the strategy tomorrow; that is, the strategy cannot be beaten by a one-period deviation followed by playing the strategy from then on.
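A sketch in the same assumed notation, writing f_sigma for the value function of strategy sigma (the one-shot-deviation reading of the card):

\[
f_{\sigma}(h_t) \;=\; \sup_{a_t} \, \mathbb{E}\bigl[\, f_{\sigma}(h_{t+1}) \,\big|\, h_t, a_t \,\bigr]
\]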

6
Q

Conserving vs Unimprovable

A

A conserving strategy preserves optimality for one period: following it today does not destroy the possibility of attaining the optimal value, so the optimal value is still attainable tomorrow.
An unimprovable strategy cannot be beaten by a one-period deviation: no alternative choice today generates a better outcome, supposing the strategy is played from the next period onward.

7
Q

If a strategy is optimal, then it is

A

both conserving and unimprovable.

8
Q

In any finite horizon problem,

A

every conserving strategy and every unimprovable strategy is optimal.

9
Q

Upper convergent

A

Utility is upper convergent if the upper bar utility function converges to u(h) as t goes to infinity.
Upper bar u at time t is the supremum of u over all histories that agree with the given history before time t (the history before time t is held fixed).
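A sketch of the definition in assumed notation, where h'|_t = h|_t means that h' agrees with h before time t:

\[
\bar{u}_t(h) \;=\; \sup_{\{h' \,:\, h'|_t = h|_t\}} u(h'),
\qquad
\lim_{t \to \infty} \bar{u}_t(h) \;=\; u(h)
\]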

10
Q

Lower convergent

A

Utility is lower convergent if the lower bar utility function converges to u(h) as t goes to infinity.
Lower bar u at time t is the infimum of u over all histories that agree with the given history before time t (the history before time t is held fixed).
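The mirror image of the sketch above, with an infimum in place of the supremum:

\[
\underline{u}_t(h) \;=\; \inf_{\{h' \,:\, h'|_t = h|_t\}} u(h'),
\qquad
\lim_{t \to \infty} \underline{u}_t(h) \;=\; u(h)
\]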

11
Q

When is a conserving strategy optimal?

A

When utility is upper convergent.

12
Q

When is an unimprovable strategy optimal?

A

When utility is lower convergent.

13
Q

Example of conserving strategy

A

Always waiting and never taking an offer
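A hedged illustration (the specific numbers are assumptions, not from the deck): suppose the same offer, worth 1, is available in every period and there is no discounting. Never accepting keeps the optimal value of 1 attainable after every history, so the strategy is conserving; but the all-waiting history itself yields 0, so the strategy is not optimal. Utility fails to be upper convergent along that history, which is exactly the gap the upper convergence condition closes.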

14
Q

Example of unimprovable strategy

A

Kicking the can down the road a large number M of times.

15
Q

Properties of additive rewards (3)

A
  1. If all rewards are positive, utility is lower convergent, so every unimprovable strategy is optimal.
  2. If all rewards are negative, utility is upper convergent, so every conserving strategy is optimal.
  3. If rewards are uniformly bounded and delta < 1, every unimprovable strategy and every conserving strategy is optimal (see the sketch after this list).
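For reference, a sketch of the additive form these results refer to (the per-period reward notation r_t is an assumption here; delta is the discount factor, with delta < 1 required in case 3):

\[
u(h) \;=\; \sum_{t=0}^{\infty} \delta^{t}\, r_t(h_t, a_t)
\]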
16
Q

Typical Approach to Dynamic Programming problems…

A

Show that a strategy is unimprovable and lower convergent. Why unimprovability rather than conserving? Because checking the conserving condition requires the optimal value function f^*, and we don't know what f^* actually looks like.

For discounted additive utilities with bounded rewards, we don't need lower convergence!

17
Q

Transience

A

Utility is (upper) transient for a strategy if there is a set of histories that occurs almost surely under the strategy, on which utility is bounded, and on which utility is (upper) convergent.

18
Q

Use of transience

A

If a strategy is unimprovable and lower transient, then it is optimal.
If a strategy is conserving and upper transient, then it is optimal.

19
Q

Why transience?

A

It allows us to restrict attention to the relevant histories (those the strategy actually generates) rather than proving convergence for all histories.

20
Q

Value iteration results (2)

A
  1. Iterating on the value function of a strategy converges to the strategy's infinite time-horizon value.
  2. Iterating on the optimal value function (assuming discounting and bounded rewards) converges to the infinite time-horizon optimal value.

If we can find the optimal strategy via unimprovability, the second result is less important.
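A minimal sketch of result 2 in Python, for a tiny two-state decision problem with discounting and bounded rewards (the states, rewards, and transition rule are illustrative assumptions, not from the deck):

# Toy value iteration: iterating the optimal (Bellman) operator converges to the
# infinite-horizon optimal value when rewards are bounded and delta < 1.
# The specific problem below is an assumed example for illustration only.

delta = 0.9  # discount factor, delta < 1

# Two states (0, 1) and two actions; deterministic transitions for simplicity.
rewards = {0: {"stay": 1.0, "switch": 0.0},
           1: {"stay": 2.0, "switch": 0.0}}
trans = {0: {"stay": 0, "switch": 1},
         1: {"stay": 1, "switch": 0}}

def bellman_update(v):
    """One application of the optimal Bellman operator to a value function v."""
    return {s: max(rewards[s][a] + delta * v[trans[s][a]] for a in rewards[s])
            for s in rewards}

v = {s: 0.0 for s in rewards}   # start the iteration from the zero function
for _ in range(1000):
    v_next = bellman_update(v)
    if max(abs(v_next[s] - v[s]) for s in v) < 1e-10:  # stop once numerically converged
        v = v_next
        break
    v = v_next

print(v)  # roughly {0: 18.0, 1: 20.0}; e.g. state 1: 2 / (1 - 0.9) = 20

Result 1 is the same loop with the max over actions replaced by the single action the fixed strategy prescribes.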