Chapter 7: Dynamic Choice Flashcards

1
Q

What are the components of a dynamic programming problem?

A

Actions, histories, strategies, law of motion, the objective function, and the constraint.

2
Q

Define a strategy

A

A strategy maps a history into an action or a distribution over actions. Each strategy-history pair then generates a probability measure over future histories.

3
Q

Bellman Equation

A

The optimal value function today is the supremum, over today's choices, of the expected optimal value function tomorrow (evaluated at tomorrow's history).
Note the emphasis on the optimal value function, not the value function given a particular strategy.
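A hedged sketch of the equation in history notation (the symbols f^*, h_t, and a_t are assumptions here, reading f^* as the optimal value function, as in the later cards):

\[
f^*(h_t) \;=\; \sup_{a_t} \, \mathbb{E}\bigl[\, f^*(h_{t+1}) \,\big|\, h_t, a_t \,\bigr]
\]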

4
Q

Conserving strategy

A

The optimal value function today equals the expected optimal value function next period, where the expectation is taken over next period's history generated by the strategy.
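In the same assumed notation, a sketch of the conserving condition for a strategy sigma (the expectation is over next period's history induced by playing sigma today):

\[
f^*(h_t) \;=\; \mathbb{E}_{\sigma}\bigl[\, f^*(h_{t+1}) \,\big|\, h_t \,\bigr]
\]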

5
Q

Unimprovable strategy

A

The value function of the strategy today equals the supremum, over today's choices, of the expected value function of the strategy tomorrow; that is, the strategy cannot be beaten by a one-period deviation followed by playing the strategy from then on.
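A sketch in the same assumed notation, writing f_sigma for the value function of strategy sigma (the one-shot-deviation reading of the card):

\[
f_{\sigma}(h_t) \;=\; \sup_{a_t} \, \mathbb{E}\bigl[\, f_{\sigma}(h_{t+1}) \,\big|\, h_t, a_t \,\bigr]
\]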

6
Q

Conserving vs Unimprovable

A

A conserving strategy preserves optimality for one period: following it today does not destroy the possibility of attaining the optimal value, so the optimal value is still attainable tomorrow.
An unimprovable strategy cannot be beaten by a one-period deviation: no alternative choice today generates a better outcome, supposing the strategy is played from the next period onward.

7
Q

If a strategy is optimal, then it is

A

both conserving and unimprovable.

8
Q

In any finite horizon problem,

A

every conserving strategy and every unimprovable strategy is optimal.

9
Q

Upper convergent

A

Utility is upper convergent if the upper bar utility function converges to u(h) as t goes to infinity.
Upper bar u at time t is the supremum of u over all histories that agree with the given history before time t (the history before time t is held fixed).
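A sketch of the definition in assumed notation, where h'|_t = h|_t means that h' agrees with h before time t:

\[
\bar{u}_t(h) \;=\; \sup_{\{h' \,:\, h'|_t = h|_t\}} u(h'),
\qquad
\lim_{t \to \infty} \bar{u}_t(h) \;=\; u(h)
\]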

10
Q

Lower convergent

A

Utility is lower convergent if the lower bar utility function converges to u(h) as t goes to infinity.
Lower bar u at time t is the infimum of u over all histories that agree with the given history before time t (the history before time t is held fixed).
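The mirror image of the sketch above, with an infimum in place of the supremum:

\[
\underline{u}_t(h) \;=\; \inf_{\{h' \,:\, h'|_t = h|_t\}} u(h'),
\qquad
\lim_{t \to \infty} \underline{u}_t(h) \;=\; u(h)
\]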

11
Q

When is a conserving strategy optimal?

A

When utility is upper convergent.

12
Q

When is an unimprovable strategy optimal?

A

When utility is lower convergent.

13
Q

Example of conserving strategy

A

Always waiting and never taking an offer
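A hedged illustration (the specific numbers are assumptions, not from the deck): suppose the same offer, worth 1, is available in every period and there is no discounting. Never accepting keeps the optimal value of 1 attainable after every history, so the strategy is conserving; but the all-waiting history itself yields 0, so the strategy is not optimal. Utility fails to be upper convergent along that history, which is exactly the gap the upper convergence condition closes.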

14
Q

Example of unimprovable strategy

A

Kicking the can down the road a large number M of times.

15
Q

Properties of additive rewards (3)

A
  1. If all rewards are positive, utility is lower convergent, so every unimprovable strategy is optimal.
  2. If all rewards are negative, utility is upper convergent, so every conserving strategy is optimal.
  3. If rewards are uniformly bounded and delta < 1, every unimprovable strategy and every conserving strategy is optimal (see the sketch after this list).
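For reference, a sketch of the additive form these results refer to (the per-period reward notation r_t is an assumption here; delta is the discount factor, with delta < 1 required in case 3):

\[
u(h) \;=\; \sum_{t=0}^{\infty} \delta^{t}\, r_t(h_t, a_t)
\]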
16
Q

Typical Approach to Dynamic Programming problems…

A

Show that a strategy is unimprovable and lower convergent. Why unimprovability rather than conserving? Because checking the conserving condition requires the optimal value function f^*, and we don't know what f^* actually looks like.

For discounted additive utilities with bounded rewards, we don't need lower convergence!

17
Q

Transience

A

Utility is (upper) transient for a strategy if there is a set of histories that occurs almost surely under the strategy, on which utility is bounded, and on which utility is (upper) convergent.

18
Q

Use of transience

A

If a strategy is unimprovable and lower transient, then it is optimal.
If a strategy is conserving and upper transient, then it is optimal.

19
Q

Why transience?

A

It allows us to restrict attention to the relevant histories (those the strategy actually generates) rather than proving convergence for all histories.

20
Q

Value iteration results (2)

A
  1. Iterating on the value function of a strategy converges to the strategy's infinite time-horizon value.
  2. Iterating on the optimal value function (assuming discounting and bounded rewards) converges to the infinite time-horizon optimal value.

If we can find the optimal strategy via unimprovability, the second result is less important.
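A minimal sketch of result 2 in Python, for a tiny two-state decision problem with discounting and bounded rewards (the states, rewards, and transition rule are illustrative assumptions, not from the deck):

# Toy value iteration: iterating the optimal (Bellman) operator converges to the
# infinite-horizon optimal value when rewards are bounded and delta < 1.
# The specific problem below is an assumed example for illustration only.

delta = 0.9  # discount factor, delta < 1

# Two states (0, 1) and two actions; deterministic transitions for simplicity.
rewards = {0: {"stay": 1.0, "switch": 0.0},
           1: {"stay": 2.0, "switch": 0.0}}
trans = {0: {"stay": 0, "switch": 1},
         1: {"stay": 1, "switch": 0}}

def bellman_update(v):
    """One application of the optimal Bellman operator to a value function v."""
    return {s: max(rewards[s][a] + delta * v[trans[s][a]] for a in rewards[s])
            for s in rewards}

v = {s: 0.0 for s in rewards}   # start the iteration from the zero function
for _ in range(1000):
    v_next = bellman_update(v)
    if max(abs(v_next[s] - v[s]) for s in v) < 1e-10:  # stop once numerically converged
        v = v_next
        break
    v = v_next

print(v)  # roughly {0: 18.0, 1: 20.0}; e.g. state 1: 2 / (1 - 0.9) = 20

Result 1 is the same loop with the max over actions replaced by the single action the fixed strategy prescribes.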