DP bank Flashcards
elaborate on dynamic programming
Assume we have T stages. In each stage, we need to make a decision on exactly one variable. The decision we make at this stage will depend on the state we are in at this stage. For instance, in the knapsack problem: Since the state is defined by the remaining capacity, if the remaining capacity is 0, our deicison will obviously be to not pick any more.
The decision we make at a stage should be formualted as a transition function. We need to relate some state+decision combination into a new future state.
We use T_t(s_t, x_t) to represent the transition function. It should use the current state and the decision to make a new state that we consider for the next stage.
The objective function value is considered as the cost of being in a stage. It will depend on the state we are at, the decision we have made.
over the course of all the stages in our problem, we compute the TOTAL COST like this:
∑c_t(s_t, x_t) [t=1, T]
The cost terms are the cost belonging to “being in state t”. This is a CRUCIAL part to understand. The cost of being in a state will decrease as we go closer to the final state because there are less actions/decisions to make. Since decisions are associated with a cost, if we have to do less of them, we generally say that the overall cost is lower.
Therefore, the sum ∑c_t(s_t, x_t) [t=1, T] is better represetned as a recursive function.
What we do know is that the total cost of being in a stage is given by:
f_t(s_t) = min ∑c_i(s_i, x_i) [i=t, T]
In other other words, the cost of being in stage t, and state s_t, is given by the set of actions for each stage that produce the lowest total cost. Emphasis on the fact that we are considering all remaining stages.
This function is better described like recursion:
f_t(s_t) = min[x_t] [c_t(s_t, x_t) + f_(t+1)(s_(t+1))]
In order to start the recursion in stage T, we need initial conditions. For instance, in the knapsack problem the initial condition is 0, indicating that the final remaining value of the case should be 0. I suppose an equivalent way of looking at it is to select 10, representing that the backpack size should be 10 at the final stage.
We also need to set the initial state in stage 1. In knapsack, this would be the opposite of the stage T condition.
elaborate on the DP tableaus
We have the following columns:
[xt/st, state 0, 1, …, T]
we have the following rows:
[xt/st, 0, 1, …., T, f_t(s_t), xt^star(st)]
the columns are self explainatory
The rows are kind of as well, but the f_t(s_t) holds the best objective function value from the differnet actions given a certain previous state.
xt^star(st) just specify the best action/decision, which is the one that correspond to the best objective function value.
Each cell in the main part of the tableau holds the objective function value (recursion included naturally) of the state+cost decision. Could be state+profit, depends on the problem.
elaborate on the final stage
I believe the final stage is typically used as a “we are finsihed” sort of thing, and does not actually have a decision. Rather, we have used our final decision in the stage before, which results in going to this final state-stage.
what stage is the last stage that gets a tableau
Stage T-1