Dynamic Programming Flashcards
What is the idea behind dynamic programming.
Breaking a problem down into smaller subproblems and solving them recursivley
What properties should a prolbem have to be solvable by dynamic programming?
1) Optimal substructure.
The solution of the problem can be obtained by the solution combination of optimal solutions to subproblems
2) Overlapping subproblems
A algorithm should solve the “same” subproblem over and over rather than generating new subproblems
(e.g. Quicksort is not a dynamic algorithm, it is a divide and conquer algorithm).
What does the policy improvement theorem state?
The policy improvement theorem states that if
Q_pi(s,pi’_0(s)) >= V_pi(s). for deterministic policies pi and pi’ than pi’ is better than pi.
Basically greedely choosing the actions is always as good or better than the current policy.
What is policy iteration?
We can improve a policy pi to pi’ using V_pi. We can then improve pi’ to pi’’ using V_pi’…. This is called policy iteration.
State the basic outline of the policy iteration algorithm
1) Initialize
2) Do policy evaluation
3) Do policy improvement by:
pi’
Do we have to run the full policy evaulation in each step of the policy iteration algorithm?
No, we can use stoping conditions like: delta V(s) < epsilon or stop after k iterations
What is value iteration?
When we run the policy iteration and at each step we only run the policy iterations once. In this case we don’t need an explicit policy, always choose the maximal action in the policy evaluation step.
When can Dynamic Programing be used?
For medium sized problems, up to millions of states. Not usable for very large problems especially with continues state variables. (Control of a robot arm…)