Dynamic Programming Flashcards

1
Q

What is the idea behind dynamic programming.

A

Breaking a problem down into smaller subproblems and solving them recursivley

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What properties should a prolbem have to be solvable by dynamic programming?

A

1) Optimal substructure.
The solution of the problem can be obtained by the solution combination of optimal solutions to subproblems

2) Overlapping subproblems
A algorithm should solve the “same” subproblem over and over rather than generating new subproblems
(e.g. Quicksort is not a dynamic algorithm, it is a divide and conquer algorithm).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What does the policy improvement theorem state?

A

The policy improvement theorem states that if
Q_pi(s,pi’_0(s)) >= V_pi(s). for deterministic policies pi and pi’ than pi’ is better than pi.

Basically greedely choosing the actions is always as good or better than the current policy.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is policy iteration?

A

We can improve a policy pi to pi’ using V_pi. We can then improve pi’ to pi’’ using V_pi’…. This is called policy iteration.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

State the basic outline of the policy iteration algorithm

A

1) Initialize
2) Do policy evaluation
3) Do policy improvement by:
pi’

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Do we have to run the full policy evaulation in each step of the policy iteration algorithm?

A
No, we can use stoping conditions like:
delta V(s) < epsilon or stop after k iterations
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is value iteration?

A

When we run the policy iteration and at each step we only run the policy iterations once. In this case we don’t need an explicit policy, always choose the maximal action in the policy evaluation step.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

When can Dynamic Programing be used?

A

For medium sized problems, up to millions of states. Not usable for very large problems especially with continues state variables. (Control of a robot arm…)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly