Convergence Flashcards
What is the difference between RL with control and without?
RL with control involves the agent selecting actions in order to learn a policy; without control, the agent only predicts values (it evaluates a fixed policy).
What is the Bellman equation with actions? How does this relate to a Bellman operator?
Q(s,a) = R(s,a) + gamma * Sum_s' [ T(s,a,s') * max_a' Q(s',a') ]
[BQ](s,a) = the right-hand side above; B is the operator that maps any Q-function to a new Q-function of this form.
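A minimal sketch of applying this operator to a tabular Q-function, assuming numpy arrays R of shape [S, A] and T of shape [S, A, S] for a small hypothetical MDP (names and shapes are illustrative, not from the cards):

```python
import numpy as np

def bellman_operator(Q, R, T, gamma):
    """Return BQ, where [BQ](s,a) = R(s,a) + gamma * sum_s' T(s,a,s') * max_a' Q(s',a')."""
    V = Q.max(axis=1)                                 # max_a' Q(s', a') for every next state s'
    return R + gamma * np.einsum('sap,p->sa', T, V)   # expectation over s', then discount
```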
What do Q* = BQ* and Q_t = BQ_{t-1} describe in terms of the Bellman operator B?
Q* = BQ* is the Bellman equation (Q* is a fixed point of B); Q_t = BQ_{t-1} is value iteration.
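A sketch of the value-iteration reading Q_t = BQ_{t-1}, iterated until the sup-norm change falls below a tolerance (the tolerance and the zero initialization are arbitrary choices, not from the cards):

```python
import numpy as np

def value_iteration(R, T, gamma, tol=1e-8):
    """Iterate Q_t = B Q_{t-1} until ||Q_t - Q_{t-1}||_inf < tol."""
    Q = np.zeros_like(R)                              # any bounded initial Q works
    while True:
        # One application of the Bellman operator B.
        Q_next = R + gamma * np.einsum('sap,p->sa', T, Q.max(axis=1))
        if np.max(np.abs(Q_next - Q)) < tol:          # sup-norm stopping test
            return Q_next
        Q = Q_next
```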
What is a contraction mapping?
B is an operator (it maps value functions to value functions).
If, for all Q-functions F, G and some 0 <= gamma < 1,
||BF - BG||_inf <= gamma * ||F - G||_inf
then B is a contraction mapping (spot-checked numerically below).
Note: ||Q||_inf = max_{s,a} |Q(s,a)| (the sup norm).
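A quick numeric spot-check of this definition for the Bellman operator on a randomly generated MDP (the sizes, seed, and gamma are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9

R = rng.normal(size=(S, A))
T = rng.random(size=(S, A, S))
T /= T.sum(axis=2, keepdims=True)            # normalize rows into transition probabilities

def B(Q):
    # The Bellman operator from the earlier card.
    return R + gamma * np.einsum('sap,p->sa', T, Q.max(axis=1))

def sup_norm(X):
    # ||X||_inf = max_{s,a} |X(s,a)|
    return np.max(np.abs(X))

F, G = rng.normal(size=(S, A)), rng.normal(size=(S, A))
print(sup_norm(B(F) - B(G)) <= gamma * sup_norm(F - G))   # expected: True
```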
What are the properties of a contraction mapping?
If B is a contraction mapping:
- F* = BF* has a unique solution (a short argument is sketched below)
- F_t = BF_{t-1} => F_t -> F* from any starting F_0 (i.e. value iteration converges)
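The uniqueness argument, written in LaTeX (this derivation is not on the original card; it is the standard fixed-point reasoning): if F* and G* are both fixed points, the contraction inequality forces them to coincide.

```latex
\|F^* - G^*\|_\infty = \|BF^* - BG^*\|_\infty \le \gamma\,\|F^* - G^*\|_\infty
\;\Rightarrow\; (1-\gamma)\,\|F^* - G^*\|_\infty \le 0
\;\Rightarrow\; F^* = G^* \quad (\text{since } \gamma < 1).
```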
T/F: Is the max operator a non-expansion?
True. Proof sketch: assume WLOG that max_a f(a) >= max_a g(a). Then
|max_a f(a) - max_a g(a)| = max_a f(a) - max_a g(a)
= f(a1) - g(a2)      (where a1 maximizes f and a2 maximizes g)
<= f(a1) - g(a1)     (since g(a2) = max_a g(a) >= g(a1))
<= |f(a1) - g(a1)|
<= max_a |f(a) - g(a)|
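A quick randomized spot-check of the same claim (purely illustrative; the vector size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(1000):
    f, g = rng.normal(size=5), rng.normal(size=5)
    # |max_a f(a) - max_a g(a)| <= max_a |f(a) - g(a)|
    assert abs(f.max() - g.max()) <= np.abs(f - g).max()
print("max acted as a non-expansion on every sampled pair")
```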
How can we generalize MDPs? Do they converge? If so, to what?
Using the generalized Bellman equation, we can replace the operators that summarize over next states (the probability-weighted sum) and over actions (the max) with other operators.
Value iteration on the generalized equation still converges as long as the replacement operators are non-expansions.
It converges to some fixed point determined by the chosen operators, not necessarily Q*.
e.g. we can take the minimum (instead of the expectation) over the outcomes of the maximizing action to get a risk-averse MDP model (sketched below).
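A sketch of what a generalized Bellman operator might look like with a pluggable next-state summary; passing the worst-case (min) summary instead of the expectation gives the risk-averse variant mentioned above. Function names and shapes are assumptions for illustration:

```python
import numpy as np

def expected_next_value(T, V):
    # Standard summary: sum_s' T(s,a,s') * V(s')
    return np.einsum('sap,p->sa', T, V)

def worst_case_next_value(T, V):
    # Risk-averse summary: min of V(s') over next states reachable under (s, a)
    reachable = np.where(T > 0, V[None, None, :], np.inf)
    return reachable.min(axis=2)

def generalized_bellman(Q, R, T, gamma, summarize):
    """[BQ](s,a) = R(s,a) + gamma * summarize_s'(V), with V(s') = max_a' Q(s',a')."""
    V = Q.max(axis=1)                      # still maximize over actions
    return R + gamma * summarize(T, V)     # pluggable non-expansion summary over s'
```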
What are some examples of non-expansion operators?
Order statistics: max, min, etc.
A fixed convex combination (weights fixed in advance and summing to 1; see the spot-check below)
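A matching spot-check that a fixed convex combination is a non-expansion (sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.random(5)
w /= w.sum()                               # fixed convex weights: w_i >= 0, sum(w) = 1
for _ in range(1000):
    f, g = rng.normal(size=5), rng.normal(size=5)
    assert abs(w @ f - w @ g) <= np.abs(f - g).max()
print("fixed convex combination acted as a non-expansion on every sampled pair")
```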