CS7642_Week6 Flashcards

1
Q

Why is generalization/function approximation important in the context of RL?

A

Because in the real world, state spaces become huge and tabular methods break down.

2
Q

What are the three types of functions we could approximate in an RL context?

A
  1. Value Functions (tends to be most well studied and successful)
  2. Policies (Some empirical/anecdotal evidence that policy based function approximation works better than other methods in robotics domain)
  3. Models
3
Q

What is a “convex combination”? What is a “convex hull”?

A

(This is from outside the class, but useful to know.) A convex combination is a linear combination of points where all coefficients are non-negative and sum to 1.

Convex hull: think about a set of pegs on a pegboard. Tie a rope to the peg at the bottom and, in a counter-clockwise motion, sweep it around all the other pegs. The pegs that form the boundary (together with any pegs inside the roped region) define the convex hull of this 2D set.
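A minimal sketch of a convex combination in Python (function and variable names are my own, not from the lecture): the weights must be non-negative and sum to 1, and the result is a weighted average of the points.

```python
def convex_combination(points, weights):
    """Weighted average of points; valid only when the weights are
    non-negative and sum to 1 (checked below)."""
    assert all(w >= 0 for w in weights), "weights must be non-negative"
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    dim = len(points[0])
    return tuple(sum(w * p[i] for w, p in zip(weights, points))
                 for i in range(dim))

# The midpoint of (0, 0) and (2, 4) is the convex combination with equal weights:
print(convex_combination([(0, 0), (2, 4)], [0.5, 0.5]))  # (1.0, 2.0)
```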

4
Q

We can use “averagers” (some convex combination of basis functions) to define a new MDP that we can use as a function approximator. (True/False)

A

True. Because the weights of a convex combination are non-negative and sum to 1 (x1, x2, …, xn >= 0; x1 + x2 + … + xn = 1), they behave like transition probabilities, so the averager defines something that is more or less an MDP itself, and we have good techniques for solving MDPs.

5
Q

What is TD error?

A

It’s the difference between the target of the update and our current estimate of the state value (or state-action value), where the target is the “dose of reality” we receive (the immediate reward) plus the discounted estimate of our future rewards: δ = r + γV(s′) − V(s). It comes straight out of the Bellman equation.
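As a sketch, a TD(0)-style error and update in Python (variable names are my own):

```python
def td_error(reward, gamma, v_next, v_current):
    """delta = r + gamma * V(s') - V(s):
    the 'dose of reality' target minus the current estimate."""
    return reward + gamma * v_next - v_current

def td_update(v_current, delta, alpha):
    """Nudge the current estimate toward the target by step size alpha."""
    return v_current + alpha * delta

# Example: r = 1.0, gamma = 0.9, V(s') = 2.0, V(s) = 2.5 gives delta = 0.3
delta = td_error(reward=1.0, gamma=0.9, v_next=2.0, v_current=2.5)
new_v = td_update(2.5, delta, alpha=0.1)
```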

6
Q

Linear function approximators always work and are guaranteed to converge. (True/False)

A

False. Baird’s counterexample is a famous case in which linear function approximation combined with TD learning diverges.

7
Q

When using averagers (kernel methods) as function approximators, as the number of anchor points increases, the error in the value function estimate goes down. (True/False)

A

True. As a result, averagers are guaranteed to converge, and not just to some answer but to the correct value function in the limit.
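A sketch of a 1-D averager in Python (the inverse-distance kernel and all names here are my own illustrative choices, not from the lecture): the estimate at a query point is a convex combination of the values stored at the anchor points.

```python
def averager_estimate(query, anchors, values, kernel):
    """Estimate the value at `query` as a convex combination of the
    values at the anchor points; the normalized kernel weights are
    non-negative and sum to 1 by construction."""
    weights = [kernel(query, a) for a in anchors]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

# Hypothetical inverse-distance kernel:
kernel = lambda x, a: 1.0 / (abs(x - a) + 1e-6)

# A query equidistant from two anchors gets the average of their values:
print(averager_estimate(0.5, [0.0, 1.0], [0.0, 10.0], kernel))  # approx. 5.0
```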

8
Q

When used as function approximators, averagers always converge to the optimal value function. (True/False)

A

True. Averagers do converge and converge to the right answer (in the limit).
