8 - Value Function Approximation Flashcards
What are some advantages of Value Function Approximation methods?
One advantage is that we don’t have to explicitly store or learn a reward model, value, state-action value, or policy for every single state. Instead, we get a more compact representation that can generalize across states and actions, which reduces the memory, computation, and experience required. (This is a reasonable assumption when the underlying problem has a “smooth” structure.)
Neural Networks / Deep Learning is the most popular differentiable function approximator. What is the second most popular differentiable function approximator?
Linear Feature Representations (i.e., linear combinations of features)
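As a minimal sketch of what a linear feature representation looks like in practice (the feature map and names like `features`, `v_hat`, and `w` are illustrative, not from the original cards):

```python
import numpy as np

def features(state):
    # Hypothetical hand-designed feature map for a 1-D state: [1, s, s^2]
    return np.array([1.0, state, state ** 2])

def v_hat(state, w):
    # Linear function approximator: V_hat(s; w) = x(s) . w
    return features(state) @ w

w = np.zeros(3)        # one learnable weight per feature
print(v_hat(0.5, w))   # 0.0 before any learning
```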
Gradient descent is guaranteed to find a _________ {local, global} optimum.
Gradient descent is guaranteed to find a local optimum.
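A minimal sketch of the gradient descent update itself, assuming a toy objective J(w) = (w − 3)² chosen only for illustration (on this convex objective the local optimum happens to also be global):

```python
# Plain gradient descent: repeatedly step against the gradient of J(w).
def grad(w):
    return 2.0 * (w - 3.0)    # dJ/dw for J(w) = (w - 3)^2

w, alpha = 0.0, 0.1           # initial point and step size
for _ in range(100):
    w -= alpha * grad(w)      # w <- w - alpha * dJ/dw

print(round(w, 4))            # converges to ~3.0, the minimum
```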
What does the linear part of Linear Value Function Approximation mean?
This means that we represent a value function (or state-action value function) for a particular policy using a weighted linear combination of features.
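In symbols, writing x(s) for a feature vector of state s and w for the learned weight vector (notation assumed here, not given on the card):

$$\hat{V}^{\pi}(s; \mathbf{w}) = \mathbf{x}(s)^{\top}\mathbf{w} = \sum_{j=1}^{n} x_j(s)\, w_j$$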