Function approximation Flashcards
What properties do we want in a function approximation method for V and Q?
1) It should generalize to unseen states (and actions)
2) We should be able to update the approximation using MC or TD targets
Why would we want to use function approximation?
There might be too many states/actions to store in a look-up table, or our agent would take too long exploring all combinations.
Name some possible function approximators
Linear combinations of features, neural networks, decision trees, nearest neighbour, Fourier/wavelet bases.
What is the goal of function approximation?
Minimizing the error (MSE) between the true value function and the estimate.
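Written out (a standard formulation, with v_π the true value function and v̂(s, w) the parametric estimate):

$$\mathrm{MSE}(\mathbf{w}) = \mathbb{E}_{\pi}\!\left[\big(v_\pi(S) - \hat{v}(S, \mathbf{w})\big)^2\right]$$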
What method do we usually use to improve the function approximation?
Gradient descent
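A minimal sketch of one such step for a linear estimate v̂(s, w) = wᵀx(s); the names (x_s, target, alpha) are illustrative, and the target stands in for an MC return or TD target:

```python
import numpy as np

def sgd_update(w, x_s, target, alpha=0.1):
    """One stochastic gradient descent step on the squared error."""
    v_hat = w @ x_s                            # current linear estimate
    return w + alpha * (target - v_hat) * x_s  # gradient of v_hat w.r.t. w is x_s
```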
What is coarse coding in feature vectors?
Different features overlap. E.g. a location can be represented by binary features from overlapping circles: if the agent is inside a circle, the corresponding feature is 1, else 0.
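A minimal sketch with hypothetical circle centres and a shared radius for a 2-D state:

```python
import numpy as np

centers = np.array([[0.2, 0.2], [0.5, 0.5], [0.8, 0.8]])  # illustrative circles
radius = 0.3

def coarse_code(state):
    """One binary feature per circle: 1 if the state lies inside it, else 0."""
    dists = np.linalg.norm(centers - state, axis=1)
    return (dists <= radius).astype(float)

print(coarse_code(np.array([0.4, 0.4])))  # -> [1. 1. 0.]
```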
What is tiling?
The features are grouped into exhaustive partitions of the input space. Each such partition is a tiling, and each element of a partition is a tile; several slightly offset tilings are usually overlaid.
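A sketch of tile coding for a 1-D input in [0, 1), with a few offset tilings (counts are illustrative):

```python
import numpy as np

n_tilings, n_tiles = 4, 10  # illustrative sizes

def tile_code(x):
    """One active (one-hot) tile per tiling; features are concatenated."""
    features = np.zeros(n_tilings * n_tiles)
    for t in range(n_tilings):
        offset = t / (n_tilings * n_tiles)           # shift each tiling slightly
        idx = int((x + offset) * n_tiles) % n_tiles  # tile index (wraps at the edge)
        features[t * n_tiles + idx] = 1.0
    return features
```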
What is an RBF?
Radial basis functions are Gaussian-shaped functions. We can use them to map the input space onto continuous features in (0, 1).
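A sketch of Gaussian RBF features over a 1-D input; the centres and width (sigma) are illustrative choices:

```python
import numpy as np

centers = np.linspace(0.0, 1.0, 5)  # illustrative RBF centres
sigma = 0.2

def rbf_features(x):
    """Each feature decays smoothly with squared distance to its centre."""
    return np.exp(-((x - centers) ** 2) / (2 * sigma ** 2))
```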
What is experience replay and why is it used?
Experience replay stores transitions (traces) and updates the network with mini-batches sampled from that store. This helps against overfitting to a single point (TD-learning bias) by decorrelating consecutive samples.
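A minimal replay-buffer sketch; capacity and batch size are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # old transitions fall out

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling decorrelates consecutive time steps.
        return random.sample(self.buffer, batch_size)
```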
What is a target network?
A target network is added in addition to the behaviour network; it is only updated sparsely (every 1000 iterations or so…) and provides stable bootstrap targets.
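A sketch of the sparse hard update, with numpy arrays standing in for the network parameters (names and interval are illustrative):

```python
import numpy as np

UPDATE_EVERY = 1000  # illustrative sync interval

def maybe_sync(step, behaviour_params, target_params):
    """Hard-copy the behaviour parameters into the target network every N steps."""
    if step % UPDATE_EVERY == 0:
        target_params = behaviour_params.copy()
    return target_params
```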
Why do we clip rewards in Atari learning?
It simplifies the reward space so the reward magnitude doesn't affect the network too much; the same hyperparameters can then work across games with very different score scales.
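One common scheme (used in the original DQN Atari setup) keeps only the sign of the reward:

```python
import numpy as np

def clip_reward(r):
    """Map any reward to -1, 0, or +1."""
    return float(np.sign(r))
```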
Why do we skip frames in Atari learning?
The games are not made to be played as fast as a network can react. Skipping also reduces computational cost and training time.
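A minimal frame-skip sketch in the style of a classic Gym wrapper; `env` is assumed to follow the step(action) -> (obs, reward, done, info) API, and k is illustrative:

```python
def skip_step(env, action, k=4):
    """Repeat the chosen action for k frames and accumulate the reward."""
    total_reward, done, info = 0.0, False, {}
    for _ in range(k):
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done, info
```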