Function approximation Flashcards
What properties do we want in a function approximation method for V and Q?
1) It should generalize to unseen states (and actions)
2) We should be able to update the approximation using MC or TD targets
Why would we want to use function approximation?
There might be too many states/actions to store in a look-up table, or our agent would take too long exploring all combinations.
Name some possible function approximators
Linear combinations of features, neural networks, decision trees, nearest neighbour, Fourier/wavelet bases.
What is the goal of function approximation?
Minimizing the error (MSE) between the true value function and the estimate.
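Written out (a standard formulation, with v_π the true value function and v̂(s, w) the parametric estimate):

$$\mathrm{MSE}(\mathbf{w}) = \mathbb{E}_{\pi}\!\left[\big(v_\pi(S) - \hat{v}(S, \mathbf{w})\big)^2\right]$$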
What method do we usually use to improve the function approximation?
Gradient descent
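A minimal sketch of one such step for a linear estimate v̂(s, w) = wᵀx(s); the names (x_s, target, alpha) are illustrative, and the target stands in for an MC return or TD target:

```python
import numpy as np

def sgd_update(w, x_s, target, alpha=0.1):
    """One stochastic gradient descent step on the squared error."""
    v_hat = w @ x_s                            # current linear estimate
    return w + alpha * (target - v_hat) * x_s  # gradient of v_hat w.r.t. w is x_s
```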
What is coarse coding in feature vectors?
Different features overlap. E.g. a location can be represented by binary features from overlapping circles: if the agent is inside a circle, the corresponding feature is 1, else 0.
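A minimal sketch with hypothetical circle centres and a shared radius for a 2-D state:

```python
import numpy as np

centers = np.array([[0.2, 0.2], [0.5, 0.5], [0.8, 0.8]])  # illustrative circles
radius = 0.3

def coarse_code(state):
    """One binary feature per circle: 1 if the state lies inside it, else 0."""
    dists = np.linalg.norm(centers - state, axis=1)
    return (dists <= radius).astype(float)

print(coarse_code(np.array([0.4, 0.4])))  # -> [1. 1. 0.]
```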
What is tiling?
The features are grouped into exhaustive partitions of the input space. Each such partition is a tiling, and each element of a partition is a tile; several slightly offset tilings are usually overlaid.
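A sketch of tile coding for a 1-D input in [0, 1), with a few offset tilings (counts are illustrative):

```python
import numpy as np

n_tilings, n_tiles = 4, 10  # illustrative sizes

def tile_code(x):
    """One active (one-hot) tile per tiling; features are concatenated."""
    features = np.zeros(n_tilings * n_tiles)
    for t in range(n_tilings):
        offset = t / (n_tilings * n_tiles)           # shift each tiling slightly
        idx = int((x + offset) * n_tiles) % n_tiles  # tile index (wraps at the edge)
        features[t * n_tiles + idx] = 1.0
    return features
```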
What is an RBF?
Radial basis functions are Gaussian-shaped functions. We can use them to map the input space onto continuous features in (0, 1).
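A sketch of Gaussian RBF features over a 1-D input; the centres and width (sigma) are illustrative choices:

```python
import numpy as np

centers = np.linspace(0.0, 1.0, 5)  # illustrative RBF centres
sigma = 0.2

def rbf_features(x):
    """Each feature decays smoothly with squared distance to its centre."""
    return np.exp(-((x - centers) ** 2) / (2 * sigma ** 2))
```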
What is experience replay and why is it used?
Experience replay stores transitions (traces) and updates the network with mini-batches sampled from that store. This helps against overfitting to a single point (TD-learning bias) by decorrelating consecutive samples.
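A minimal replay-buffer sketch; capacity and batch size are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # old transitions fall out

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling decorrelates consecutive time steps.
        return random.sample(self.buffer, batch_size)
```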
What is a target network?
A target network is added in addition to the behaviour network; it is only updated sparsely (every 1000 iterations or so…) and provides stable bootstrap targets.
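A sketch of the sparse hard update, with numpy arrays standing in for the network parameters (names and interval are illustrative):

```python
import numpy as np

UPDATE_EVERY = 1000  # illustrative sync interval

def maybe_sync(step, behaviour_params, target_params):
    """Hard-copy the behaviour parameters into the target network every N steps."""
    if step % UPDATE_EVERY == 0:
        target_params = behaviour_params.copy()
    return target_params
```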
Why do we clip rewards in Atari learning?
It simplifies the reward space so the reward magnitude doesn't affect the network too much; the same hyperparameters can then work across games with very different score scales.
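One common scheme (used in the original DQN Atari setup) keeps only the sign of the reward:

```python
import numpy as np

def clip_reward(r):
    """Map any reward to -1, 0, or +1."""
    return float(np.sign(r))
```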
Why do we skip frames in Atari learning?
The games are not made to be played as fast as a network can react. Skipping also reduces computational cost and training time.
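A minimal frame-skip sketch in the style of a classic Gym wrapper; `env` is assumed to follow the step(action) -> (obs, reward, done, info) API, and k is illustrative:

```python
def skip_step(env, action, k=4):
    """Repeat the chosen action for k frames and accumulate the reward."""
    total_reward, done, info = 0.0, False, {}
    for _ in range(k):
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done, info
```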