Function approximation Flashcards

1
Q

What properties do we want in a function approximation method for V and Q?

A

1) It should generalize to unseen states (and actions)

2) It should be updatable using MC or TD methods

2
Q

Why would we want to use function approximation?

A

There might be too many states/actions to store in a look-up table, or our agent would take too long to explore all combinations.

3
Q

Name some possible function approximators

A

Linear combinations of features, neural networks, decision trees, nearest neighbour, Fourier/wavelet bases.

4
Q

What is the goal of function approximation?

A

Minimizing the error (MSE) between the true value function and the estimated one.
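A minimal sketch of that objective in standard notation (the symbols are assumptions here, not from the card): w is the weight vector, v_π the true value function and v̂ its approximation:

    J(\mathbf{w}) = \mathbb{E}_\pi\!\left[ \bigl( v_\pi(S) - \hat{v}(S, \mathbf{w}) \bigr)^2 \right]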

5
Q

What method do we usually use to improve the function approximation?

A

Gradient descent
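A minimal Python sketch of one semi-gradient TD(0) step for a linear value function (all names and the feature vector x(s) are assumptions for illustration):

    import numpy as np

    def td0_update(w, x_s, reward, x_next, alpha=0.01, gamma=0.99, done=False):
        # Linear value function: v(s, w) = w . x(s), so the gradient w.r.t. w is x(s).
        v_s = w @ x_s
        v_next = 0.0 if done else w @ x_next
        td_error = reward + gamma * v_next - v_s      # TD target minus current estimate
        return w + alpha * td_error * x_s             # gradient-descent style update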

6
Q

What is coarse coding in feature vectors?

A

Different features overlap, e.g. a location can be represented by binary features from overlapping circles: if the agent is inside a circle, the corresponding feature is 1, else 0.
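A minimal Python sketch of such binary coarse-coding features (circle centres and radius are made-up example values):

    import numpy as np

    def circle_features(pos, centres, radius):
        # Feature i is 1 if the position lies inside circle i, else 0; circles overlap.
        dists = np.linalg.norm(np.asarray(centres, float) - np.asarray(pos, float), axis=1)
        return (dists <= radius).astype(float)

    # Example: three overlapping circles on a 2-D position
    phi = circle_features([0.5, 0.5], centres=[[0.4, 0.5], [0.6, 0.5], [0.7, 0.6]], radius=0.2)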

7
Q

What is tiling?

A

The features are grouped into exhaustive partitions of the input space. Each partition is a tiling, and each element of a partition is a tile.
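A minimal Python sketch of grid tile coding for a 2-D input (the grid sizes and offsets are illustrative assumptions):

    import numpy as np

    def tile_features(pos, n_tilings=4, bins=8, low=0.0, high=1.0):
        # Each tiling is a full grid partition of the input space, shifted by a small offset;
        # exactly one tile (feature) is active per tiling.
        phi = np.zeros(n_tilings * bins * bins)
        for t in range(n_tilings):
            offset = t / (n_tilings * bins)
            scaled = (np.asarray(pos, float) + offset - low) / (high - low)
            cell = np.clip((scaled * bins).astype(int), 0, bins - 1)
            phi[t * bins * bins + cell[0] * bins + cell[1]] = 1.0
        return phi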

8
Q

What is an RBF?

A

Radial basis functions are Gaussian functions. We can use them to split up the input space into continuous (0, 1) features.
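A minimal Python sketch of Gaussian RBF features (the centres and width sigma are illustrative assumptions):

    import numpy as np

    def rbf_features(x, centres, sigma=0.1):
        # Each feature is exp(-||x - c_i||^2 / (2 sigma^2)), a continuous value in (0, 1].
        sq_dists = np.sum((np.asarray(centres, float) - np.asarray(x, float)) ** 2, axis=1)
        return np.exp(-sq_dists / (2.0 * sigma ** 2))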

9
Q

What is experience replay and why is it used?

A

Experience replay stores transitions (traces) in a buffer and updates the network with mini-batches sampled from it. This helps against overfitting towards a single recent point (TD learning bias).
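A minimal Python sketch of a replay buffer along these lines (capacity and batch size are illustrative):

    import random
    from collections import deque

    class ReplayBuffer:
        def __init__(self, capacity=100_000):
            self.buffer = deque(maxlen=capacity)   # oldest transitions drop out automatically

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size=32):
            # A uniform random mini-batch breaks the correlation between consecutive samples
            batch = random.sample(self.buffer, batch_size)
            return list(zip(*batch))               # tuples of states, actions, rewards, ...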

10
Q

What is a target network?

A

A target network is added in addition to the behaviour network; the target network is only updated sparsely (every 1000 iterations or so…).
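A toy Python sketch of the sparse target-network update (the "training" step is a stand-in; only the copy pattern is the point):

    import numpy as np

    SYNC_EVERY = 1000                              # update the target weights only rarely

    behaviour_w = np.zeros(8)                      # weights trained at every step
    target_w = behaviour_w.copy()                  # frozen copy used to compute TD targets

    for step in range(5000):
        behaviour_w += 0.01 * np.random.randn(8)   # stand-in for a real gradient update
        if step % SYNC_EVERY == 0:
            target_w = behaviour_w.copy()          # sparse target-network update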

11
Q

Why do we clip rewards in Atari learning?

A

It simplifies the reward space so that the reward magnitude doesn't affect the network too much.
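A minimal sketch of the clipping itself (the [-1, 1] range follows the DQN Atari setup):

    def clip_reward(reward):
        # Limit the reward magnitude so large game scores don't dominate the updates
        return max(-1.0, min(1.0, reward))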

12
Q

Why do we skip frames in Atari learning?

A

The games are not made to be played as fast as a network can react. Frame skipping also reduces computational cost and training time.
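A minimal Python sketch of frame skipping by repeating the chosen action (a gym-style env.step returning (obs, reward, done, info) is assumed):

    def step_with_frame_skip(env, action, skip=4):
        # Repeat the same action for `skip` frames and accumulate the reward
        total_reward, done = 0.0, False
        for _ in range(skip):
            obs, reward, done, info = env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info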
