Reinforcement Learning Flashcards
Trading as an RL problem.
For the following, list whether they are state (s), action (a), or reward (r):
- BUY
- SELL
- Holding Long
- Bollinger Value
- return from trade
- daily return
- BUY (action)
- SELL (action)
- Holding Long (state)
- Bollinger Value (state)
- return from trade (reward)
- daily return (reward, state)
What is the update rule? What are alpha and gamma?
new Q[s,a] = (1 - alpha) * Q[s,a] + alpha * (r + gamma * max over a' of Q[s', a'])
The improved estimate is the immediate reward r plus the discounted value of the best action available from the next state s'. (A short Python sketch of this update follows the alpha/gamma notes below.)
alpha is the learning rate
- range is 0-1.0, usually use 0.2
- larger alpha = faster learning (new information is weighted more heavily, so estimates are noisier)
- smaller alpha = slower learning rate
gamma is the discount rate
- range is 0-1.0
- low gamma = we value immediate rewards more
- high gamma = we value later rewards more
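A minimal sketch of this update in Python, assuming Q is a (num_states x num_actions) NumPy array indexed by discrete state and action integers; the function name and default alpha/gamma values are illustrative, not from the original notes.

import numpy as np

def q_update(Q, s, a, r, s_prime, alpha=0.2, gamma=0.9):
    # Improved estimate = immediate reward plus the discounted value of the
    # best action available from the next state s'.
    best_future = np.max(Q[s_prime])
    # Blend the old value with the improved estimate using the learning rate.
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * best_future)
    return Q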
Which results in faster convergence?
- r = daily_return
- r = 0 until exit, then cum_ret
r = daily return
A reward at each step gives the learning agent feedback on every individual action it takes (including doing nothing), whereas a reward only at exit is sparse and delayed, so credit has to propagate back through many steps before earlier actions are reinforced.
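A small illustration of the two reward schemes for a hypothetical round-trip trade held for four days (the numbers are made up for the example).

# Hypothetical daily returns for one trade held for 4 days.
daily_returns = [0.01, -0.005, 0.02, 0.003]

# Dense reward: the agent is rewarded at every step.
dense_rewards = daily_returns

# Sparse reward: zero until exit, then the cumulative return of the trade.
cum_ret = 1.0
for r in daily_returns:
    cum_ret *= (1 + r)
cum_ret -= 1
sparse_rewards = [0.0, 0.0, 0.0, cum_ret]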
Which of the following could/should be a state?
- Adjusted close
- SMA
- Adjusted close/SMA
- Bollinger Band Value
- P/E Ratio
- Holding Stock
- Return since entry
Everything except the first two (Adjusted Close, SMA).
A good state is either a technical indicator (something that gives a signal, ideally normalized so it is comparable across stocks and time) or a holding status (e.g. holding or not, return since entry). Adjusted Close and SMA on their own are raw price levels, so they don't generalize; their ratio (Adjusted Close / SMA) does.
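As an illustration of turning indicators plus holding status into a single discrete state for a Q table, a sketch under assumed bin counts and value ranges; the helper names and thresholds are hypothetical.

import numpy as np

def discretize(value, bin_edges):
    # Map a continuous indicator value to an integer bucket 0..len(bin_edges).
    return int(np.digitize(value, bin_edges))

def build_state(price_sma_ratio, bb_value, holding):
    # Assumed discretization: 10 buckets per indicator, 3 holding positions
    # (short = 0, flat = 1, long = 2).
    ratio_bins = np.linspace(0.8, 1.2, 9)   # 9 edges -> 10 buckets
    bb_bins = np.linspace(-2.0, 2.0, 9)     # 9 edges -> 10 buckets
    r = discretize(price_sma_ratio, ratio_bins)
    b = discretize(bb_value, bb_bins)
    # Combine into one integer: 10 * 10 * 3 = 300 possible states.
    return (r * 10 + b) * 3 + holding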
What are the advantages of a model-free approach like Q-Learning?
You don’t need to know (and have data structures for) the Transition function and the Reward function.
Also, the Q-Value accounts for future rewards.
Using Dyna, how do you calculate the probability for T?
T[s, a, s_prime] = ?
T[s, a, s_prime] = T_count[s, a, s_prime] / sum over all s_prime of T_count[s, a, s_prime]
i.e. the number of times the transition (s, a, s_prime) was observed, divided by the number of times (s, a) occurred at all.
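A minimal sketch of keeping these counts and deriving the probability in Python; the array sizes and the tiny initial count (used to avoid division by zero for unvisited (s, a) pairs) are assumptions, not part of the original notes.

import numpy as np

num_states, num_actions = 100, 3

# Count of observed transitions, initialized to a small value so that
# unvisited (s, a) pairs do not cause division by zero.
T_count = np.full((num_states, num_actions, num_states), 1e-5)

def record_transition(s, a, s_prime):
    # Called once per real experience tuple gathered by the agent.
    T_count[s, a, s_prime] += 1

def transition_prob(s, a, s_prime):
    # Times (s, a, s_prime) occurred / times (s, a) occurred at all.
    return T_count[s, a, s_prime] / T_count[s, a].sum()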