Reinforcement Learning Flashcards

1
Q

Trading as an RL problem.

For the following, list whether they are state (s), action (a), or reward (r):

  • BUY
  • SELL
  • Holding Long
  • Bollinger Value
  • return from trade
  • daily return
A
  • BUY (action)
  • SELL (action)
  • Holding Long (state)
  • Bollinger Value (state)
  • return from trade (reward)
  • daily return (reward; can also serve as part of the state)
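
A minimal sketch of how these pieces might map onto code (all names and values below are illustrative, not taken from any particular library):

    from enum import IntEnum

    class Action(IntEnum):       # actions: the choices the agent makes
        BUY = 0
        SELL = 1
        DO_NOTHING = 2           # doing nothing is still an action

    # state: what the agent observes before choosing an action
    holding_long = True          # holding status (illustrative value)
    bollinger_value = -1.2       # technical indicator (illustrative value)
    state = (holding_long, bollinger_value)

    # reward: feedback received after acting, e.g. the daily return
    reward = 0.003
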
2
Q

What is the update rule? What are alpha and gamma?

A

new Q[s,a] = (1 - alpha) * Q[s,a] + alpha * (r + gamma * max over a' of Q[s', a'])

The improved estimate, r + gamma * max_a' Q[s', a'], combines the immediate reward with the discounted value of the best action available from the next state s'.

alpha is the learning rate

  • range is 0-1.0; around 0.2 is a common choice
  • larger alpha = faster learning, but noisier updates (each new observation moves the estimate more)
  • smaller alpha = slower but more stable learning

gamma is the discount rate

  • range is 0-1.0
  • low gamma = we value immediate rewards more
  • high gamma = we value later rewards more
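
A minimal sketch of this update in Python (the table size, the state/action indices, and the hyperparameter values are illustrative assumptions):

    import numpy as np

    num_states, num_actions = 100, 3
    Q = np.zeros((num_states, num_actions))   # Q-table, initialized to zeros

    alpha = 0.2   # learning rate: how far each update moves the old value
    gamma = 0.9   # discount rate: how much future rewards count

    def q_update(s, a, r, s_prime):
        """One Q-learning update after observing the tuple (s, a, r, s')."""
        best_future = Q[s_prime].max()             # max over a' of Q[s', a']
        new_estimate = r + gamma * best_future     # immediate + discounted future
        Q[s, a] = (1 - alpha) * Q[s, a] + alpha * new_estimate
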
3
Q

Which results in faster convergence?

  • r = daily_return
  • r = 0 until exit, then cum_ret
A

r = daily return

A reward at each step lets the learning agent get feedback on each individual action it takes (including doing nothing), so credit assignment is far easier than with a single cumulative reward at exit.
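
A sketch contrasting the two reward schemes over one made-up trade (the price series and holding period are invented for illustration):

    import numpy as np

    prices = np.array([100.0, 101.0, 99.5, 102.0, 103.0])
    daily_returns = prices[1:] / prices[:-1] - 1.0

    # Dense reward: feedback at every step while the trade is on.
    dense_r = daily_returns                      # r_t = daily return

    # Sparse reward: zero until exit, then the cumulative return.
    sparse_r = np.zeros_like(daily_returns)
    sparse_r[-1] = prices[-1] / prices[0] - 1.0  # cum_ret paid only at exit

    print(dense_r)   # a learning signal on every action
    print(sparse_r)  # [0. 0. 0. 0.03] -- no signal until the very end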

4
Q

Which of the following could/should be a state?

  • Adjusted close
  • SMA
  • Adjusted close/SMA
  • Bollinger Band Value
  • P/E Ratio
  • Holding Stock
  • Return since entry
A

Everything except the first two (Adjusted Close, SMA).

A good state is either a technical indicator (something that gives a trading signal) or holding status (e.g., holding or not, return since entry). Raw price levels like adjusted close or SMA alone don't generalize across stocks and time; a ratio such as adjusted close/SMA normalizes the level away, which is why it works where the raw values don't.
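
A sketch of turning raw prices into normalized state features (the 20-day window and the function/column names are assumptions):

    import pandas as pd

    def state_features(adj_close: pd.Series, window: int = 20) -> pd.DataFrame:
        """Convert raw prices into normalized indicators usable as state."""
        sma = adj_close.rolling(window).mean()
        std = adj_close.rolling(window).std()
        return pd.DataFrame({
            # dividing by SMA removes the absolute price level
            "price_over_sma": adj_close / sma,
            # Bollinger value: distance from the SMA in band widths
            "bollinger_value": (adj_close - sma) / (2.0 * std),
        })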

5
Q

What are the advantages of a model-free approach like Q-Learning?

A

You don’t need to know (or build and maintain data structures for) the transition function T[s, a, s_prime] or the reward function R[s, a]; the agent learns directly from experience.

Also, the Q-value already accounts for (discounted) future rewards.

6
Q

Using Dyna, how do you calculate the probability for T?

T[s, a, s_prime] = ?

A

T[s, a, s_prime] = T_count[s, a, s_prime] / sum over all s_prime of T_count[s, a, s_prime]

That is: the number of times the transition (s, a, s_prime) occurred, divided by the total number of times the state-action pair (s, a) occurred.
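
A sketch of the count-based model (the array shapes and the tiny initialization constant are assumptions; the small constant keeps unvisited (s, a) pairs from dividing by zero):

    import numpy as np

    num_states, num_actions = 100, 3

    # Counts of observed transitions; a tiny prior avoids divide-by-zero.
    T_count = np.full((num_states, num_actions, num_states), 1e-5)

    def observe(s, a, s_prime):
        """Record one real experience tuple for the Dyna model."""
        T_count[s, a, s_prime] += 1

    def T(s, a, s_prime):
        """Estimated probability of landing in s_prime after taking a in s."""
        return T_count[s, a, s_prime] / T_count[s, a].sum()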
