Reinforcement Learning Flashcards
Trading as an RL problem.
For the following, list whether they are state (s), action (a), or reward (r):
- BUY
- SELL
- Holding Long
- Bollinger Value
- return from trade
- daily return
- BUY (action)
- SELL (action)
- Holding Long (state)
- Bollinger Value (state)
- return from trade (reward)
- daily return (reward, state)
What is the update rule? What are alpha and gamma?
new Q[s,a] = (1 - alpha) * Q[s,a] + alpha * (r + gamma * max over a' of Q[s', a'])
The improved estimate is the immediate reward r plus the discounted value of the best action available from the next state s'. (A short Python sketch of this update follows the alpha/gamma notes below.)
alpha is the learning rate
- range is 0-1.0, usually use 0.2
- larger alpha = faster learning (new information is weighted more heavily, so estimates are noisier)
- smaller alpha = slower learning rate
gamma is the discount rate
- range is 0-1.0
- low gamma = we value immediate rewards more
- high gamma = we value later rewards more
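A minimal sketch of this update in Python, assuming Q is a (num_states x num_actions) NumPy array indexed by discrete state and action integers; the function name and default alpha/gamma values are illustrative, not from the original notes.

import numpy as np

def q_update(Q, s, a, r, s_prime, alpha=0.2, gamma=0.9):
    # Improved estimate = immediate reward plus the discounted value of the
    # best action available from the next state s'.
    best_future = np.max(Q[s_prime])
    # Blend the old value with the improved estimate using the learning rate.
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * best_future)
    return Q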
Which results in faster convergence?
- r = daily_return
- r = 0 until exit, then cum_ret
r = daily return
A reward at each step gives the learning agent feedback on every individual action it takes (including doing nothing), whereas a reward only at exit is sparse and delayed, so credit has to propagate back through many steps before earlier actions are reinforced.
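A small illustration of the two reward schemes for a hypothetical round-trip trade held for four days (the numbers are made up for the example).

# Hypothetical daily returns for one trade held for 4 days.
daily_returns = [0.01, -0.005, 0.02, 0.003]

# Dense reward: the agent is rewarded at every step.
dense_rewards = daily_returns

# Sparse reward: zero until exit, then the cumulative return of the trade.
cum_ret = 1.0
for r in daily_returns:
    cum_ret *= (1 + r)
cum_ret -= 1
sparse_rewards = [0.0, 0.0, 0.0, cum_ret]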
Which of the following could/should be a state?
- Adjusted close
- SMA
- Adjusted close/SMA
- Bollinger Band Value
- P/E Ratio
- Holding Stock
- Return since entry
Everything except the first two (Adjusted Close, SMA).
A good state is either a technical indicator (something that gives a signal, ideally normalized so it is comparable across stocks and time) or a holding status (e.g. holding or not, return since entry). Adjusted Close and SMA on their own are raw price levels, so they don't generalize; their ratio (Adjusted Close / SMA) does.
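As an illustration of turning indicators plus holding status into a single discrete state for a Q table, a sketch under assumed bin counts and value ranges; the helper names and thresholds are hypothetical.

import numpy as np

def discretize(value, bin_edges):
    # Map a continuous indicator value to an integer bucket 0..len(bin_edges).
    return int(np.digitize(value, bin_edges))

def build_state(price_sma_ratio, bb_value, holding):
    # Assumed discretization: 10 buckets per indicator, 3 holding positions
    # (short = 0, flat = 1, long = 2).
    ratio_bins = np.linspace(0.8, 1.2, 9)   # 9 edges -> 10 buckets
    bb_bins = np.linspace(-2.0, 2.0, 9)     # 9 edges -> 10 buckets
    r = discretize(price_sma_ratio, ratio_bins)
    b = discretize(bb_value, bb_bins)
    # Combine into one integer: 10 * 10 * 3 = 300 possible states.
    return (r * 10 + b) * 3 + holding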
What are the advantages of a model-free approach like Q-Learning?
You don’t need to know (and have data structures for) the Transition function and the Reward function.
Also, the Q-Value accounts for future rewards.
Using Dyna, how do you calculate the probability for T?
T[s, a, s_prime] = ?
T[s, a, s_prime] = T_count[s, a, s_prime] / sum over all s_prime of T_count[s, a, s_prime]
i.e. the number of times the transition (s, a, s_prime) was observed, divided by the number of times (s, a) occurred at all.
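A minimal sketch of keeping these counts and deriving the probability in Python; the array sizes and the tiny initial count (used to avoid division by zero for unvisited (s, a) pairs) are assumptions, not part of the original notes.

import numpy as np

num_states, num_actions = 100, 3

# Count of observed transitions, initialized to a small value so that
# unvisited (s, a) pairs do not cause division by zero.
T_count = np.full((num_states, num_actions, num_states), 1e-5)

def record_transition(s, a, s_prime):
    # Called once per real experience tuple gathered by the agent.
    T_count[s, a, s_prime] += 1

def transition_prob(s, a, s_prime):
    # Times (s, a, s_prime) occurred / times (s, a) occurred at all.
    return T_count[s, a, s_prime] / T_count[s, a].sum()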