Algorithms Flashcards
1
Q
SARSA vs Q-learning
A
2
Q
SARSA Algorithm
A
3
Q
REINFORCE Algorithm
A
4
Q
Q-learning update rule
A
5
Q
SARSA update rule
A
6
Q
Boltzmann vs Softmax policy
A
7
Q
Boltzmann policy
High values of τ (e.g., τ = 5) push the action distribution toward uniform, so the agent acts almost randomly. Low values of τ (e.g., τ = 0.1) concentrate probability on the action with the largest Q-value, so the agent acts more greedily. With τ = 1, the Boltzmann policy reduces to the plain softmax function.
A
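The temperature effect described on this card can be checked with a minimal sketch. It assumes the usual form of the Boltzmann policy, p(a) proportional to exp(Q(a) / τ). The function name boltzmann_policy and the example Q-values are hypothetical, chosen only for illustration.

```python
import math

def boltzmann_policy(q_values, tau=1.0):
    """Return action probabilities proportional to exp(Q(a) / tau).

    Assumed form of the Boltzmann policy; tau is the temperature.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    scaled = [q / tau for q in q_values]
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical Q-values for three actions.
q = [1.0, 2.0, 3.0]
for tau in (5.0, 1.0, 0.1):
    probs = boltzmann_policy(q, tau)
    print(f"tau={tau}: {[round(p, 3) for p in probs]}")
```

With these inputs, the τ = 5 distribution should be the flattest of the three and the τ = 0.1 distribution should put nearly all probability on the last action, matching the card's description.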
8
Q
DQN Algorithm
A