Algorithms - Equations Flashcards
1
Q
V(s) = max_a ( R(s,a) + γ Σ_{s1} T(s,a,s1) V(s1) )
A
Bellman Equation
The value of a state is the maximum, over all actions available in that state, of the immediate reward for taking the action plus the discounted value of each possible next state, weighted by the probability of ending up there.
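The backup described in this card can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-state MDP, its rewards, its transition probabilities, and the names `bellman_backup` and `value_iteration` are all made up for the example. Repeating the backup until the values stop changing is the usual value-iteration procedure.

```python
# Hypothetical two-state MDP; every number here is invented for illustration.
GAMMA = 0.9
STATES = ["s0", "s1"]
ACTIONS = ["stay", "go"]

# R[(s, a)] is the immediate reward for taking action a in state s.
R = {
    ("s0", "stay"): 0.0,
    ("s0", "go"): 1.0,
    ("s1", "stay"): 2.0,
    ("s1", "go"): 0.0,
}

# T[(s, a)] maps each possible next state s1 to its probability T(s, a, s1).
T = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"): {"s0": 1.0},
}


def bellman_backup(V, s):
    """Right-hand side of the Bellman equation for a single state s."""
    return max(
        R[(s, a)] + GAMMA * sum(p * V[s1] for s1, p in T[(s, a)].items())
        for a in ACTIONS
    )


def value_iteration(tol=1e-10, max_iters=10_000):
    """Apply the backup to every state until the largest change is below tol."""
    V = {s: 0.0 for s in STATES}
    for _ in range(max_iters):
        new_V = {s: bellman_backup(V, s) for s in STATES}
        if max(abs(new_V[s] - V[s]) for s in STATES) < tol:
            return new_V
        V = new_V
    return V


V = value_iteration()
print(V)
```

At convergence each V(s) satisfies the equation on this card: applying `bellman_backup` once more leaves the values essentially unchanged.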
2
Q
Explain the Bellman equation.
A
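The value (utility) of a state is the expected sum of discounted future rewards collected from that state onward, following each transition (state, action, state prime) and choosing the best action at every step until a terminal state is reached.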
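To make "discounted future rewards until terminal" concrete, here is a small sketch that sums the rewards along one finished episode. The reward sequence and the name `discounted_return` are assumptions for illustration. It covers a single trajectory only, so it leaves out the max over actions and the averaging over next states that the full equation includes.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t], computed backward from the terminal step."""
    total = 0.0
    for r in reversed(rewards):
        # Same shape as the Bellman recursion: reward now + discounted future.
        total = r + gamma * total
    return total


# Hypothetical episode: three rewards, then the terminal state.
G = discounted_return([1.0, 0.0, 2.0], gamma=0.9)
print(G)
```

Working backward from the terminal step mirrors the recursion in the equation: each step's value is its own reward plus gamma times the value of what follows.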