Algorithms - Equations Flashcards

1
Q

V(s) = max_a ( R(s,a) + γ Σ_{s'} T(s,a,s') V(s') )

A

Bellman Equation

The value of a state is the maximum, over all actions available in that state, of the immediate reward for taking the action plus the discounted value of each possible next state, weighted by the probability of ending up there.
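
As a rough illustration of how this equation is applied, the sketch below repeats the Bellman update on a tiny MDP until the values settle (value iteration). The two-state MDP, the dictionary layout, and the names bellman_backup, R, T and gamma are all assumptions made up for this example; they are not part of the card.

```python
def bellman_backup(V, R, T, gamma):
    """Apply one Bellman update to every state and return the new values.

    V: dict state -> current value estimate
    R: dict state -> dict action -> immediate reward
    T: dict state -> dict action -> dict next_state -> probability
    """
    return {
        s: max(
            # reward for the action plus discounted, probability-weighted next value
            R[s][a] + gamma * sum(p * V[s_next] for s_next, p in T[s][a].items())
            for a in R[s]
        )
        for s in V
    }


# Hypothetical two-state MDP, invented purely for illustration.
R = {"A": {"stay": 0.0, "go": 1.0}, "B": {"stay": 2.0}}
T = {
    "A": {"stay": {"A": 1.0}, "go": {"A": 0.2, "B": 0.8}},
    "B": {"stay": {"B": 1.0}},
}
gamma = 0.5

V = {"A": 0.0, "B": 0.0}
for _ in range(100):  # enough sweeps for this small example to converge
    V = bellman_backup(V, R, T, gamma)

print(V)
```

In this toy problem the values approach V(B) = 4 and V(A) = 26/9, which can be checked by hand: V(B) = 2 + 0.5 V(B), and V(A) = 1 + 0.5 (0.2 V(A) + 0.8 V(B)).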

2
Q

Explain the Bellman equation.

A

The value (or utility) of a state is the expected discounted sum of the rewards collected on each (state, action, next state) transition from that state onward, when acting optimally, until a terminal state is reached.
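
To make "discounted sum of rewards" concrete, here is a minimal sketch that computes it for one fixed sequence of rewards ending at a terminal state. The function name discounted_return and the sample numbers are assumptions chosen for illustration; the value of a state would be the expectation of this quantity over the trajectories the best policy can produce.

```python
def discounted_return(rewards, gamma):
    """Return r0 + gamma*r1 + gamma^2*r2 + ... for one reward sequence."""
    total = 0.0
    # Work backwards so each step is: reward now + gamma * (return from next step).
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Three rewards of 1 with gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1.75
```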
