Q-learning intuition Flashcards

1
Q

V(s) = max_a (R(s,a) + γV(s'))

A
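s = the current state
s' = the following state (the end state of the step), reached after taking action a in state s
a = an action available in state s
R(s,a) = the reward received for taking action a in state s
γ (gamma, sometimes typed as y) = the discount factor
max = the maximum taken over all actions a available in state s
Taking an action a in state s immediately yields the reward R(s,a).
The value of the new state, discounted by gamma, is γV(s').
For every action we compute R(s,a) + γV(s'); max picks the largest of these results, and that is V(s).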
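
To make the equation concrete, here is a minimal sketch that applies it repeatedly to a tiny deterministic world. The 4-cell corridor, the reward of 1 for stepping into the goal, and gamma = 0.9 are all assumptions chosen for illustration; they do not come from the card.

```python
# Hypothetical 4-cell corridor: states 0..3, where state 3 is the goal (terminal).
GAMMA = 0.9          # discount factor, assumed value
STATES = [0, 1, 2, 3]
ACTIONS = ["left", "right"]
TERMINAL = 3


def next_state(s, a):
    """Deterministic transition s -> s': move one cell, staying inside the corridor."""
    return max(s - 1, 0) if a == "left" else min(s + 1, 3)


def reward(s, a):
    """R(s, a): 1 for stepping into the goal, 0 for every other move."""
    return 1.0 if next_state(s, a) == TERMINAL else 0.0


def value_iteration(sweeps=50):
    """Repeatedly apply V(s) = max_a (R(s,a) + gamma * V(s')) to every state."""
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            if s == TERMINAL:
                continue  # nothing more to earn once the goal is reached
            V[s] = max(reward(s, a) + GAMMA * V[next_state(s, a)] for a in ACTIONS)
    return V


V = value_iteration()
for s in STATES:
    print(s, round(V[s], 3))
```

In this sketch the states closer to the goal end up with higher values, because each extra step multiplies the eventual reward by gamma once more.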
2
Q

The Bellman equation creates ____________ for an agent to get to ___________

A

incentive

reward

3
Q

What do you call an algorithm which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states?

A

deterministic algorithm

4
Q

Factors that cause a non-deterministic search

A

A variety of factors can cause an algorithm to behave non-deterministically:

If it uses external state other than the input, such as user input, a global variable, a hardware timer value, a random value, or stored disk data.
If it operates in a way that is timing-sensitive, for example if it has multiple processors writing to the same data at the same time. In this case, the precise order in which each processor writes its data will affect the result.
If a hardware error causes its state to change in an unexpected way.
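
The first factor above (external state such as a random value) can be illustrated with a short sketch. The function names here are hypothetical and exist only for this example; the point is that a function reading a global random generator is non-deterministic, while passing the seed in as part of the input makes it deterministic again.

```python
import random


def deterministic_sum(values):
    """Depends only on its input: the same list always gives the same number."""
    return sum(values)


def noisy_sum(values):
    """Reads external state (the global random generator), so repeated
    calls with the same input can return different results."""
    return sum(values) + random.random()


def seeded_noisy_sum(values, seed):
    """The random source is now derived from an explicit input (the seed),
    so the same (values, seed) pair always gives the same result."""
    rng = random.Random(seed)
    return sum(values) + rng.random()


data = [1, 2, 3]
print(deterministic_sum(data) == deterministic_sum(data))      # always True
print(seeded_noisy_sum(data, 7) == seeded_noisy_sum(data, 7))  # always True
print(noisy_sum(data), noisy_sum(data))                        # usually two different numbers
```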

5
Q

What is the dynamic programming equation / Bellman equation?

A

It writes the value of a decision problem at a certain point in time in terms of the payoff from some initial choices and the value of the remaining decision problem that results from those initial choices.
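
As a rough illustration of that sentence, the sketch below solves a made-up finite-horizon problem: a cake of integer size is eaten over a fixed number of periods, and eating `eat` units in a period pays `sqrt(eat)`. The problem, the payoff function, the horizon of 3 periods, and the absence of discounting are all assumptions for the example. The recursive call is the "value of the remaining decision problem".

```python
from functools import lru_cache
from math import sqrt

HORIZON = 3  # number of decision periods, assumed for the example


@lru_cache(maxsize=None)
def value(t, cake):
    """Best total payoff obtainable from period t onward with `cake` units left."""
    if t == HORIZON:
        return 0.0  # no periods remain, so nothing more can be earned
    # Payoff from the choice made now, plus the value of the problem that remains.
    return max(sqrt(eat) + value(t + 1, cake - eat) for eat in range(cake + 1))


print(value(0, 3))  # best total payoff starting at period 0 with 3 units
```

The cache means each (period, cake) pair is solved once, which is the usual reason for writing the problem in this recursive form.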

6
Q

Markov decision processes (MDPs)

A

Markov decision processes (MDPs) provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. What happens next depends only upon the present state and the action chosen in it, not on the sequence of events that preceded it.
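
A small sketch may help show both halves of that definition. The three-state "cool / warm / broken" model, its probabilities, rewards, and gamma = 0.9 are invented for illustration. The agent controls the action, chance picks the outcome, and `step` looks only at the current state and action. When outcomes are random, the value of an action is the probability-weighted average of reward plus discounted next-state value.

```python
import random

GAMMA = 0.9  # discount factor, assumed value

# P[state][action] lists the possible outcomes as (probability, next_state, reward).
P = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
        "fast": [(1.0, "broken", -10.0)],
    },
    "broken": {},  # terminal state: no actions available
}


def step(state, action, rng):
    """Sample one transition; it depends only on the present state and action."""
    outcomes = P[state][action]
    weights = [p for p, _, _ in outcomes]
    _, nxt, r = rng.choices(outcomes, weights=weights, k=1)[0]
    return nxt, r


def q_value(V, state, action):
    """Expected reward plus discounted value of the next state for one action."""
    return sum(p * (r + GAMMA * V[nxt]) for p, nxt, r in P[state][action])


def value_iteration(sweeps=200):
    """Apply V(s) = max_a sum_s' P(s'|s,a) * (r + gamma * V(s')) repeatedly."""
    V = {s: 0.0 for s in P}
    for _ in range(sweeps):
        for s in P:
            if P[s]:  # skip terminal states
                V[s] = max(q_value(V, s, a) for a in P[s])
    return V


V = value_iteration()
print({s: round(v, 2) for s, v in V.items()})
print(step("cool", "fast", random.Random(0)))  # one sampled (next_state, reward)
```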
