MP and MRP Flashcards

1
Q

What is the formal definition of a Markov process?

A

A Markov process is a tuple (S, P):

1) S, the state space
2) P_ss', the state transition probability matrix, P_ss' = P[S_{t+1} = s' | S_t = s]
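A minimal sketch (with an invented 3-state example) of how P_ss' can be written as a matrix whose rows are probability distributions over the next state:

```python
import numpy as np

# Hypothetical 3-state Markov process (state names invented for illustration).
states = ["sunny", "rainy", "cloudy"]      # S, the state space
P = np.array([[0.7, 0.1, 0.2],             # transitions from "sunny"
              [0.3, 0.5, 0.2],             # transitions from "rainy"
              [0.4, 0.3, 0.3]])            # transitions from "cloudy"

# Each row of P_ss' must sum to 1, i.e. be a probability distribution.
assert np.allclose(P.sum(axis=1), 1.0)
```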

2
Q

On what does the probability of the next state, S_{t+1}, depend?

A

Only on the current state S_t; it is independent of all earlier states (this is the Markov property).

3
Q

What is the formal definition of a Markov Reward Process?

A

A Markov Reward Process is a tuple (S, P, R, gamma):

1) S, the state space
2) P_ss', the state transition probability matrix
3) R_s, the immediate reward for leaving state s
4) gamma, the discount factor, usually in the range [0, 1]

Note that R_s here is the "fancy" (calligraphic) R; a plain R is usually used for the return and should not be confused with the immediate reward.
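A minimal sketch of the full MRP tuple as plain NumPy data, reusing the invented 3-state example from above (the reward values are made up for illustration):

```python
import numpy as np

# Hypothetical MRP (S, P, R, gamma).
states = ["sunny", "rainy", "cloudy"]      # S, the state space
P = np.array([[0.7, 0.1, 0.2],
              [0.3, 0.5, 0.2],
              [0.4, 0.3, 0.3]])            # P_ss', state transition probabilities
R = np.array([1.0, -1.0, 0.0])             # R_s, immediate reward for leaving each state
gamma = 0.9                                # discount factor in [0, 1]
```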

4
Q

Why is the discount factor usually smaller than 1?

A

1) Mathematically convenient (discounted returns are guaranteed to converge)
2) Avoids infinite returns in cyclic Markov processes
3) May reflect reality in some cases, e.g. financial rewards, where money now is worth more than money later
4) Models the uncertainty of future rewards
5) Research into human and animal decision making shows a preference for immediate rewards

5
Q

What is the idea of the Bellman equation?

A

The return, and hence the value of a state, can be decomposed into the immediate reward plus the discounted value of the successor state.

6
Q

What are the most important forms of the Bellman equation for MRPs?

A
1. Expectation form:
v(s) = E[r_{t+1} + gamma * v(S_{t+1}) | S_t = s]
2. Sum form:
v(s) = R_s + gamma * sum_{s'} P_ss' * v(s')
3. Vector form:
v = R + gamma * P * v, where R is the vector of immediate rewards R_s
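A minimal sketch (with the made-up MRP values from above) showing that the per-state sum form and the vector form compute the same quantity for any candidate value vector v:

```python
import numpy as np

P = np.array([[0.7, 0.1, 0.2],
              [0.3, 0.5, 0.2],
              [0.4, 0.3, 0.3]])
R = np.array([1.0, -1.0, 0.0])
gamma = 0.9
v = np.array([2.0, -3.0, 0.5])             # arbitrary candidate value vector

# Sum form, one state at a time: R_s + gamma * sum_s' P_ss' * v(s')
sum_form = np.array([R[s] + gamma * sum(P[s, s2] * v[s2] for s2 in range(len(v)))
                     for s in range(len(v))])

# Vector form: R + gamma * P v
vector_form = R + gamma * P @ v

assert np.allclose(sum_form, vector_form)
```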
7
Q

Define the Return

A

R_t = r_{t+1} + gamma * r_{t+2} + gamma^2 * r_{t+3} + ... = sum_{k=0..inf} gamma^k * r_{t+k+1}
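A minimal sketch of computing the return for a made-up finite reward sequence r_{t+1}, r_{t+2}, ...:

```python
# Discounted return R_t = r_{t+1} + gamma * r_{t+2} + gamma^2 * r_{t+3} + ...
gamma = 0.9
rewards = [1.0, 0.0, -2.0, 5.0]            # made-up sequence r_{t+1}, r_{t+2}, ...

return_t = sum(gamma**k * r for k, r in enumerate(rewards))
print(return_t)                            # 1.0 + 0.9*0.0 + 0.81*(-2.0) + 0.729*5.0
```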

8
Q

Define the state-value function

A

v(s) = E[R_t | S_t = s], where R_t is the return

9
Q

What is the direct solution of the Bellman equation for MRPs in vector form?

A

v = (I - gamma * P)^(-1) * R, where R is the vector of immediate rewards R_s
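A minimal sketch of the direct solution on the made-up MRP from above; np.linalg.solve is used rather than forming the inverse explicitly:

```python
import numpy as np

P = np.array([[0.7, 0.1, 0.2],
              [0.3, 0.5, 0.2],
              [0.4, 0.3, 0.3]])
R = np.array([1.0, -1.0, 0.0])
gamma = 0.9

# Solve (I - gamma*P) v = R, i.e. v = (I - gamma*P)^(-1) R.
# gamma < 1 guarantees that (I - gamma*P) is invertible.
I = np.eye(len(R))
v = np.linalg.solve(I - gamma * P, R)

# The solution satisfies the Bellman equation in vector form.
assert np.allclose(v, R + gamma * P @ v)
```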

10
Q

Why can't we always use the direct solution of the Bellman equation?

A

Computing the matrix inverse costs roughly O(n^3) in the number of states, which is too computationally expensive for large state spaces; the direct solution is only practical for small MRPs.

11
Q

What are the 3 main iterative methods for solving the Bellman equation?

A

1) Monte-Carlo evaluation
2) Dynamic Programming
3) Temporal-difference learning
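A minimal sketch of an iterative, dynamic-programming-style evaluation on the made-up MRP from above: start from an arbitrary guess and repeatedly apply the Bellman update v <- R + gamma * P v until it stops changing:

```python
import numpy as np

P = np.array([[0.7, 0.1, 0.2],
              [0.3, 0.5, 0.2],
              [0.4, 0.3, 0.3]])
R = np.array([1.0, -1.0, 0.0])
gamma = 0.9

v = np.zeros(len(R))                       # arbitrary initial guess
for _ in range(10_000):
    v_new = R + gamma * P @ v              # one Bellman update
    if np.max(np.abs(v_new - v)) < 1e-10:  # stop when the update barely changes v
        v = v_new
        break
    v = v_new
```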

12
Q

What happens when gamma, the discount factor, is close to 0 or close to 1?

A

A gamma close to 0 leads to a "short-sighted" (myopic) evaluation that weights immediate rewards heavily, while a gamma close to 1 leads to a "far-sighted" evaluation that takes future rewards almost fully into account.
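A minimal sketch of this effect on the made-up MRP from above: with gamma near 0 the values are dominated by the immediate rewards, with gamma near 1 they also reflect rewards far in the future:

```python
import numpy as np

P = np.array([[0.7, 0.1, 0.2],
              [0.3, 0.5, 0.2],
              [0.4, 0.3, 0.3]])
R = np.array([1.0, -1.0, 0.0])

for gamma in (0.05, 0.99):
    v = np.linalg.solve(np.eye(len(R)) - gamma * P, R)
    print(gamma, v)                        # gamma ~ 0: v ~ R; gamma ~ 1: far-sighted values
```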
