Markov Decision Process Flashcards

1
Q

Markov decision process

A

to optimize performance, in each state we decide which one of several possible actions should be taken. This affects the transition probabilities, the immediate costs, and the subsequent costs
-> each action in a state leads to a different probability distribution over where we end up
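
As a concrete illustration (not from the course), here is a minimal sketch of an MDP as plain data in Python; the states, actions, probabilities and costs are hypothetical. Each (state, action) pair carries its own transition distribution and immediate cost.

```python
# Hypothetical two-state maintenance example: in each state we choose an
# action, and that choice fixes both the transition probabilities and the
# immediate cost incurred in that state.
mdp = {
    # state: {action: (probabilities of moving to [state 0, state 1], cost)}
    0: {"operate": ([0.8, 0.2], 0.0)},              # state 0: machine is fine
    1: {"operate": ([0.0, 1.0], 3.0),               # state 1: machine degraded
        "repair":  ([1.0, 0.0], 5.0)},              # repair costs more now but
}                                                   # sends us back to state 0
```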

2
Q

policy

A

a decision strategy; it fixes the transition probabilities, and these can be used to evaluate the expected average cost per unit time, which indicates whether it is a good policy (steady-state probability of each state times the cost of each outcome)
written (d0, d1, …), di being the decision (action) taken in state i
usually taken to be stationary and deterministic

3
Q

expected cost

A

sum_i C_ik * pi_i (sum over all states of the cost of the decision k made in state i under the policy, times the corresponding steady-state probability of being in that state)
the long-run average cost converges to this expected average cost
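
A minimal sketch of this evaluation in Python/numpy (the two-state transition matrix and costs below are made-up placeholders, not course data): fix a policy, take the Markov chain it induces, solve for the steady-state probabilities pi, and weight the per-state costs by pi.

```python
import numpy as np

# Hypothetical chain induced by some fixed policy:
# P[i, j] = probability of moving from state i to state j,
# C[i]    = cost of the decision the policy prescribes in state i.
P = np.array([[0.8, 0.2],
              [0.6, 0.4]])
C = np.array([0.0, 5.0])

# Steady-state probabilities: pi = pi P with sum(pi) = 1.
# Stack the balance equations (P^T - I) pi = 0 with the normalisation row.
n = P.shape[0]
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.append(np.zeros(n), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

expected_cost = float(pi @ C)     # sum_i C_ik * pi_i
print(pi, expected_cost)          # e.g. [0.75 0.25] and 1.25 here
```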

4
Q

expected total discounted cost

A

alternative evaluation metric to the expected average cost.
discount factor alpha (between 0 and 1): captures the impact of future payments on the decision we make now.
The present value of a cost paid 1 period in the future is alpha; paid m periods in the future, it is alpha^m.

if given a Markov chain and alpha, the transition probabilities are multiplied by alpha
the states carry the cost to pay in that state

write the equations from the diagram (gamma_i = expected total discounted cost starting in state i), e.g. for two states:

gamma_1 = cost_1 + alpha * p_12 * gamma_2
gamma_2 = cost_2 + alpha * p_21 * gamma_1 + alpha * p_22 * gamma_2

then solve this linear system (put all terms over the same denominator) to get the discounted costs
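
A minimal sketch of that last step in Python/numpy (the probabilities and costs below are placeholders, not the values from any particular diagram): collect the equations gamma = C + alpha P gamma and solve the linear system (I - alpha P) gamma = C instead of doing the algebra by hand.

```python
import numpy as np

alpha = 0.9                        # discount factor
P = np.array([[0.0, 1.0],          # from state 1 we always move to state 2
              [0.5, 0.5]])         # from state 2 we return to 1 or stay, 50/50
C = np.array([1.0, 2.0])           # cost paid in each state

# gamma = C + alpha * P @ gamma   <=>   (I - alpha * P) @ gamma = C
gamma = np.linalg.solve(np.eye(2) - alpha * P, C)
print(gamma)                       # expected total discounted cost per start state
```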

5
Q

formulation as a linear program

A

the objective coefficients are the costs C_ik of taking action k in state i; the y_ik are the decision variables (state i, action k)

Minimize the amount of time spent on expensive actions
Min objective function: sum_{i,k} C_ik * y_ik
subject to
1) for each state j: sum_k y_jk = sum_i sum_k y_ik * p_ij(k) (flow into state j from every state i and action k, weighted by the probability that action k takes i to j)
2) sum of all y_ik = 1
y_ik >= 0
y_ik is the fraction of time spent in state i doing action k

OR IF THERE IS A DISCOUNT FACTOR
Min objective function: sum_{i,k} C_ik * y_ik
subject to
1) for each state j: sum_k y_jk = 1/(number of states) + alpha * sum_i sum_k y_ik * p_ij(k)
y_ik >= 0
y_ik is the expected (discounted) number of times action k is done in state i

Solution is the strategy D_ik = y_ik / sum_k y_ik (probability of doing action k in state i)
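
A minimal sketch of the average-cost LP with scipy (two hypothetical states and actions; all numbers are placeholders): build the y_ik variables, the balance and normalisation constraints, solve, then recover the strategy D_ik.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: C[i, k] = cost of action k in state i,
# P[k][i, j] = probability of moving from state i to state j under action k.
C = np.array([[1.0, 4.0],
              [6.0, 2.0]])
P = [np.array([[0.9, 0.1],
               [0.7, 0.3]]),       # action 0
     np.array([[0.5, 0.5],
               [0.1, 0.9]])]       # action 1

n_s, n_a = C.shape
idx = lambda i, k: i * n_a + k     # position of y_ik in the variable vector

c = C.reshape(-1)                  # objective: minimise sum_{i,k} C_ik * y_ik

A_eq, b_eq = [], []
for j in range(n_s):               # balance: sum_k y_jk = sum_{i,k} y_ik p_ij(k)
    row = np.zeros(n_s * n_a)
    for k in range(n_a):
        row[idx(j, k)] += 1.0
        for i in range(n_s):
            row[idx(i, k)] -= P[k][i, j]
    A_eq.append(row)
    b_eq.append(0.0)
A_eq.append(np.ones(n_s * n_a))    # normalisation: sum_{i,k} y_ik = 1
b_eq.append(1.0)

res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
y = res.x.reshape(n_s, n_a)
D = y / y.sum(axis=1, keepdims=True)   # D_ik = y_ik / sum_k y_ik
print(D)
```

For a state where the optimal solution spends no time (all y_ik = 0), the ratio is undefined and any action can be chosen there.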

6
Q

Markov Diagram & average cost

A

given the Markov chain diagram, the arcs carry the transition probabilities
the states carry the cost to pay in that state

having the steady-state probabilities pi, average cost = sum_{all states} pi_i * cost_i

7
Q

number of visits to states

A

D is the matrix alpha * P: the transition probabilities multiplied by the discount factor alpha, one row per initial state i
(I - D^T)^-1 gives the expected (discounted) number of visits to each state, for each initial state

We can also get the discounted costs per state:
eg: to get gamma_1, take the entries for initial state 1 in (I - D^T)^-1,
weight each one by the cost of the corresponding state, and sum them
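
A minimal sketch in Python/numpy with placeholder numbers (the same hypothetical two-state chain as in the discounted-cost card): build D = alpha * P, invert (I - D^T), and weight the expected visit counts by the state costs to recover the gammas.

```python
import numpy as np

alpha = 0.9
P = np.array([[0.0, 1.0],
              [0.5, 0.5]])          # transition probabilities (placeholder)
C = np.array([1.0, 2.0])            # cost paid on each visit to a state

D = alpha * P                       # discounted transition matrix
N = np.linalg.inv(np.eye(2) - D.T)

# Column i of N holds the expected discounted number of visits to each state
# when starting in state i; weighting those counts by the costs gives gamma_i.
gamma = C @ N                       # gamma_i = sum_j N[j, i] * C[j]
print(N)
print(gamma)                        # same values as solving gamma = C + alpha P gamma
```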
