Markov Decision Process Flashcards
Markov decision process
a Markov chain in which, to optimize performance, we decide at each state which one of several possible actions should be taken. This choice affects the transition probabilities, the immediate cost, and subsequent costs
-> each action in a state leads to a different probability distribution over where we end up next
policy
a decision strategy: a rule prescribing which action to take in each state. The policy determines the transition probabilities, and these can be used to evaluate the expected average cost per unit time (steady-state probability of each state times the cost incurred there), which indicates whether it is a good policy
(d0, d1, ...), d_i being the decision (action) prescribed in state i
the policies considered are stationary (unchanging over time) and deterministic (one fixed action per state)
expected cost
sum_i C_ik * pi_i (sum over states of the cost of the decision k the policy makes in state i, times the steady-state probability pi_i of being in that state)
the long-run average cost per unit time converges to this expected cost
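As a quick check of the formula, a minimal sketch with invented numbers (the pi and cost values below are assumptions, not from the cards):

```python
import numpy as np

# Hypothetical 2-state example: steady-state probabilities pi for the
# policy, and C_ik = cost of the action the policy chooses in each state.
pi = np.array([0.6, 0.4])      # long-run fraction of time in each state
costs = np.array([2.0, 5.0])   # cost of the chosen action in each state

# sum_i C_ik * pi_i
expected_cost = float(pi @ costs)
print(expected_cost)  # = 0.6*2 + 0.4*5 = 3.2
```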
expected total discounted cost
an alternative evaluation metric to the expected average cost.
discount factor alpha (between 0 and 1): captures the impact of future payments on the decision we make now.
a payment of 1 made m units of time in the future has present value alpha^m today
given a Markov chain diagram and a discount factor alpha, where each state carries a cost to pay:
write one equation per state from the diagram, with each transition probability discounted by alpha
gamma_1 = C_1 + alpha * p_12 * gamma_2
gamma_2 = C_2 + alpha * p_21 * gamma_1 + alpha * p_22 * gamma_2
then solve this linear system for the gamma_i to get the discounted cost from each starting state
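The per-state equations above form one linear system, (I - alpha*P) gamma = C; a sketch with hypothetical costs and transition probabilities:

```python
import numpy as np

# Invented numbers: per-state costs C, discount alpha, transition probs P.
alpha = 0.9
C = np.array([1.0, 3.0])
P = np.array([[0.0, 1.0],    # state 1 always moves to state 2
              [0.5, 0.5]])   # state 2 splits between states 1 and 2

# gamma = C + alpha * P @ gamma  <=>  (I - alpha*P) gamma = C
gamma = np.linalg.solve(np.eye(2) - alpha * P, C)
print(gamma)  # discounted cost starting from each state
```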
formulating as a linear program
the coefficients are the costs C_ik of each action k in each state i; the y_ik are the decision variables
goal: minimize the fraction of time spent on expensive actions
Min objective function: sum_i sum_k C_ik * y_ik
subject to
1) for each state j: sum_k y_jk = sum_i sum_k y_ik * p_ij(k) (the probability of being in state j equals the probability of entering it from some state i by some action k)
2) sum over all i and k of y_ik = 1
yij >= 0
y_ik is the steady-state fraction of time spent in state i doing action k
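A sketch of this LP with scipy.optimize.linprog on a made-up 2-state, 2-action MDP (all costs and probabilities below are invented for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical MDP. C[i][k]: cost of action k in state i.
# P[i][k][j]: probability of moving from state i to state j under action k.
C = np.array([[1.0, 4.0],
              [6.0, 8.0]])
P = np.array([[[0.5, 0.5], [0.9, 0.1]],   # from state 0, actions 0 and 1
              [[0.2, 0.8], [0.7, 0.3]]])  # from state 1, actions 0 and 1

# Decision variables y_ik flattened as [y00, y01, y10, y11].
c = C.flatten()

# Balance constraint for state 0 (with 2 states, the state-1 balance row
# is redundant once the normalization constraint is added, so it is dropped):
# sum_k y_0k = sum_i sum_k P[i][k][0] * y_ik
balance = [1 - P[0][0][0], 1 - P[0][1][0], -P[1][0][0], -P[1][1][0]]
normalize = [1, 1, 1, 1]            # the time fractions sum to 1
A_eq = [balance, normalize]
b_eq = [0.0, 1.0]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 4)
y = res.x.reshape(2, 2)
policy = y / y.sum(axis=1, keepdims=True)  # D_ik = y_ik / sum_k y_ik
print(res.fun)   # minimal long-run average cost
print(policy)    # the optimal policy comes out deterministic
```

The LP optimum lands on a vertex, which here corresponds to a deterministic policy: each row of `policy` puts all its mass on one action.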
OR IF DISCOUNT FACTOR
Min objective function: sum_i sum_k C_ik * y_ik (minimized, since the C_ik are costs)
subject to
1) for each state j: sum_k y_jk = beta_j + alpha * sum_i sum_k y_ik * p_ij(k), with beta_j the probability of starting in state j (e.g. 1/(number of states) if all starting states are equally likely)
yij >= 0
y_ik is the expected discounted number of times the process is in state i and action k is done
the solution is the strategy D_ik = y_ik / sum_k y_ik (probability of doing action k when in state i)
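The discounted variant can be sketched the same way; again every number below is invented for illustration:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-state, 2-action MDP. C[i][k]: cost of action k in state i;
# P[i][k][j]: probability of moving i -> j under action k.
alpha = 0.9
C = np.array([[1.0, 4.0],
              [6.0, 8.0]])
P = np.array([[[0.5, 0.5], [0.9, 0.1]],
              [[0.2, 0.8], [0.7, 0.3]]])
n, m = 2, 2
beta = np.full(n, 1.0 / n)   # each starting state equally likely

# One constraint per state j:
# sum_k y_jk - alpha * sum_i sum_k P[i][k][j] * y_ik = beta_j
A_eq = np.zeros((n, n * m))
for j in range(n):
    for i in range(n):
        for k in range(m):
            A_eq[j, i * m + k] = (i == j) - alpha * P[i, k, j]

res = linprog(C.flatten(), A_eq=A_eq, b_eq=beta, bounds=[(0, None)] * (n * m))
y = res.x.reshape(n, m)
policy = y / y.sum(axis=1, keepdims=True)  # D_ik = y_ik / sum_k y_ik
print(res.fun)   # minimal expected total discounted cost
print(policy)
```

A sanity check on the y_ik: summing the constraints over all states gives (1 - alpha) * sum y_ik = sum beta_j, so the y_ik total 1/(1 - alpha) = 10 here, as expected for discounted visit counts.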
Markov Diagram & average cost
given a Markov chain diagram where each state carries a cost to pay:
find the steady-state probabilities pi from the diagram
then average cost = sum over all states of pi_i * cost_i
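A sketch of this recipe, solving pi from pi = pi P plus the normalization sum(pi) = 1 (the transition probabilities and costs are invented):

```python
import numpy as np

# Hypothetical 2-state chain with a cost attached to each state.
P = np.array([[0.0, 1.0],
              [0.5, 0.5]])
cost = np.array([1.0, 3.0])

# Steady state: pi = pi P with sum(pi) = 1. Replace one (redundant)
# balance equation by the normalization to get a nonsingular system.
A = np.vstack([(P.T - np.eye(2))[:-1], np.ones(2)])
b = np.array([0.0, 1.0])
pi = np.linalg.solve(A, b)

avg_cost = float(pi @ cost)  # sum_i pi_i * cost_i
print(pi, avg_cost)
```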
number of visits to states
D is the transition matrix scaled by the discount factor: D = alpha * P, with row i corresponding to initial state i
N = (I - D)^-1: entry (i, j) is the expected discounted number of visits to state j when starting in state i
we can get the discounted costs per starting state!
eg: to get gamma_1, take row 1 of N, weight each entry by the cost of the corresponding state, and sum
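This matrix recipe can be sketched directly (hypothetical chain; the resulting gamma satisfies the same per-state equations gamma = C + alpha*P*gamma as in the earlier card):

```python
import numpy as np

# Invented numbers: per-state costs, discount alpha, transition probs P.
alpha = 0.9
P = np.array([[0.0, 1.0],
              [0.5, 0.5]])
cost = np.array([1.0, 3.0])

# N = (I - alpha*P)^-1: entry N[i, j] is the expected discounted
# number of visits to state j when starting in state i.
N = np.linalg.inv(np.eye(2) - alpha * P)

# Weight the visits to each state by that state's cost and sum per row:
gamma = N @ cost
print(N)
print(gamma)  # discounted cost starting from each state
```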