Exam Questions Flashcards
So don't forget that you have two different initialisations here. That is also why you have to add the number of correctly answered questions.
What should you do here?
Just try a possible answer here and iterate further if needed.
How to solve for which α one type of strategy is better than another?
Solve for x1, x2 and y1, y2 (as functions of α), then require x1 ≤ y1 and x2 ≤ y2.
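A minimal numeric sketch of such a comparison, with made-up costs and transition matrices for the two strategies (on an exam you would solve the x's and y's symbolically in α and work out the inequalities instead of sweeping):

import numpy as np

# Hypothetical 2-state example: x = c_A + α·P_A·x and y = c_B + α·P_B·y (all numbers made up)
c_A, P_A = np.array([1.0, 2.0]), np.array([[0.5, 0.5], [0.3, 0.7]])
c_B, P_B = np.array([1.5, 1.0]), np.array([[0.2, 0.8], [0.6, 0.4]])

def values(c, P, alpha):
    # total discounted cost of a fixed strategy: solve (I - alpha*P) v = c
    return np.linalg.solve(np.eye(len(c)) - alpha * P, c)

for alpha in np.linspace(0.05, 0.95, 19):
    x, y = values(c_A, P_A, alpha), values(c_B, P_B, alpha)
    print(f"alpha={alpha:.2f}  x={x.round(2)}  y={y.round(2)}  A better in both states: {bool(np.all(x <= y))}")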
How to solve using iteration?
Solve the current system, then fill in the resulting values in the minimization formulas. Those are the values you want to minimize/maximize.
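A tiny worked example of one such step, with entirely made-up costs and probabilities: the x's come from solving the current system, and they are then filled into the minimization formula for one state.

# x's obtained from solving the current system (hypothetical values)
x = {1: 10.0, 2: 14.0}

# minimization formula for state 1 with two candidate actions (made-up data):
# action a: cost 3, to state 1 w.p. 0.4 and state 2 w.p. 0.6
# action b: cost 4, to state 1 w.p. 0.9 and state 2 w.p. 0.1
val_a = 3 + 0.4 * x[1] + 0.6 * x[2]   # 15.4
val_b = 4 + 0.9 * x[1] + 0.1 * x[2]   # 14.4
print(min(val_a, val_b))              # 14.4, so action b would be chosen in state 1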
How to show that a Markov chain is unichain?
Claim/show that, regardless of the starting state, the Markov chain eventually ends up in a single cycle (one recurrent class).
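One way to verify this mechanically (the transition structure below is made up, and networkx is assumed to be available): a finite Markov chain is unichain exactly when it has a single closed (recurrent) communicating class.

import networkx as nx

# hypothetical transition structure: state -> states reachable with positive probability
succ = {1: [2], 2: [3], 3: [1, 4], 4: [4]}

G = nx.DiGraph([(i, j) for i, js in succ.items() for j in js])
sccs = list(nx.strongly_connected_components(G))
# a communicating class is recurrent (closed) if no edge leaves it
closed = [C for C in sccs if all(j in C for i in C for j in G.successors(i))]
print("unichain" if len(closed) == 1 else "multichain", closed)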
How to write down the transition functions in a DP model?
Just write down t(state, action) = new state, and do this for every possible action. In the case of probabilities, make sure you write the probability after the new state.
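A possible way to write this out in code form, with a made-up inventory-style example (state and action names are hypothetical):

# deterministic transitions: t(state, action) = new state
t = {
    (0, "wait"): 0,
    (0, "order"): 2,
    (1, "wait"): 0,
}

# with probabilities: list the new states and write each probability after the new state
t_prob = {
    (2, "order"): [(3, 0.8), (1, 0.2)],   # to state 3 w.p. 0.8, to state 1 w.p. 0.2
}
print(t[(0, "order")], t_prob[(2, "order")])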
How to do value iteration?
- Start with the values V0 = (0, 0, ...) (usually; sometimes different starting values are defined).
- Calculate Vn(i) = min a in A(i) {c(i, a) + α ∑j pij(a) Vn−1(j)} for all i in I.
- Write down Rn(i), which is the minimum value
Obviously, if you want to maximize, change min into max (a small numeric sketch follows below).
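A minimal value-iteration sketch for a 2-state, 2-action problem; the costs, transition probabilities and discount factor α are all made up.

import numpy as np

alpha = 0.9
c = np.array([[3.0, 5.0],   # c[i][a]: state 0, actions 0 and 1
              [1.0, 4.0]])  # state 1
P = np.array([[[0.4, 0.6], [0.9, 0.1]],   # P[i][a][j]: state 0
              [[0.2, 0.8], [0.7, 0.3]]])  # state 1

V = np.zeros(2)                         # V0 = (0, 0)
for n in range(1, 101):
    # Vn(i) = min over a in A(i) of { c(i, a) + alpha * sum_j pij(a) * Vn-1(j) }
    Q = c + alpha * np.einsum('iaj,j->ia', P, V)
    R = Q.argmin(axis=1)                # action attaining the minimum in each state
    V_new = Q.min(axis=1)               # Vn(i), the minimum value
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new
print(V, R)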
How to do policy iteration?
- Solve the current system (i.e. with the current actions): xi = c(i, ai) + ∑j pij(ai) xj, where ai is the action currently chosen in state i.
- Fill these new x's into the minimization min a in A(i) {c(i, a) + ∑j pij(a) xj} and pick the best action per state (you can put this step in a table so that you can easily see which action is best, though writing it as a min {·} statement is also possible).
- Stop when the previous x1 is the same as the current x1, etc., i.e. when the actions no longer change (a small sketch follows below).
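A minimal policy-iteration sketch, assuming the discounted setting (discount factor α) and the same made-up data as the value-iteration sketch above; once the actions are fixed, step 1 solves a plain linear system.

import numpy as np

alpha = 0.9
c = np.array([[3.0, 5.0], [1.0, 4.0]])
P = np.array([[[0.4, 0.6], [0.9, 0.1]],
              [[0.2, 0.8], [0.7, 0.3]]])

policy = np.array([0, 0])                            # some initial current actions
while True:
    # 1) solve x_i = c(i, a_i) + alpha * sum_j pij(a_i) x_j for the current actions a_i
    c_R = c[np.arange(2), policy]
    P_R = P[np.arange(2), policy]
    x = np.linalg.solve(np.eye(2) - alpha * P_R, c_R)
    # 2) fill these x's into the minimization and pick the best action per state
    Q = c + alpha * np.einsum('iaj,j->ia', P, x)
    new_policy = Q.argmin(axis=1)
    # 3) stop when the actions (and hence the x's) no longer change
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print(policy, x)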
What do we know about the value of g, after value iteration?