markov models Flashcards
how do you solve a POMDP?
the model has to find, for any action/observation history, the action that maximizes the expected discounted reward.
a POMDP model contains: (3)
- A state transition (probability) function: P(s_{t+1}|s_t, a_t) (probability of the next state, given the current state and the action)
- An observation function: P(o_t|s_t,a_t) (probability of the observation, given the current state and the action)
- A reward function: E(r_t|s_t,a_t) (expected reward, given the current state and the action)
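A minimal sketch (not part of the original cards) of how these three functions could be stored as arrays; the 2-state/2-action/2-observation sizes and the names T, O, R are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical POMDP with 2 states, 2 actions, 2 observations (all numbers made up).
n_states, n_actions, n_obs = 2, 2, 2

# State transition function: T[s, a, s2] = P(s_{t+1} = s2 | s_t = s, a_t = a)
T = np.full((n_states, n_actions, n_states), 0.5)

# Observation function: O[s, a, o] = P(o_t = o | s_t = s, a_t = a)
O = np.full((n_states, n_actions, n_obs), 0.5)

# Reward function: R[s, a] = E(r_t | s_t = s, a_t = a)
R = np.zeros((n_states, n_actions))

# Each transition and observation distribution must sum to 1.
assert np.allclose(T.sum(axis=2), 1.0) and np.allclose(O.sum(axis=2), 1.0)
```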
Partially Observable Markov Decision Processes are
MDPs in which the states are only partially observable, so the agent does not know exactly what the current state is.
what is the difference between POMDPs and MDPs?
in MDPs, an agent knows exactly what the current state is; in POMDPs, it only receives observations that depend on the state.
what are 2 methods to find the optimal policy of an MDP
value iteration
policy iteration
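As an illustration, a minimal value-iteration sketch (not part of the original cards), assuming the MDP is given as a transition array T[s, a, s'], a reward array R[s, a], and a discount factor gamma; the toy numbers are made up purely for illustration.

```python
import numpy as np

def value_iteration(T, R, gamma=0.9, tol=1e-6):
    """T[s, a, s2] = P(s2 | s, a); R[s, a] = expected reward. Returns (utilities, policy)."""
    n_states, n_actions, _ = T.shape
    U = np.zeros(n_states)
    while True:
        # Bellman update: Q(s, a) = R(s, a) + gamma * sum_s' P(s'|s,a) * U(s')
        Q = R + gamma * (T @ U)          # shape (n_states, n_actions)
        U_new = Q.max(axis=1)
        if np.max(np.abs(U_new - U)) < tol:
            return U_new, Q.argmax(axis=1)
        U = U_new

# Toy 2-state, 2-action MDP (numbers made up for illustration).
T = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
U, policy = value_iteration(T, R)
print(U, policy)
```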
what is the utility in an MDP?
the utility of a state is the expected sum of discounted rewards if the agent executes the policy pi. The true utility of a state corresponds to the optimal policy pi*.
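In symbols (a standard formulation, not written out on the original card), with discount factor gamma between 0 and 1: U^pi(s) = E(\sum_{t=0}^{\infty} \gamma^t r_t | s_0 = s, pi), and the true utility is U(s) = U^{pi*}(s) = max_pi U^pi(s).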
what is a markov decision process?
markov decision processes are models for choosing actions given the state of the world; the functions from states to actions are generally called policies. An optimal policy associates an optimal decision with every state that the agent might reach in an uncertain environment.
why / when are HMMs used?
because an experimenter cannot always observe the states directly; they can only be measured through their observable outcomes.
how is a hidden markov model different from a markov model?
hidden markov models do not make the states directly observable; instead, each state is represented by a probability distribution over the possible observations that can be expected to occur in that state: the probability that we observe k given that we’re in state i.
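A small generative sketch of this idea (not from the original cards): the emission matrix B[i, k] plays the role of "the probability that we observe k given we're in state i"; all names and numbers are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state HMM with 3 possible observations (numbers made up).
pi = np.array([0.6, 0.4])                 # starting distribution over hidden states
A  = np.array([[0.7, 0.3],                # A[i, j] = P(next state j | current state i)
               [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1],           # B[i, k] = P(observe k | hidden state i)
               [0.1, 0.3, 0.6]])

# Generate a short sequence: the states stay hidden, only the observations are seen.
state = rng.choice(2, p=pi)
observations = []
for _ in range(5):
    observations.append(rng.choice(3, p=B[state]))
    state = rng.choice(2, p=A[state])
print(observations)
```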
how do markov chains work?
1) starts in a state decided by the starting distribution probabilities
2) visits states based on the probability of going from one state to the next
3) the resulting sequence of states is a stochastic process
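A minimal sketch of these three steps (not part of the original cards), assuming a made-up 3-state transition matrix P and starting distribution:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 3-state markov chain (transition probabilities made up).
start = np.array([1.0, 0.0, 0.0])          # 1) starting distribution
P = np.array([[0.9, 0.1, 0.0],             # 2) P[i, j] = probability of going from state i to j
              [0.0, 0.8, 0.2],
              [0.5, 0.0, 0.5]])

state = rng.choice(3, p=start)
path = [state]
for _ in range(10):                         # 3) repeated sampling yields a stochastic process
    state = rng.choice(3, p=P[state])
    path.append(state)
print(path)
```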
when we have control over the state transitions but the states aren’t completely observable then we have a:
partially observable markov decision process: POMDP
when we do not have control over the state transitions and the states are not completely observable then we have a:
Hidden markov model (HMM)
when we do have control over the state transitions and the states are completely observable then we have a:
markov decision process (MDP)
when we do not have control over the state transitions but the states are completely observable then we have a:
markov chain