Dynamic probabilistic models Flashcards
Describe how we view the world when using these models
We view the world in snapshots or time slices
Each time slice contains a set of random variables
Some of these are hidden and some are observable
How do we denote the state variables at time t?
These unobservable variables are denoted as:
Xt
How do we denote the observable evidence variables at time t?
Et
where the observationa at time t is:
Et = et for some values et
What is the example we use to understand the situation of rain?
You are a security guard in an underground location.
You want to know if it is raining today, you can also know that by seeing if someone brings an umbrella or not
For each day t, the set Et contains a single evidence variable,
Umbrellat / Ut
The set Xt contains a single state variable
Raint / Rt
We then assume that the time betweens lices are fixed
and that the evicence starts arriving at t = 1
What is the difference between a first order markov process and a second order?
See picture
Explain the transition model
The transition model specifies the probability distribution over the lastest state variables, given previous values.
Can be written as:
P(Xt | X0:t-1)
This would become increasingly large, but with the markov assumption we can write following for first order markov processes:
P(Xt | X0:t-1) = P(Xt | Xt-1)
Extra info:
for second order markov processes it would be:
P(Xt | X0:t-1) = P(Xt |Xt-2, Xt-1)
Explain the sensor model
See figure
The evidence E could depend on previous variables as well as the current state variable X.
We can make the following sensor markov assumptions:
P(Et | X0:t, E0:t-1) = P(Et | Xt)
So the sensor model will therefore be:
P(Et | Xt)
Why does the arrows go from X to E?
The arrows go from the actual state of the world to the sensor values because the rain causes the umbrella to appear.
Bonus info:
For inference this goes the other direction
What are the requirements for the markov process?
Prior probability distribution
transition model
sensor model
What are the basic inference tasks for a
markov process
/
dynamic models?
Filtering
prediction
smoothing
most likely estimation
What is filtering?
Compute the belief state – the posterior distribution of the current state – given the evidence so far, i.e., P(Xt | e1:t). For example, P(Raint | umbrella1:t).
In our example this would mean computing the probability of rain today, given all the observations of the umbrella carrier made so far.
filtering is what a rational agent does to keep track of the current state, so that rational decisions can be made.
What is prediction?
This is the task of computing a posterior (fremtidig!) distribution over the future state given all the evidence that we have so far.
So we wish to compute P(Xt+k | e1:t) for some k > 0
In the umbrella example, this might mean computing the probability of rain three days from now, given all evidence to date.
Prediction is usefull for evaluating possible courses of action based on their expected outcomes
What is smoothing?
This is the task of computing the posterior (fremtidig!) distribution over a past state given all the evidence so far. So we might compute:
P(Xk | e1:t)
for some k such that
0 <= k <= t.
In other words: computing the probability that it rained wednesday, given all the observations of the umbrella carier made up to today.
why?
Smoothing provides a better estimate of the state than what was available at that time, because it incorporates more evidence.
What is most likely explanation?
Given a sequence of observations, we might wish to find the sequence of states that is most likely to have generated those observations. That is, we wish to compute
argmax<strong>x</strong>1:t( P(x1:t | e1:t)
For example, if the umbrella appears on each of the first three days and is not there the fourth, then the most likely explanation is that it rained on the three days and not on the fourth.
Bonus info:
this is also usefull in the field of speech recognition.
How do we perform filtering?
See figure
https://gyazo.com/65025743759cbf9429cd9462db282e68
- get prior probability
https://gyazo.com/1f225f8079f051f1f0917f4cdcb54bc7
- predict R1
https://gyazo.com/14b249e5bcc573c5e250ea4485aab559
- Update by multiplying this with the probability of evidence and normalise
https://gyazo.com/19e8c6a4e2d30f3bb029a27390b6c5e7
we normalize <0.45, 0.1> by saying:
<0.45 / (0.45 + 0.1) , 0.1 / (0.45 + 0.1)
For day two we now have the last update and use that instead of the prior probability.
How do we do prediction?
The task of prediction can be seen as filtering without the addition of new evidence.
https://gyazo.com/6c6a75dd423882ee2e2ca310371b1738
Bonus info: when k becomes very large (see out in the future) the distribution for rain converges to 0,5/0.5 again.
What are the three components for filtering?
What are the components of the update function?
Transition model
Last update
What are the components of the smoothing function?
How do we perform smoothing?
Recursive process: follow the examples here:
Step one:
https://gyazo.com/3d045798aa7dd7c1253dd1f915c55d7c
Step two initial backwards message:
https://gyazo.com/b155cadbfcc0287465d89bfb51f4a7aa
Step 3: calculate probability given forward and backward
https://gyazo.com/9ac4f440c90a1c8e6d6e3714e98378ef
Step 4: calculate backward for b2:2
https://gyazo.com/0855da7a239fc43580d298b0cbc2a59c
Step 5 (last): calculate probability for R1
What are some extra observations about smoothing?
The forward and backward recursions take constant time per step (for a single k) O(t).
How can we make the algorithm run in time O(t) for all k? Simply cache the results of the forward procedure before starting with the backwards procedure
The algorithm is also called the Forward-backward algorithm.
Why are most likely explanation usefull
If we see a sequence [true, true, false, true, true]
then maybe it did not rain the third day, but the courier just took his umbrella this day for security sake.
Explain the MLE algorithm:
By viewing all possible states for the sequence, we can do as done in the figure:
https://gyazo.com/1a9307e21f846b037b3d96dff2346ef4
This is the Viterbi MLE algorithm that we use to calculate it.
What is some last comments on the MLE algorithm?
For long observation sequences we are likely to run into numerical instability problems. Why?
Solution: Use logarithms of the probabilities and exchange multiplications with sums.