L7: Markov Chains, Monte Carlo Estimation Flashcards
Equation for conditional probability?
p(X1 | X2) = p(X1 and X2)/p(X2)
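A minimal sketch (the joint table below is made up) of reading a conditional PMF off a joint PMF by dividing by the marginal:

```python
# Minimal sketch: p(X1 | X2) = p(X1, X2) / p(X2), using a made-up joint PMF.
import numpy as np

# Joint PMF p(X1, X2) over two binary variables; rows index X1, columns index X2.
joint = np.array([[0.3, 0.1],
                  [0.2, 0.4]])

p_x2 = joint.sum(axis=0)   # marginal p(X2): sum out X1
cond = joint / p_x2        # p(X1 | X2); column j is the conditional PMF given X2 = j

print(cond)                # each column sums to 1
```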
Name some dynamical systems with uncertainty
- Natural language analysis
- Mathematical finance
- Epidemiology
- Robot and autonomous vehicle control
Definition of a stochastic process
Sequence of RVs. The range (possible values) of the random variables in a stochastic process is called the state space of the process. Index n represents the time index.
How can we interpret a stochastic process as a BN
We can interpret a stochastic process as a Bayesian network (with infinitely many vertices)
Need initial distribution p(X1)
Conditional distributions (CPTs) p(X_k+1 | X_1, ..., X_k) describe what happens next after a specific history up to time k (no independence assumptions unless given, as in basic BNs)
The number of parameters therefore grows exponentially with time, which is not workable (and the process has infinitely many time steps)
When were Markov chains introduced and by who?
Andrey Markov in 1906
What is the Markov property?
Assumes that X_k+1 depends only on X_k: given X_k, it is conditionally independent of the earlier history X_1, ..., X_k-1
A Markov Chain is an example of what (based on Markov property)?
Directed Acyclic Graph
What are the conditional PMFs in a Markov Chain called?
The transition probabilities
How many parameters are needed to model a Markov chain?
Grows linearly with the time index n (one transition CPT per step), rather than exponentially. The count is not literally n, though: with m possible states, each CPT has m(m-1) parameters, so the total is roughly n * m^2
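A rough, illustrative count for a finite state space with m states (the helper names params_general and params_markov are made up), comparing full-history CPTs with Markov-chain CPTs:

```python
# Parameter counts for a process on m states observed for n time steps.
# Without independence assumptions, the CPT at time k conditions on the whole
# history and needs (m - 1) * m**(k - 1) parameters; with the Markov property
# each step only needs an m-by-m transition CPT, i.e. m * (m - 1) parameters.
def params_general(m, n):
    return (m - 1) + sum((m - 1) * m ** (k - 1) for k in range(2, n + 1))

def params_markov(m, n):
    return (m - 1) + (n - 1) * m * (m - 1)

for n in (2, 5, 10):
    print(n, params_general(2, n), params_markov(2, n))   # exponential vs linear in n
```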
What is the meaning of the time homogeneity assumption?
Also called stationarity (of the transition probabilities). The assumption is that p(X_k+1 | X_k) = p(X_2 | X_1) for all k, i.e. the transition probabilities do not depend on the time index k. The number of required parameters is then constant with respect to the time index.
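A minimal sketch of simulating a time-homogeneous chain; the 3-state transition matrix P is made up, and the same transition CPT is reused at every step:

```python
# Time-homogeneous Markov chain on 3 states: row i of P is p(X_{k+1} | X_k = i),
# and it is the same for every time index k.
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])

rng = np.random.default_rng(0)
state = 0                              # initial state (a degenerate p(X_1))
trajectory = [state]
for _ in range(20):
    state = rng.choice(3, p=P[state])  # one transition of the chain
    trajectory.append(state)
print(trajectory)
```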
Can you do the question on slide 25?
Yes?
What is the limit distribution?
The marginal probability p(X_infinity) = lim_{k -> infinity} p(X_k)
We consider this the ‘typical’ state of the system after burn-in
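A sketch of approximating the limit distribution by repeatedly applying the same made-up transition matrix P to an initial distribution (iteration plays the role of burn-in):

```python
# Approximate lim_{k -> infinity} p(X_k) by iterating p(X_{k+1}) = p(X_k) @ P.
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])

p = np.array([1.0, 0.0, 0.0])   # p(X_1): start in state 0 with certainty
for _ in range(1000):           # burn-in: propagate the marginal forward in time
    p = p @ P
print(p)                        # approximately the limit distribution (sums to 1)
```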
What properties does a continuous random variable have?
No jumps between values; there is an integrable PDF for the RV; probabilities are measured over intervals rather than at single points.
A continuous random variable can be defined as a random variable that can take on an infinite number of possible values. Due to this, the probability that a continuous random variable will take on an exact value is 0. The cumulative distribution function and the probability density function are used to describe the characteristics of a continuous random variable.
The PDF integrates to 1 over the whole range of X
The probability (pushforward) measure is obtained by integrating the PDF over the range we’re interested in
PDF IS NOT A PROBABILITY (PMF is but not PDF, need to integrate it for prob)
Note: there are real-valued RVs with no PDF, and when a PDF exists it is not unique
Note: any function that maps R to [0, inf) and integrates to 1 is the PDF of some continuous RV
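A sketch of these notes using an Exponential(1) PDF (the choice of distribution and grid are just illustrative): the PDF integrates to 1 overall, and integrating it over an interval gives a probability:

```python
# Numerically integrate an Exponential(rate = 1) PDF with a simple Riemann sum.
import numpy as np

def pdf(x, rate=1.0):
    return rate * np.exp(-rate * x)    # PDF of an Exponential(rate) RV, x >= 0

xs = np.linspace(0.0, 50.0, 500_001)   # grid covering essentially all the mass
dx = xs[1] - xs[0]
print(np.sum(pdf(xs)) * dx)            # ~ 1.0: the PDF integrates to 1

a, b = 0.5, 2.0
xs_ab = np.linspace(a, b, 15_001)
dx_ab = xs_ab[1] - xs_ab[0]
print(np.sum(pdf(xs_ab)) * dx_ab)      # ~ P(a <= X <= b) = exp(-a) - exp(-b) ~ 0.471
```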
Difference between PMF and PDF?
PMF = probability mass function (discrete) - already probability, PDF = probability density function (continuous)- needs to be integrated over interval to become probability
Both are referred to as distribution functions
What is CDF?
Defined for both continuous and discrete RVs: F_X(x) = probability that X takes a value less than or equal to x (a staircase for discrete RVs, a non-decreasing curve for continuous RVs)
What is the expected value?
The probability weighted average of the possible outcomes of X
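A minimal sketch of the probability-weighted average for a discrete RV (the values and PMF are made up):

```python
# Expected value of a discrete RV: sum of (outcome * probability of that outcome).
values = [1, 2, 3, 4]
pmf = [0.1, 0.2, 0.3, 0.4]
expected = sum(x * p for x, p in zip(values, pmf))
print(expected)   # 1*0.1 + 2*0.2 + 3*0.3 + 4*0.4 = 3.0
```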
Why is calculating the expected value of an RV often infeasible?
- Distribution function pX may not exist
- Distribution function pX may exist but be hard to compute
- In general, problem may involve high-dimensional sums or integrals
What is Monte Carlo estimation?
Process of approximating expected value of RV by sampling distribution.
Given k independent samples x_1, ..., x_k, estimate the expected value as (1/k) * (x_1 + ... + x_k). By the law of large numbers this tends towards the true value as the number of samples grows.
Requires being able to sample from distribution
The estimation error generally shrinks like O(1/sqrt(k)), where k = number of samples
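A minimal sketch of Monte Carlo estimation where the true value is known, so the shrinking error is visible; the choice of X ~ Uniform(0, 1) and f(x) = x^2 (exact E[f(X)] = 1/3) is just illustrative:

```python
# Monte Carlo estimate of E[f(X)] for X ~ Uniform(0, 1), f(x) = x**2 (exact: 1/3).
import numpy as np

rng = np.random.default_rng(0)
for k in (100, 10_000, 1_000_000):
    samples = rng.random(k)             # k independent samples of X
    estimate = np.mean(samples ** 2)    # (1/k) * sum of f(x_i)
    print(k, estimate, abs(estimate - 1 / 3))   # error shrinks roughly like 1/sqrt(k)
```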
How do we sample from random variables?
- Find the inverse CDF / quantile function (this must be known/computable), sample U uniformly at random from [0, 1], and return X = F^{-1}(U)
If multiple x values have the same probability in the CDF, take the lowest one: the quantile function is defined via the infimum, F^{-1}(u) = inf{x : F(x) >= u}
Often X will have a distribution with a known quantile function, e.g. exponential-family distributions (Gaussian, Gamma, Poisson, etc.)
BUT NOT ALWAYS
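A sketch of inverse-CDF sampling for an Exponential(rate) target, whose quantile function is known in closed form (sample_exponential is a made-up helper name):

```python
# Inverse-CDF sampling: draw U ~ Uniform[0, 1], return X = F^{-1}(U).
# For Exponential(rate), the quantile function is F^{-1}(u) = -ln(1 - u) / rate.
import numpy as np

def sample_exponential(rate, size, rng):
    u = rng.uniform(0.0, 1.0, size)
    return -np.log(1.0 - u) / rate

rng = np.random.default_rng(0)
x = sample_exponential(rate=2.0, size=100_000, rng=rng)
print(x.mean())   # should be close to the true mean 1 / rate = 0.5
```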
In what situations do we need to compute expected values?
Inference with probabilistic AI methods tends to involve computing E[f(X)]:
- Moment statistics (e.g. mean, variance)
- Loss functions (e.g. model fitting, decision problems)
- Computing the probability P_X(A) of an event A
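As a sketch of the last point, the probability P_X(A) is the expected value of an indicator function and can be estimated by Monte Carlo; the standard normal X and the event A = {x : x > 1} are just illustrative:

```python
# P_X(A) = E[1_A(X)], estimated as the fraction of samples that land in A.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)   # samples of X ~ N(0, 1)
print(np.mean(x > 1.0))              # ~ 0.1587 = 1 - Phi(1)
```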