CHAPTER 2: Review of probability models Flashcards
THE BASIC POISSON PROCESS
one dimensional poisson PROCESS
The one dimensional Poisson process is a model for the random times of occurrence of instantaneous events
Assume start counting at time t=0
(graph of N(t) against t: a step function starting at 0 that jumps up by 1 at the time of each event)
N(t) =random variable for #events in (0,t]
for 0< u < v
N(v) -N(u) = RV for # events in (u,v]
N is a random counting function:
- N(t) is non-neg integer for any t
- N is an increasing function
- N(0)=0
the poisson DISTRIBUTION
The Poisson distribution was introduced as a model for the number of instantaneous events occurring in an interval of time (0,t], by making assumptions and dividing the interval into n equal parts.
So:
# events up to time t have dist equiv to binomial with n trials and success probability μ/n where μ= expected #occurrences over the whole interval.
Limit as n tends to infinity gives the POISSON DIST with parameter μ
we assume the expected #occurrences in (0,t] is proportional to t so N(t) has POISSON DIST with parameter λt
(λ constant of proportionality)
For 0 ≤ u < v, the number of events in (u,v] is
N(v)-N(u)~Poisson(λ(v-u))
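A minimal numerical sketch of the binomial-to-Poisson limit (assuming Python with scipy available; μ = 3 and the values of n are illustrative choices, not from the notes):

```python
# Sketch: the Bin(n, mu/n) distribution approaches Poisson(mu) as n grows.
# Parameter values here are illustrative only.
from scipy.stats import binom, poisson

mu = 3.0                      # expected number of occurrences in the interval
for n in (10, 100, 1000):
    binom_pmf = [binom.pmf(k, n, mu / n) for k in range(6)]
    print(n, [round(p, 4) for p in binom_pmf])

print("Poisson", [round(poisson.pmf(k, mu), 4) for k in range(6)])
```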
POISSON PROCESS ASSUMPTIONS***
Homogeneous one dimensional poisson process:
1) For any 0 ≤u ≤v, distribution of N(v)-N(u) is poisson with parameter λ(v-u)
2) If (u_1,v_1], (u_2,v_2],…, (u_k,v_k] are DISJOINT time intervals then N(v_1)-N(u_1), N(v_2)-N(u_2),…, N(v_k)-N(u_k) are indep RVs
( disjoint periods of time have indep #occur)
MODELLING AN EPIDEMIC
Initial outbreak, SIR model, population size N, at continuous time t: S_t - susceptible, I_t - infective, R_t - recovered/removed
Assume
S_0= N-1
I_0= 1
R_0= 0
For a short interval between times 0 and δt:
- one S catches the disease OR
- one I recovers/is removed - epidemic is over OR
- nothing changes
FROM (S_t,I_t,R_t)
to (S_t -1, I_t +1, R_t) at time δt
with probability ~ λS_tI_tδt
OR
to (S_t , I_t -1, R_t+1) at time δt
with probability ~ μI_tδt
(S_t doesn't affect the rate of recovery; recovery is more probable when there are more infectives)
Epidemic
takes off if recovery is not faster than spreading.
λ and μ
are constant, and the rates of transition depend only on the current state, not on the previous history
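A rough simulation sketch of this stochastic SIR model using the short-interval transition probabilities above (assuming Python; N, λ, μ and the step δt are illustrative choices, and the approximation only makes sense while λS_tI_tδt + μI_tδt stays well below 1):

```python
# Sketch: simulate the stochastic SIR outbreak using the short-interval
# transition probabilities lambda*S*I*dt (infection) and mu*I*dt (recovery).
# All parameter values are illustrative assumptions.
import random

def simulate_sir(N=100, lam=0.005, mu=0.1, dt=0.01, t_max=200.0):
    S, I, R = N - 1, 1, 0
    t = 0.0
    while I > 0 and t < t_max:
        u = random.random()
        if u < lam * S * I * dt:                   # one susceptible becomes infective
            S, I = S - 1, I + 1
        elif u < lam * S * I * dt + mu * I * dt:   # one infective recovers/is removed
            I, R = I - 1, R + 1
        t += dt                                    # otherwise nothing changes
    return S, I, R

print(simulate_sir())
```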
INTER-OCCURRENCE TIMES
Random vars T_i, continuous. Let T_1 denote the length of time until the first occurrence, T_2 the length of time between the first and second occurrences.
T_n represents time between occurrences n-1 and n
THEOREM 1: 2 occurrences in a Poisson process
The probability that in (0, t] two occurrences of a Poisson process with rate λ occur at exactly the same time is zero.
THEOREM 1: 2 occurrences in a Poisson process
PROOF
The probability that in (0, t] two occurrences of a Poisson process with rate λ occur at exactly the same time is zero.
PROOF
Divide (0,t] up into n small intervals
( {i-1}t/n, it/n] for i=1,..,n
(diagram: (0,t] split into n equal subintervals with endpoints 0, t/n, 2t/n, ..., (n-1)t/n, t)
By our Poisson process assumptions
N(it/n) - N ({i-1}t/n)~Poisson(λt/n)
so we find the probability that this is ≥ 2
P( N(it/n) - N ({i-1}t/n) ≥ 2)
= 1 - [λt/n]exp{-λt/n} - exp{-λt/n}
(= 1 - (probability of 1) - (probability of 0) )
Our small intervals are disjoint, so their numbers of occurrences are indep (by assumption). So, let Y_n be the number of small intervals with at least 2 occurrences.
Then, Y_n~ Bin(n, 1- exp(-λt/n)(1 + (λt/n)))
so P(Y_n = 0) = (exp(-λt/n)(1 + λt/n))^n (for a binomial, P(Y=0) = (1-p)^n)
= exp(-λt)(1 + λt/n)^n
Because limit as n tends to infinity (1 + (λt/n))^n = exp(λt)
P(Y_n=0) tends to 1 as n tends to infinity*
so there cannot be a positive probability of two occurrences at exactly the same time.
(If there were
a probability p > 0 that there were two occurrences at exactly the same time, we would
have P(Y_n = 0) ≤ 1 − p for all n, so this cannot be the case. Hence the probability of there
being two occurrences at exactly the same time is zero.)
THEOREM 2: inter-occurrence times
Inter-occurrence times are independent of each other, and are exponentially distributed with parameter λ
THEOREM 2
PROOF
Inter-occurrence times are independent of each other, and are exponentially distributed with parameter λ
PROOF
P( T_1 ≤ t) is the probability that the first occurrence has happened by time t
so is the probability that there is at least one occurrence in (0,t]
Hence P(T_1 ≤ t) = P(N(t) ≥ 1) = 1 − P(N(t) = 0) = 1 − e^{−λt} (the distribution function of an exponential dist with parameter λ)
Consider the probability that T_n is at most t, conditional on the values of T_1, T_2, . . . , T_{n−1},
P(T_n ≤ t|T_1 = t_1, T_2 = t_2, . . . , T_{n−1} = t_{n−1})
(There are some technical details to deal with the fact that we are conditioning on an event of probability zero, which is why this proof is a “sketch”.) Similarly to the above argument,
this is the probability that there is at least one occurrence between times t_1 + t_2 + . . . + t_{n−1}
and t_1 + t_2 + . . . + t_{n−1} + t, which again is 1 − e^{−λt}. So, regardless of the values taken by
T_1, T_2, . . . , T_{n−1}, the conditional distribution of T_n is exponential with parameter λ, which
implies the result
ANOTHER WAY OF DEFINING THE POISSON PROCESS
We can construct the poisson process: by considering a sequence of indep RVs T_1, T_2,….
each with exponential distribution with parameter λ. Discrete events occur at T_1, T_1+T_2,…
the nth event occurs at time T_1 + T_2 + … + T_n
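A minimal sketch of this construction (assuming Python; the rate and time horizon are illustrative choices):

```python
# Sketch: construct Poisson process event times on (0, t] from independent
# Exp(lambda) inter-occurrence times. Rate and horizon are illustrative.
import random

def poisson_process_times(lam=2.0, t_end=10.0):
    times, t = [], 0.0
    while True:
        t += random.expovariate(lam)   # next inter-occurrence time, Exp(lam)
        if t > t_end:
            return times
        times.append(t)

print(poisson_process_times())
```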
take care
take care when referring to less than or equal to events with probabilities
EXAMPLE 1: disease modelling
Infections from a rare disease occur according to a Poisson process with rate 2 per year. Let N(t) = # infections between now, time 0, and time t years. Then:
Prob of no infections in next year
probability there are no infections in the next two years and then three in the year after
Expected time until next infection
• The probability there are no infections in the next year is P(N(1) = 0) = e^{−2} = 0.135 . . ..
• The probability there are no infections in the next two years and then three in the year after that is P(N(2) = 0, N(3) − N(2) = 3) = P(N(2) = 0)P(N(3) − N(2) = 3) = e^{−4} ·e^{−2}2^{3}/{3!} = 0.00331 . . .,
as N(2) ∼ Po(4) and N(3) − N(2) ∼ Po(2), and they are independent (disjoint intervals).
• The distribution of the time until the next infection occurs is exponential with parameter 2 (and so has mean 1/2).
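A quick check of these three answers (assuming Python; the values match the example's rate of 2 per year):

```python
# Sketch: the three quantities from the example, computed directly (rate 2 per year).
import math

lam = 2.0
p_none_next_year = math.exp(-lam * 1)                    # P(N(1) = 0)
p_none_two_then_three = math.exp(-lam * 2) * (math.exp(-lam) * lam**3 / math.factorial(3))
mean_wait = 1 / lam                                       # mean of Exp(2)

print(p_none_next_year, p_none_two_then_three, mean_wait)
# approx 0.1353..., 0.00331..., 0.5
```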
THM 3:
A conditional property of the Poisson process
Given the total number of points of a homogeneous Poisson process in an interval, the positions of the points are independently uniformly distributed over the interval.
notation: N_{u,v}
N_{u,v} = N(v) − N(u)
= # occurrences in (u,v]
THM 3:
Given the total number of points of a homogeneous Poisson process in an
interval, the positions of the points are independently uniformly distributed over the interval.
NOTE
Note that this is equivalent to the statement that, conditional on there being n occurrences
in (s, t], the number of occurrences in any interval (u, v] ⊆ (s, t] (so s ≤ u < v ≤ t) has
a Bin(n, (v − u)/(t − s)) distribution, as each of the n occurrences would have probability
(v − u)/(t − s) of being in (u, v], independently of the others. We will prove this latter
version of the statement, and will assume, without loss of generality, that the interval we
are conditioning on the number of points in is (0, t].
THM 3:
Given the total number of points of a homogeneous Poisson process in an
interval, the positions of the points are independently uniformly distributed over the interval.
PROOF
Proof. For simplicity, write N_{u,v} = N(v) − N(u), the number of occurrences in
(u, v].
Note
that N_{0,t} ∼ Po(λt),
N_{u,v} ∼ Po(λ(v − u)) and N_{0,t} − N_{u,v} = N_{0,u} + N_{v,t} ∼ Po(λ(t − (v − u))),
and the latter two random variables are independent. Thus, for 0 ≤ a ≤ n,
P(N_{u,v} = a | N_{0,t} = n) = P(N_{u,v} = a, N_{0,t} = n) / P(N_{0,t} = n)
= P(N_{u,v} = a, N_{0,t} − N_{u,v} = n − a) / P(N_{0,t} = n)
= P(N_{u,v} = a) P(N_{0,t} − N_{u,v} = n − a) / P(N_{0,t} = n)
= [ (λ(v − u))^a e^{−λ(v−u)} / a! ] · [ (λ(t − (v − u)))^{n−a} e^{−λ(t−(v−u))} / (n − a)! ] · [ n! / ((λt)^n e^{−λt}) ]
= (n choose a) ((v − u)/t)^a (1 − (v − u)/t)^{n−a},
which is the probability that a Bin(n, (v − u)/t) random variable takes the value a, as required.
Another way to simulate a Poisson process with rate λ on (0,t]
1) first generate the # points in (0,t] as an observation n of N(t) from the Poisson distribution Po(λt)
2) generate n independent Uniform(0,t) variables U_i,
i=1,…,n and put a point at each position U_i
then the successive positions of the points from left to right will be ordered values W_1 ≤… ≤ W_n of the U_i’s
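A minimal sketch of this second simulation method (assuming Python with numpy; the rate and horizon are illustrative choices):

```python
# Sketch: generate the number of points from Po(lambda*t), then place each
# point uniformly on (0, t] and sort. Rate and horizon are illustrative.
import numpy as np

rng = np.random.default_rng()

def poisson_process_uniform(lam=2.0, t_end=10.0):
    n = rng.poisson(lam * t_end)            # number of occurrences in (0, t]
    positions = rng.uniform(0.0, t_end, size=n)
    return np.sort(positions)               # ordered values W_1 <= ... <= W_n

print(poisson_process_uniform())
```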
***prev
We can construct the poisson process: by considering a sequence of indep RVs T_1, T_2,…. each with exponential distribution with parameter λ. Discrete events occur at T_1, T_1+T_2,…
DEF 4:
Events in a short period of time, and Landau O and o Notation
For sequences x_n and y_n,
x_n = o(y_n) as n → ∞ means lim as n→∞ of x_n/y_n = 0,
and
x_n = O(y_n) as n → ∞ means that for some B, |x_n/y_n| ≤ B for all n sufficiently large.
If instead f(h) and g(h) are (real) functions of a continuous variable h, then
f = o(g(h))
and f = O(g(h))
as h tends to a limit have analogous meanings.
f = o(g(h)) as h tends to l: if f(h)/g(h) tends to 0 as h tends to the limit l (e.g. l = 0), then f(h) = o(g(h))
f = O(g(h)) as h tends to l:
if |f(h)/g(h)| ≤ B for some constant B, for all h sufficiently close to l, then f(h) = O(g(h)): f is at most a constant multiple of g near l (if both tend to 0, f tends to 0 at least as fast as g)
Landau O and o Notation
NOTES
We will use these continuous versions mainly in the case
h → 0, for example to look at the behaviour of probabilities over very short time intervals.
O:
x_n = O(y_n): x_n tends to 0 at least as fast as y_n (up to a constant); they may go to 0 at the same rate
o:
x_n = o(y_n): a stronger statement;
x_n tends to 0 strictly faster than y_n
EXAMPLE: Binomial distribution asymptotics
Let S_n ∼ Bin(n, p) with fixed p ∈ (0, 1), and consider
x_n = P(S_n ≤ 1) = (1 − p)^n + np(1 − p)^{n−1}
As x_n = (1-p)^{n-1}(1-p+np), x_n tends to 0 as n tends to infinity.
So x_n =o(1) as n tends to infinity.
Rate of convergence to 0, by comparing with another function/sequence which converges to 0:
TRY (1-p)^n: x_n/(1-p)^n = 1 + np/(1-p) tends to infinity, so we cannot write x_n = O((1-p)^n), nor is it o((1-p)^n)
TRY n(1-p)^n:
x_n/(n(1-p)^n)
= (1/n) + p/(1-p) ≤ 1 + p/(1-p)
and hence bounded.
So x_n = O(n(1-p)^n) as n tends to infinity,
but x_n ≠ o(n(1-p)^n),
as (1/n) + p/(1-p) tends to p/(1-p), which is not 0, as n tends to infinity
however,
x_n/{n^2(1-p)^n} = (1/n^2) + (p)/(n(1-p)) tends to 0 as n tends to infinity
so x_n = o(n^2 (1-p)^n)
so x_n tends to 0 faster than n^2(1-p)^n, and at the same rate as
n(1-p)^n
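A quick numerical check of these rates (assuming Python; p = 0.3 is an illustrative choice):

```python
# Sketch: numerical check of the comparison sequences in the binomial example.
p = 0.3
for n in (10, 100, 1000):
    x_n = (1 - p) ** n + n * p * (1 - p) ** (n - 1)
    print(n,
          x_n / (1 - p) ** n,            # blows up: not O((1-p)^n)
          x_n / (n * (1 - p) ** n),       # tends to p/(1-p): O but not o
          x_n / (n**2 * (1 - p) ** n))    # tends to 0: o(n^2 (1-p)^n)
```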
EXAMPLE 3:
Let p(t) be the probability that there is at least one event of a Poisson process with rate λ in the time interval (0,t]. By the assumption that the number of events in the interval has a Poisson distribution with parameter λt, we have p(t) = 1 − e^{−λt}.
We consider the asymptotics of this
as t ↓ 0.
By SERIES
p(t) = λt − [λ^2t^2/2] + [λ^3t^3/6] − [λ^4t^4/24] + . . .
and thus
p(t)/t
= λ −[λ^2t/2] + [λ^3t^2/6] − [λ^4t^3/24] + . . . .
The right hand side here tends to λ as t → 0, so is bounded close to 0, and hence we can write
p(t) = O(t).
To be more precise, we consider the difference
p(t) − λt, and observe
[p(t) − λt]/t
= −[λ^2t/2] + [λ^3t^2/6] − [λ^4t^3/24] + . . . → 0 as t ↓ 0.
So we can write
p(t) − λt = o(t), and we also write this as p(t) = λt + o(t).
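A quick numerical check that p(t)/t → λ and (p(t) − λt)/t → 0 as t ↓ 0 (assuming Python; λ = 2 is an illustrative choice):

```python
# Sketch: illustrates p(t) = O(t) and p(t) = lambda*t + o(t) as t -> 0.
import math

lam = 2.0
for t in (0.1, 0.01, 0.001, 0.0001):
    p_t = 1 - math.exp(-lam * t)
    print(t, p_t / t, (p_t - lam * t) / t)   # first ratio -> lam, second -> 0
```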
POISSON PROCESS CHARACTERIZED
- If (u1, v1], (u2, v2], . . . ,(uk, vk] are disjoint time intervals then
N(v_1)− N(u_1), N(v_2)− N(u_2), . . . , N(v_k)− N(u_k) are independent random variables. - N(u) takes non-negative integer values, and is increasing in u.
- For any time u, as t ↓ 0 we have that
P(N(u + t) − N(u) ≥ 1) = λt+o(t)
(the o(t) term tends to 0 faster than t does, so for small t the probability of at least one event in (u, u+t] is approximately λt)
* for a small time interval (u, u+t], the probability of exactly one event is approximately λt and the probability of two or more events is negligible
and
P(N(u + t) − N(u) ≥ 2) = o(t).
(The first of these properties was the second of the properties we assumed originally.)
Discrete time Markov chains
Definition 5.
MARKOV PROCESS
Let {X_t : t ∈ T} be a set of random variables (a stochastic process), with an
infinite index set T.
If, for each real number x, each n ≥ 1 and t_1 < . . . < t_n < t with t_i ∈ T for all i, the
following condition (the Markov property) holds
P{X_t ≤ x | X_{t_1} = x_1, . . . , X_{t_n} = x_n} = P{X_t ≤ x | X_{t_n} = x_n}
we say X_t is a Markov process.
POISSON PROCESS and markov
Poisson process is a continuous time Markov process
DEF 6:
MARKOV CHAIN
STATE SPACE
If {X_t, t ∈ T} is a Markov process and there is a countable set S such that
P{X_t ∈ S} = 1 for all t ∈ T we say that {X_t, t ∈ T} is a Markov chain. S is the state
space of the chain and points i ∈ S are states of the chain.
If T is a set of integers, {X_t} is a DISCRETE time Markov chain.
DEF 7:Homogeneous Markov chains
A Markov chain is said to be homogeneous if and only if X_{t+h} | X_t has distribution independent of t, so that the probability that you move from one state to another
in a fixed number h of moves is the same whenever the moves begin.
(probabilities depend on length of interval, not on starting time)
DISCRETE TIME CHAIN
with t and n in integers
P{X_{t+n} = j | X_t = i}
= P{X_n = j | X_0 = i}
DEF 8:
TRANSITION PROBABILITIES
Let X_n be a HOMOGENEOUS DISCRETE time Markov chain. Then we call the
probabilities P{X_n = j | X_0 = i} the n-step TRANSITION PROBABILITIES and denote them by
p^(n)_{ij}
(at any 2 times n steps apart)
DEF 8:
ONE-STEP TRANSITION PROBABILITIES
For n = 1 they are called the one-step transition probabilities or simply transition probabilities, and we just write
p_ij
instead of
p^(1)_{ij} .
p_ij= probability of j at next time step given current state i
Transition matrix
The n-step transition probabilities form a |S| ×|S| matrix,
(finite or countably infinite state space)
P^(n) = (p^(n)_ij ), which is called the
n-step transition probability matrix.
For n = 1 it is called the one-step transition matrix, or just the transition matrix,
and we write P instead of P^(1).
*row sums of a transition matrix P are all equal to 1
Σ_j p_{ij} = 1 for every i (entries non-negative, so P is a stochastic matrix)
Starting distribution
row vector of probabilities which sums to 1, distribution of starting probabilities
EXAMPLE
Wet and Dry days
S={W,D}
X_n = W means that day n is wet
Assuming a Markov
chain model, let our transition matrix be
[1 − α α ]
[β 1 − β ]
Then, given that day n is dry, the probability that day n + 1 is wet is α, while
given that day n is wet, the probability that day n + 1 is dry is β.
α, β in [0,1]
rows sum to 1
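A minimal sketch simulating this wet/dry chain for a few days (assuming Python; α = 0.3, β = 0.4 and the number of days are illustrative choices; state order (D, W)):

```python
# Sketch: simulate the wet/dry Markov chain day by day.
# alpha, beta and the run length are illustrative assumptions.
import random

alpha, beta = 0.3, 0.4
P = {"D": {"D": 1 - alpha, "W": alpha},
     "W": {"D": beta,      "W": 1 - beta}}

state, path = "D", []
for day in range(10):
    path.append(state)
    state = "W" if random.random() < P[state]["W"] else "D"
print(path)
```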
EXAMPLE
Random walk on triangle
Label the vertices of a triangle by A, B and C, and assume that a particle moves as a Markov chain from vertex to vertex, at each time step moving from its current vertex to one of its neighbours, moving clockwise with probability p and anti-clockwise with probability 1−p. Then
S = {A, B, C}, and the transition matrix is
[ 0 p 1 − p ]
[1−p 0 p ]
[ p 1 − p 0 ]
If p = 1/2 then the random walk is symmetric
EXAMPLE Ehrenfest model for diffusion
Imagine that we have N molecules, each of which is in one of two containers, A or B. At each time step, one molecule (of the N) is selected at random (each equally likely to be chosen),
and moved to the other container.
Let Y_n be the number of particles in container A at time n. Then (Y_n) forms a Markov chain
with state space
{0, 1, 2, . . . , N} and transition probabilities given by
p_{k,k−1} =k/N
p_{k,k+1} =(N−k)/N
p_{k,j} = 0 if j /∈ {k − 1, k + 1}.
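A minimal sketch building the Ehrenfest transition matrix for a small N (assuming Python with numpy; N = 4 is an illustrative choice, states 0, 1, ..., N):

```python
# Sketch: the Ehrenfest transition matrix, rows indexed by the number of
# molecules in container A. N is an illustrative assumption.
import numpy as np

N = 4
P = np.zeros((N + 1, N + 1))
for k in range(N + 1):
    if k > 0:
        P[k, k - 1] = k / N          # a molecule leaves container A
    if k < N:
        P[k, k + 1] = (N - k) / N    # a molecule enters container A
print(P)
print(P.sum(axis=1))                 # each row sums to 1
```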
The Chapman-Kolmogorov equations
Let Xn be a discrete time homogeneous Markov chain with state space S. It was shown in MAS275 that the transition probabilities for Xn satisfy the Chapman-Kolmogorov
equations
p^(m+n)_{ij} = Σ_{k∈S} p^(m)_{ik} p^(n)_{kj},
*and that therefore the n-step transition matrix
P^(n) equals P^n ,
the n-th power of P.
The Chapman-Kolmogorov equations NOTES
*with knowledge of the initial distribution, this allows us to calculate the probability that the
chain will be in any specified state at time n; that is
P{X_n = j} = π^(n)_j
- any j, any time n
- condition on state of chain at time 0:
π^(n)_j = P{X_n = j} = Σ_{i∈S} P{X_n = j | X_0 = i} P{X_0 = i} = Σ_{i∈S} π^(0)_i p^(n)_{ij}
using the Chapman-Kolmogorov equations
RANDOM EVOLUTION OF THE CHAIN
the random evolution of the chain is determined by the transition matrix P and the initial probabilities π_i
In matrix-vector terms:
π^(n) = π^(0)P^n
where π^(n)
is the row vector with entries (π^(n)_i: i ∈ S).
(
π^(n) is vector giving the dist of state n
π^(0)is vector giving starting dist at time 0
P^n is the nth power of the TM)
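A minimal sketch of π^(n) = π^(0)P^n for the wet/dry chain (assuming Python with numpy; α, β and the starting distribution are illustrative choices; state order (D, W)):

```python
# Sketch: distribution at time n from the starting distribution, pi^(n) = pi^(0) P^n.
# Parameter values are illustrative assumptions.
import numpy as np

alpha, beta = 0.3, 0.4
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])     # state order (D, W)
pi0 = np.array([1.0, 0.0])           # start in state D

for n in (1, 5, 20):
    print(n, pi0 @ np.linalg.matrix_power(P, n))
```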
P^n matrix multiplication
spectral representation of P:
P = T DT^{−1}
where D is a diagonal matrix whose diagonal entries are the eigenvalues of P, and T is a non-singular matrix.
columns t of T are right eigenvectors of P; that is, they satisfy Pt = dt for an eigenvalue d of P
- since P is stochastic, d = 1 is always an eigenvalue, corresponding to the eigenvector t' = (1, . . . , 1)
- eigenvalues of a stochastic matrix have modulus ≤ 1, so under our assumption about P, d = 1 is the largest eigenvalue
P = T DT^{−1}
It is convenient
to reorder rows and columns so that the diagonal entries in D are in order of decreasing
modulus, so D is of the form
D =
[ 1    0     0    ... ]
[ 0   d_2    0    ... ]
[ 0    0    d_3   ... ]
[ ...  ...   ...  ... ]
where 1 > |d_2| > |d_3| > . . .
P^n
P^n = T D^nT^{−1} n = 1, 2, . . . .
P^2 = T DT^{−1} T DT^{−1}
= T D^2T^{−1}
D^n =
[ 1     0       0      ... ]
[ 0    d_2^n    0      ... ]
[ 0     0      d_3^n   ... ]
[ ...   ...     ...    ... ]
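A minimal numerical check that P^n = TD^nT^{−1} (assuming Python with numpy, whose eig routine returns the right eigenvectors as the columns of T; the wet/dry matrix with illustrative α, β is used):

```python
# Sketch: verify P^n = T D^n T^{-1} for a small stochastic matrix.
# Parameter values are illustrative assumptions.
import numpy as np

alpha, beta = 0.3, 0.4
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

d, T = np.linalg.eig(P)                 # eigenvalues d, right eigenvectors as columns of T
n = 5
Pn_spectral = T @ np.diag(d**n) @ np.linalg.inv(T)
print(np.allclose(Pn_spectral, np.linalg.matrix_power(P, n)))   # True
```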
Stationary distributions
DEF 9:
Let X_n be a discrete time homogeneous Markov chain with one-step transition matrix P. Any distribution π such that π_j = Σ_{i∈S} π_i p_{ij}
is called a stationary distribution of the chain. In other words, bold(π) = {π_k} satisfies π = πP, so it is a left eigenvector of P with eigenvalue 1.
If a Markov Chain has a stationary distribution π and if the starting state is chosen according to π, then Xn ∼ π for all n, since
π^(1) = πP = π,
π^(2) = πP^2 = (πP)P = πP =π
etc
STAT DIST NOTES
*A stationary distribution need not exist or be unique,
*However Markov
chains that do have a stationary distribution are of particular interest as models for processes whose overall properties remain stable over time even though the state of the process itself is
continually changing
*many Markov chains, π^(n)
converge to a stationary distribution, irrespective of the starting distribution, and it is these chains that will be particularly useful as models for stable systems
ERGODIC CHAINS
ERGODIC CHAINS are IRREDUCIBLE - can move from any state to any other state in some number of steps
APERIODIC - the possible return times to a state have greatest common divisor 1
POSITIVE RECURRENT - starting from any particular state, the chain returns to it with probability 1 and the expected time until return is finite
For an ergodic chain, p^(n)_{ij} → π_j > 0 as n → ∞
and
{π_j} is a stationary distribution
irreducible and aperiodic
if every element of the TM is strictly positive then chain is irreducible and aperiodic
“chain forgets where starts over time”
EXAMPLE: STATIONARY DISTRIBUTIONS
S={W,D}
π =( π_D π_W)
we find a stationary distribution by solving
π = πP
(1 − α)π_D + βπ_W = π_D
απ_D + (1 − β)π_W = π_W
as its a distribution:
π_D + π_W = 1
π_D =(β/α)π_W
π =(β/(α+β), α/(α+β) )
the chain is irreducible and aperiodic, (and positive recurrent)
and hence ergodic
probability that day n is dry converges to β/(α+β)
as n → infinity
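A minimal sketch solving π = πP together with Σπ_i = 1 numerically (assuming Python with numpy; α = 0.3, β = 0.4 are illustrative choices; state order (D, W)):

```python
# Sketch: find the stationary distribution of the wet/dry chain by solving
# pi (P - I) = 0 with the normalisation sum(pi) = 1 appended.
import numpy as np

alpha, beta = 0.3, 0.4
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

A = np.vstack([(P - np.eye(2)).T, np.ones(2)])   # stack normalisation row
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)                                        # approx (beta/(alpha+beta), alpha/(alpha+beta))
```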
EXAMPLE: Ehrenfest model stat dist
Solving the stationary distribution equations gives a unique
stationary distribution with
π_j =(NCj)/2^N .
However, this chain is not aperiodic
(odd and even states
alternate)
and so the convergence results do not apply
A Poisson process has the number of occurrences in an interval
distributed according to a Poisson distribution.
λ = mean and variance of a Po(λ) distribution; λ = 1/(expected time between occurrences), i.e. on average one occurrence every 1/λ time units.
We use λt = mean number of occurrences in an interval of length t, so N(t) ~ Po(λt).
P(N(t) = k) = (λt)^k e^{-λt} / k!
P(N(t) ≤ k)
= Σ_{j=0}^{k} (λt)^j e^{-λt} / j!
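A minimal sketch of P(N(t) ≤ k) as a finite sum (assuming Python; λ, t and k are illustrative choices; note that "≤ k" includes k itself):

```python
# Sketch: P(N(t) <= k) for a Poisson process, summing Po(lambda*t) probabilities.
# Parameter values are illustrative assumptions.
import math

def poisson_cdf(k, lam, t):
    m = lam * t
    return sum(m**j * math.exp(-m) / math.factorial(j) for j in range(k + 1))

print(poisson_cdf(3, lam=2.0, t=1.0))
```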
Long run
Long run relative frequencies = limiting probabilities = stationary probabilities
We find a stationary distribution by solving πP = π (together with Σ_i π_i = 1)