CHAPTER 4: Inference for discrete time Markov chains Flashcards

1
Q

A weather problem

A

We focus on the problem of constructing, fitting and checking a model for daily rainfall at a specific location. As a first step, given observed rainfall amounts, r1,…,rn say, on a sequence of consecutive days, we suppose that daily rainfalls are the realised values of a sequence of random variables R = R1,…,Rn.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Given Rainfall at Snoqualmie Falls

A

1948-1843 Januaries

The fact that on some days there is no rain and on others a positive amount suggests that it might be useful to consider first the succession of dry days and wet days, and later model the amount of rain falling on a day conditionally on the day being wet.
Let
X_i =
{1 if Ri bigger than  0 
{0 if not 

so that Xi is an indicator variable for day i being wet

We seek a simple model for the Xi. It is possible that precipitation patterns vary seasonally, so initially we concentrate on modelling only within a limited period during which conditions can be expected to remain reasonably stable. For this reason we concentrate on the sequence of Xi during Januaries

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Independence model:

Snoqualmie Falls

A
  • simplest model supposes that the Xi are independent and identically distributed; that is, they form a Bernoulli sequence
  • ie that its indep of W or D the day before
  • so one parameter p for prob of being wet, 1-p for dry

*If p = P(X_i = 1) then the likelihood of p based on observations x1,…,xn is
L(p) = p ^{Σ i=,n [xi]} (1−p)^{n−Σi=1, n [xi]}.

(ie probability of a particular sequence is p^{number of wet days}(1-p)^{number of dry days})

Over the Januaries 1948-1983 there were 791 wet days out of 1116 days altogether.

Thus Σi=1,1116 [ xi] = 791
and the log-likelihood is
l(p) = 791log(p) + 325log(1−p)

figure of the log likelihood for parameters at p is a trajectory skewed st peak at around 3.71 ( which is the number of dry days/ total days) a line intersecting at log-likelihood just above -675 gives suitable likelihood region/ 95% CI)

The MLE
ˆ p =Σi=1,n [xi/n] = 791/1116 = 0.71

and an approximate 95% confidence interval for p obtained from the log-likelihood is (0.68, 0.74)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Independence model:
Snoqualmie Falls

NOTES

A
In fact in this case we can find a confidence interval directly, since under our assumptionsPn i=1 Xi has a Binomial Bin(1116,p) distribution, so that the estimated standard error of ˆ p is
sqrt root(pˆ (1− ˆ p)/n) and an approximate 95% confidence interval based on the Normal approximation is 
ˆ p±2×e.s.e. 

(2*standard error)

Here the two intervals, likelihood-based and Binomial-based, are numerically the same to 2 dp.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Independence model:
Snoqualmie Falls

Model adequacy

A
  • We might suspect not, because large weather systems may take several days to pass a point, and whilst they are doing so the weather conditions at the point will tend to persist, undermining the independence assumption.
  • For a rough check, consider the number of dry days followed by wet days. Under independence, the expected number of such pairs in the data is E( no. dry wet pairs) = 36×30×(1−p)p, since there are 36 Januaries in the data-set and 30 pairs of days in each. Similarly for other combinations of wet and dry. To estimate these expected numbers we replace p by ˆ p.

*given a table for the numbers of pairs of days we have more D-D and W-W pairs than expected
(Today)

Wet| Dry |Total 
Wet 643 (542.6) |128 (223.0) |771
Dry 123 (223.0) |186 (91.6) |309 
Total 776 |314 |1080
Yesterday
actual (estimates from model)

*we might consider carrying out a goodness of fit test. The Pearson X^2 statistic

X^2 =
Σ[(observed−expected^2/expected]

is a natural choice for a test statistic for such a test, since large values of X^2 will indicate discrepancy from our expectations under the independence assumption. For a standard 2×2 contingency table, the observed value of X^2 would be compared with a χ^2_1 distribution, which is the approximate distribution of X^2 under the hypothesis of independence in a contingency table

For such a table the underlying model is a Multinomial distribution with four categories, and the basis for use of this distribution is a sampling scheme in which each of the items. whose frequencies are recorded in the table can fall independently into the four cells with prescribed probabilities.

Is that appropriate for Table 1?
In Table 1 the items counted in the four cells are pairs of days. From the way the data have been collected it is clear that consecutive pairs of days overlap; the second day of one is the first day of the next (within each January). Thus our null hypothesis of independence and identical distribution of days does not translate immediately into the standard Multinomial basis for a X2 goodness of fit test. We need an extension of the standard theory to the case when items are collected in such a way that each is dependent on its predecessor.

( WE DONT HAVE INDEP SO MULTINOMIAL NOT APPLICABLE)

If instead we consider non-overlapping consecutive pairs, then individual items being classified would be independent under our hypothesis and the χ2 theory would apply. Frequencies from non-overlapping pairs are

Today
Wet Dry Total
Wet 325 (271.3) 55 (111.5) 380

Dry 66 (111.5) 94 (45.8) 160 Total 391 149 54

Yesterday (on lhs)

A standard X2 test applied to Table 2 (with expected values calculated from the marginal totals in the table rather than from our estimate ˆ p = 0.709) gives X2 = 108. For comparison the 95% quantile of χ2 1 is 3.84. There is overwhelming evidence against the independence assumption even with this partial view of the data. Wetness or dryness has a tendency to persist from one day to another to a greater extent than the independence theory would predict.
This points the need for the analysis of dependent variables. We need models that allow a variable’s value today to affect its value tomorrow. A discrete time Markov chain is such a model

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Estimation of transition probabilities

for n states

A
  • We now assume that we have observations of a process which we think might be well modelled by a Markov chain on a known state space S, but with unknown transition matrix P
  • estimate P by using likelihood:

Suppose we have observations x1,…,xn modelled as successive states of a Markov chain X1,…,Xn with transition probabilities p_ij

So probability of observing a given sequence of states is given by transition probabilities:
P(X0 = x0,…,Xn = xn) =
= P(Xn = xn | Xn−1 =xn−1,…,X0 = x0) P(Xn−1 = xn−1,…,X0 = x0)

= P(Xn = xn | Xn−1 = xn−1) P(Xn−1 = xn−1 | Xn−2 = xn−2,…,X0 = x0) × P(Xn−2 = xn−2,…,X0 = x0) = P(Xn = xn | Xn−1 = xn−1) P(Xn−1 = xn−1 | Xn−2 = xn−2)… …×P(X1 = x1 | X0 = x0)×P(X0 = x0)

= P(X0 = x0) px0 x1 px1 x2 … pxn−1 xn

by the Markov property, which is the likelihood L for the parameters P = (pij) given data.

n_ij = number of transitions observed from state i to state j. So:

L = L(P) = P(X_0 = x_0)∏_{ i,j} of [p^{n_ij} _ij]

log likelihood for P:
l(P) = log(P(X_0 = x_0) +∑_{i,j} of [nij log(pij)]

As usual, values of the pij giving large l are plausible in the light of the data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Estimation of transition probabilities
:

L = L(P) = P(X_0 = x_0)∏_{ i,j} of [p^{n_ij} _ij]

log likelihood for P:
l(P) = log(P(X_0 = x_0) +∑_{i,j} of [nij log(pij)]

A

The maximum likelihood estimates of the transition probabilities are obtained by maximizing l(P) subject to P being a transition matrix. This means that we will need to apply the constraint that the sum of the entries in each row is 1; we can do this either using Lagrange multipliers (see MAS211) or by, for example,

writing the last term in each row in terms of the others as we did in Example 9: ]
if S = {1,2,...,N}
then 
p_{i,N} = 
1−(p_{i,1}+p_{i,2}+...+p_{i,N−1}). 

In some cases we may assume a particular form for the transition matrix where the constraint that the row sums are 1 is already assumed

  • the issue is that P(X_0 = x_0) isn’t dep on parameters p_ij?
    if we argue conditionaly:
    the (conditional) likelihood is defined as P(Xn = xn,… | X0 = x0) and the above argument gives a log-likelihood without the P(X_0 = x_0) term
  • Another approach is to assume that the chain is in equilibrium, meaning that we can take P(X0 = x0) to be the probability of finding the chain in state x0 under its stationary distribution
  • when n is large, l is likely to be dominated by its second term, so the precise treatment of the first term is unimportant- we ignore it
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Estimation of transition probabilities
:

ARGUING CONDITIONALLY

L = L(P) = P(X_0 = x_0)∏_{ i,j} of [p^{n_ij} _ij]

log likelihood for P:
l(P) = log(P(X_0 = x_0) +∑_{i,j} of [nij log(pij)]

A

Suppose the state space S is finite. Then, arguing conditionally on X0 = x0, we wish to maximize

{i,j} [ nij log(p{ij})

subject to ∑_j [pij] = 1,
for i = 1,…,|S|.

Form the Lagrangian function
L(P,λ) =∑{i,j} [nij logpij] +∑{i} λi .
Then

∂L ∂pij
=nij/pij + λi, i,j = 1,…,|S|. (23)
From this we get that
ˆ pij =nij/ −λi

and thus, using
{j} [ˆ pij] = 1, it follows that
ˆ pij = nij/n
{i.} (24)

Here we use ni. to mean ∑{j} nij.
= total number transitions from i
Similarly, we will later use n.. to mean∑
{i,j} [nij]
= total number transitions from i to j

eg if we know that we have 25 transitions from state i and 6 of them are to state j, estimate for the transition prob p_ij = 6/25

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Example 13. Snoqualmie Falls

A

Given transition count data - to fit a 2 state MC model for sequence of W or D. Under this model the log likelihood has form:

l(P) = log(P(X_0 = x_0) +∑_{i,j} of [nij log(pij)]
for each January but for the full likelihood we consider relationship between years.
Transition probabilities for the month of jan given, we assume the underlying behaviour is the same in each year

We can suppose the years are independent and that TPs {p_ij} are the same over the years. (eg 1948,1949,…)

As 1st model:

1) assume independence from 1 year to another and
2) stationarity over the years.

Let n_ij ^ (k) = observed transition count from i to j in year k.

Then the likelihood of P based on the observed transition count from i to j in year k.

implies likelihood of P based on observed seq, of W/D, conditionally on the 1st obs in jan each year is:

L(P) =∏_ k ∏ {i,j} [p ^{n^(k) ij} _ ij ,
giving log-likelihood
l(P) =∑
k ∑
{ij} [n^(k) _ij logp_ij]

=
ij[nij logpij]_∑

The maximization used above therefore applies directly, and from Table 1 gives estimated transition matrix: The estimated probabilities are quite different from those based on the independence model, which would be (0.29, 0.71) in each row

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

EG

2x2

3X3 TRANSITION

A

[1-α α]
[β 1-β]

[α β 1-α- β]
[γ δ 1-γ-δ]
[ε ζ 1- ε-ζ ]

[0 P 1-P]
[1-P 0 P]
[P 1-P 0]

[p_11 p_12…. p_1N]
[p_21… p_2N]
[… … …. ]
[p_M1……. p_MN]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Uncertainty on these transition probabilities estimates:

A

Consider the conditional likelihood function and to allow for constraint on the parameters in a given row, we can reparametrize the transition matrix

Table 3: Estimated Transition Matrix; Januaries 1948–1983
Today Dry Wet
Dry 0.602 (186/309) 0.398 (123/309)

Wet 0.166 (128/771) 0.834 (643/771)
Yesterday

(MLE for TPs)

for uncertainty we assume conditional on value of 1st jan of that year

*To allow for constraint: reparametrize the TM in terms of first |S|-1 probabilities on each row and look for likelihood region

eg |S|=2 easy

Alt: asymptotic approximation

Testing the model: We have considered pairs of days . We look at data we haven’t used to fit the model

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Example 14. Snoqualmie Falls: persistence of spells

The Markov chain model in the example has been fitted by considering pairs of days. A natural question to ask is how well does it represent features of the data other than those used in its fitting

A

Length of wet/dry spells for jan 1983:

11....01...101...10100
1x10
0
1x9
0
1x6
0
1x1
00

3 dry spells whose beginnings and ends fell in this month and lasted a day.(1=W, 0=D)

over 36 years average length of dry spells that did not overlap Jan 1 or Jan 31 is 2.21 days and standard deviation of length is 1.64 days.

Using markov properties we can calculate the expected duration of dry spells in the model and compare observed values with calculated to give a rough check for model:

  • Let T denote the duration of dry spell
    *To observe a dry spell there must be a wet day followed by a dry day, we start numbering time from a wet day followed by a dry day. Then
    ….
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

TEST FOR SPECIFIED P

modelled by Markov chain with known transition matrix P*

A

data modelled by Markov chain with known transition matrix P*.

Null hypothesis-
H_0: P=P*

alternative H_1: P is unrestricted

Use the log likelihood ratio test statistic

W=-2(l(P*) - l(P^))

2*(log likelihood at assumed null hypothesis value - maximum log likelihood / estimate)

Note: Large values of W discredit H0
If hypothesis is true distribution approximatelt χ2 with d −s degrees of freedom.

d= # entries in TM that are non-zero eg s^2 if all

s= state space size ie # rows in TM (from sum equal 1 we need all but 1 element of each row)

**Unless we are assuming a particular form for the transition matrix, d here will be the number of entries in the matrix, i.e. s2, so the number of degrees of freedom will be s^2 −s

The test statistic W is given by

W = −2(l(P∗)−l(ˆ P)) =
2Σ _{i,j} [nij log [nij/(ni.p∗_ij)]

where p∗ ij is the i,j entry of P∗

2* sum over all pairs of states of
(#transitions from state i to j) log( (#transitions from state i to j) over ( #transitions from state i * mle for p*_ij)

mle for p*_ij (transition probabilities) is # transitions from i to j/ #total from i

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Example: Pseudo-random numbers

Sequence of random digits produced by a computer’s random number generator: for example
5,3,1,8,7,9,0,6,4,5,…
Are they independent, or is there a tendency for certain digits to follow others?

A

Given data for 200 random integers between 1 and 10, generated in R:

for our transition matrix we want 1/10 for probabilities of next states

we count numbers of transitions of all possible pairs of states for given data set

mle for each entry
: for p_{1,1} = number observed 1 to 1 transitions /sum of all entries in top row
= 1/15

(find MLE for transition matrix by number of transitions for i to j divided by total number transitions from i)

we test:
H_0: 10 by 10 TM is entries of 1/10??

Alternative hypothesis: TM is not all entries 1/10

With these data we obtain W = 98.308.

(expected value would be 90, our degrees of freedom )

1-pchisq(98.308,90) in R gives 0.25
So no evidence against H_0

we have degrees of freedom = (100 entries -10 determined by others) = 90

*some numbers in mle TM were 0 treat 0log0 as 0.

  • In the example above, if we were given the total number, ni. , of digits i seen in the data, then, under H_0, we would expect to see ni.p∗ ij = ni./10 transitions i j. The observed number is nij, so W is a comparison of observed and expected frequencies
    Writing ni.p∗ ij = eij,
    we can express W as
    W = 2Σ_{ i,j}[ nij log(nij/eij) =
    −2Σ_{ i,j} nij log(1− ((nij −eij)/ nij)) ,

This gives numerical values close to those from the Pearson X2 statistic for comparing observed and expected frequencies
X^2 =Σ_{ i,j} (nij −eij)^2/eij .
In fact expansion of the log term shows asymptotic equivalence of the two statistics in large samples.

The example above uses a W test to check for independence, but it can be used in the same way to test any other simple hypothesis about the transition probabilities P.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Goodness of fit: testing independence

COMPOSITE HYPOTHESIS

A

Earlier, we wished to test independence against the alternative of Markov dependence in a model for the Snoqualmie Wet/Dry data.

(wanted pearson but the pairs overlapped)

We now have a general approach.

The difference from the last example is that H0, the hypothesis of independence, is now composite: it does not specify the value of P completely, only that rows of P are identical

null hypothesis is: pij = qj for each i, where {qj} is a probability distribution on the state space S.

ie each day is wet or dry w/o reference to previous day
ie rows are identical for P, entries in each row only dep on column

alternate hypothesis is P transition matrix P=
[α 1-α]
[β 1-β]

under H_0: P is determined by q=s-1 parameters.

Without restriction P is determined by s(s-1) parameters.

TEST STATISTIC for testing composite hypotheses:
W = −2(l( ˜ P)−l( ˆ P))

(we replace P*, specific H_0 value, with P~ restricted MLE assuming H_0 is true)

(d.o.f is difference between)
χ2 distribution on d−s−q = s(s−1)−(s−1) = (s−1)^2 degrees of freedom.

RESTRICTED MLEs ˜ pij (under H_0):

log likelihood is
l =Σ_{i,j} [nij logpij =
Σ_{i,j} [nij logqj] =
Σ_ j [n.j logqj]

so that ˜ pij = ˆ qj = n.j/n…

TEST STATISTIC:

W = 2[ Σ_{ i,j} [nij log(nij/ni.) −Σ_{j} [n.j log(n.j/n..)]]
 

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Example 18. Snoqualmie Falls (test completed) USING

COMPOSITE HYPOTHESIS

A

From Table the transition counts from 36 Januaries were
Dry Wet
Dry 186 123
Wet 128 643

  • assume observations in different years indep and argue conditionally on the state on Jan 1 each year
  • The log likelihoods under the two hypotheses are then sums of log likelihoods arising from the separate years, each of the form above, and it is easily verified that they reduce to the above forms on summation. We can therefore treat the problem as if the data had arisen from a single realization. It is found that w_obs = 193.5, which in comparison with the quantiles of χ^2_1

(99%, 99.9% and 99.99% quantiles 6.6,10.8,15.1)

gives emphatic confirmation of our earlier findings with partial data, that an independence model is a very poor description of the data.

As in section 4.6, the log likelihood ratio test statistic W is close in value to the Pearson X^2 statistic for comparing observed frequencies with those expected under H0. The observed value of X2 for the data above is 201.2

17
Q

EXAMPLE 19:

Russian linguistics (Guttorp after Markov)

A

Modelling sounds as MC

        Vowel next Consonant next  Vowel          1106            7532  Consonant 7533            3829

If we model the sequence of vowels and consonants as a Markov chain with state space {V,C}, then this gives the maximum likelihood estimate for our transition matrix as
[0.128 0.872]
[0.663 0.337]

Using the markov chain model:

P=
[1-p p]
[ p 1-p]

Under hypothesis that P is of this form:

log likelihood reduces to
l = (n_{00} + n_{11})log(1−p) + (n_{01} + n{10})logp

which is maximised for mle
p = (n_{01} + n_{10})/n.

1 degree of freedom:

P of P is obtained by substitution in our model for P.

The unrestricted estimates of pij are ˆ pij = nij/ni. as usual.

Thus the likelihood ratio statistic W = −2(l( ˜ P)−l( ˆ P) can be found.

Its value for the data above turns out to be 1217.7. The degrees of freedom for the null χ2 distribution are d−s−q = 2(2−1)−1 = 1 since the model has dimension 1. The large value of w_obs is extremely strong evidence against the tentative model.

Inspection of the data suggests that there is a stronger tendency for consonants to follow consonants than for vowels to follow vowels.

18
Q

Comparing models: homogeneity

2 different sources, is the same TM appropriate for modelling the data from each source or do we need different TMs

A

Suppose we have data x = (x0,…,xn) from source 1 and data y = (y0,…,y_m) from source 2,

(different lengths of data poss from each source)

where x and y are realizations from Markov chains with transition matrices P_x and P_y respectively.

We wish to test H_0 : Px = Py = P
alternative :
P_x and P_y are different

we can use the W test statistic.
Under H0, assuming independence of the two sources, the log likelihood is
l =Σ_{ij} (nx_{ij} + n_{yij} )log(pij )

(n_{xij}= # transitions from i to j in x data)

so that the maximum likelihood estimates are

p~{ij} = (n{xij} + n_{yij})/(n_{xi.} + n_{yi.}).

(#number transitions from i to j in both data sets)/ (#transitions from i in both data sets)

The unrestricted estimates (mle) of Px,Py, each based on its own sample of data , are of the standard form.

Thus W may be calculated :

Its null distribution is approximately χ2 with degrees of freedom 2(s^2 −s)−(s^2 −s) = s(s−1).
(degrees of freedom for one TM is s^2-s, for alternative is 2*(s^2 - s )

19
Q

Example 20. Snoqualmie wet/dry

Is there evidence of changes through time in the Markov chain model for Wet/Dry? Transition counts for the years 1948–1965 and 1966-1983 were:

A

Divide data set into 2

Januaries 1948–1965 
      Dry Wet 
Dry 86 63 
Wet 66 325
Januaries 1966–1983 
        Dry Wet 
Dry 100 60 
Wet 62 318

Comparing models: homogeneity

The marginally increased rate Dry → Dry and reduced rate Wet → Wet might suggest that conditions in later years were systematically less wet. The observed value of the W statistic however is w_obs = 0.78, giving no evidence when compared to χ2 2 values of a difference in transition matrices. A conclusion is that the observed discrepancy is within the bounds of random variation within a common model

20
Q

DEF 16:

rth order Markov chain

A

A stochastic process {X_i} in discrete time with a discrete state space S is
an rth order Markov chain if

P(X_{n+1} = x_{n+1} | X_n = x_n, X_{n−1} = xn−1, . . . , X0 = x0)

= P(X_{n+1} = x_{n+1} | X_n = x_n, Xn−1 = x_{n−1}, . . . , X_{n−r+1} = x_{n−r+1})

*bold(Y_i) = (Xi, Xi−1, . . . , Xi−r+1) on state space S^When (X_i) is rth order M

probabilities of VECTORS

P(Y_{n+1} = y_{n+1} | Y_n = y_n
,Y_{n−1} = y_{n−1}
, . . . , Y_{r−1} = y_{r−1})

= P(Y_{n+1} = y_{n+1} | Y_n= y_n)
so that {Y_i} is first order Markov.

ie conditional on state n and time n-1 and,…… , time n-r dep on order r

21
Q

rth order Markov chain: transition probabilities

A

The one-step transition probabilities for a second order chain {Xi} now depend on the previous two states.
TPs are:
p_{(ij)k} = P(X_2 = k | X_0 = i, X_1 = j)

2 steps of memory, conditional on last 2 not just previous

2 state prev tps= conditioning on the
days 4=W and days 5=W
X_4=W X_5=W

P(X_6=D | X_4=W and X_5=W)
=P((X_5= W, X_6=D ) | X_4=W)

ie looking at pairs of consecutive days we can write 2nd order in terms of first order chain

22
Q

rth order Markov chain:
log likelihood

maximum likelihood estimators

A

corresponding transition counts n_{(ij)k} given starting state pair

(number transitions from (i,j) to k)

l= sum over i,j,k of [n_{(ij)k} log(p_{(ij)k}]

mles
pˆ_{(ij)k} = n_{(ij)k}/n_{(ij)}

23
Q

Given suitable data we can use W to test whether a first order chain is an adequate model:

A

Within the overall second-order model for the chain, the hypothesis that the chain is first
order is
H_0 : p_{(ij)k} = q_{jk} (doesnt dep on i)

and the log-likelihood under H0 becomes

l= sum over i,j,k of [n_{(ij)k} log(p_{(ij)k}]
= sum over j,k of [n_{(.j)k}log(p_{jk}]
= sum over j,k of [n_{jk} log(p_{jk}]

so the maximum log likelihood under the restricted model is
l(P~) = sum over jk of [n_{jk} log(n_{jk}/n_j)]

The maximum without restriction is
l(P^) = sum over ijk of [n_{(ij)k} log(n_{(ij)k}/n_{(ij).}]

e GLRT test statistic W = −2(l(P~) − l(P^)) may be found. Large values of W discredit H0 in favour of second order dependence

The null distribution of W is approximately
χ^2 with degrees of freedom
s^2(s − 1) − s(s − 1) = s(s − 1)^2

since the number of non-zero elements in P is s^3, the size of the state space is s^2 and under H0 there are s^2 − s independent probabilities q_{jk}.

24
Q

EXAMPLE Snoqualmie
wet/dry

W to test whether a first order chain is an adequate model:
data:

Previous days Current day Proportion
2 before 1 before Dry Wet Dry Wet
Wet Wet 100 527 0.159 0.841
Dry Wet 25 94 0.206 0.794
Wet Dry 70 52 0.574 0.426
Dry Dry 109 67 0.619 0.381
A

s=2 so degrees of freedom in W test are s(s-1)^2 =2

Transition count data w_obs =0.29
p-value is 0.87
so no reason at all to question the first order Markov dependence
model.

25
Q

Further rain models 1

A
* extend the wet/dry modelling to other months it is natural to ask whether they too could be modelled with Markov chains, possibly with different transition matrices
in different months or groups of months. If so, a Markov chain model in which
the transition matrix changes over time might be considered. The change might
be discontinuous (at month-ends, say) or be made smooth by interpolating between
transition matrices estimated separately for different months. In either case the result
would be a time-heterogeneous chain, but one simple to simulate from.
26
Q

Further rain models 2

A

If Ri
is the rainfall on day i, then Ri = 0 if i is dry. A very simple model for the positive
Ri would be to take them to be independent and from a fixed distribution. Checks on
such a model could be based informally on the semi-variogram (for independence) and on tests for equality of distribution (for the assumption of stationarity), all applied
to observed values of rainfall. It seems possible that the distribution of daily rainfall
amount might vary over the year. If so, then it may be possible to represent the
variation through simple forms of time-varying parameters in a standard distribution, say a gamma distribution. Again, such a model would be easy to simulate from.

27
Q

Further rain models 3

A

Two general strategies for checking the adequacy of a model are (i) to compare the theoretical value of some feature(s) of the model with data, and (ii) to embed the model in a broader one and to test the hypothesis that the data are generated from the smaller model. In (i) it is desirable that the features should not have been used directly in the fitting of the model. A frequent difficulty is the calculation of the theoretical values of model features. Simulation may be useful when analytical approaches are
not available. The same difficulty can stand in the way of strategy (ii) and again
simulation methods may offer a way round it.

28
Q

Further rain models 4

A

The rainfall model emerging from (1) and (2) above is unsatisfactory in that it makes minimal use of knowledge of the weather. Rainfall is associated with the passage of
weather systems over the observation site. Different weather systems have different propensities to generate rain at the site. We might model this by assuming a random succession of a small number of weather types over the site, each having different
probabilities of causing rain. Observations, as before, are the presence or absence of rain on successive days. If the succession of weather types is modelled by a Markov
chain whose states are not recorded, the resulting process is called a hidden Markov model (HMM). It is an example of a state space model. We might assume in addition different distributions for the amount of rain falling on the wet days generated through
the basic HMM, obtaining a more flexible overall rainfall model than before. This idea generalizes to models for several sites over a region

29
Q

Approximate confidence intervals for pij

A

Given numbers of state transitions we can estimate p_ij

p_ij ^ = proportion out of total out of state I

p_ij^ has an approximate normal distribution with mean (p_ij) and variance p_ij(1-p_ij)/N_i

Var p_ij^
= p_ij (1-p_ij) /N_i.

Cov(p_ij^, p_ik^)
= -p_ij p_ik/N_i
For j not equal to k

Each row estimates are indep

Difference between two: example

30
Q

Example 16
Snoqualmie falls

Estimates transition matrix

Standard errors

A

[0.602 0.398]
[0.166 0.833]

n_S =309
n_W =771

Estimates standard errors are therefore
ese( p^_DD) = sqrt ( p^_DD( 1-p^_DD)/n_D) =0.028

ese(p^WW) = sqrt ( p^_WW( 1-p^_WW)/n_W)= 0.013

Giving eg 95% approx intervals

(0. 55,0.66)
(0. 81,0.86)

31
Q

Mles for transition matrices

Difference between

A

For the same rows:
V
For different rows:
Covariance =0

So

Var (p^_22 - p^_33) = var(p^_22) + var(p^33)

And ese? CI value used found from(sqrt(var))