CT 4 Random Flashcards

Revision

1
Q

A Markov Chain with state space {A, B, C} has the following properties:

  • it is irreducible
  • it is periodic
  • the probability of moving from A to B equals the probability of moving from A to C

(i) Show that these properties uniquely define the process. [4]
(ii) Sketch a transition diagram for the process. [1]

A

(i) As the chain is periodic and irreducible, all states are periodic, hence the probability of staying in any state is zero.

By the law of total probability, PAA + PAB + PAC = 1.

But PAB = PAC and PAA = 0, so PAB = PAC = 0.5.

To be irreducible, at least one of PBA or PCA must be greater than zero.

If PBA > 0 then, to be periodic, the chain must have PCB = 0 (otherwise A could return to itself in both two and three steps),

and to be irreducible PCA > 0.

Likewise, if PCA > 0 then, to be periodic, the chain must have PBC = 0, and to be irreducible PBA > 0.

So we must have PBC = PCB = 0 and PBA = PCA = 1.
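
The conclusion in (i) can be checked numerically. The following is a minimal sketch in Python (the helper names `is_irreducible` and `period` are illustrative, not part of the source) that builds the transition matrix with PAB = PAC = 0.5 and PBA = PCA = 1, states ordered A, B, C, and checks that the chain is irreducible with period 2.

```python
from math import gcd

# Transition matrix implied by the solution; rows and columns ordered A, B, C.
P = [
    [0.0, 0.5, 0.5],  # from A
    [1.0, 0.0, 0.0],  # from B
    [1.0, 0.0, 0.0],  # from C
]

def mat_mul(X, Y):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_irreducible(P):
    """True if every state can reach every other state (Warshall closure)."""
    n = len(P)
    reach = [[P[i][j] > 0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return all(reach[i][j] for i in range(n) for j in range(n))

def period(P, state, max_steps=20):
    """gcd of the step counts n <= max_steps with positive n-step return probability."""
    g = 0
    Pn = P  # Pn holds the n-step transition matrix
    for n in range(1, max_steps + 1):
        if Pn[state][state] > 0:
            g = gcd(g, n)
        Pn = mat_mul(Pn, P)
    return g

print(is_irreducible(P), [period(P, s) for s in range(3)])
```

The `max_steps` cut-off is an assumption that is ample for a three-state chain; a larger chain would need a larger bound.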

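(ii) Transition diagram: A → B with probability 0.5, A → C with probability 0.5, B → A with probability 1 and C → A with probability 1.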

2
Q

Ten years ago, a confectionery manufacturer launched a new product, the Scrummy Bar. The product has been successful, with a rapid increase in consumption since the product was first sold. In order to plan future investment in production capacity, the manufacturer wishes to forecast the future demand for Scrummy Bars. It has data on age-specific consumption rates for the past ten years, together with projections of the population by age over the next twenty years. It proposes the following modelling strategy:

  • extrapolate past age-specific consumption rates to forecast age-specific consumption rates for the next 20 years
  • apply the forecast age-specific consumption rates to the projected population by age to obtain estimated total consumption of the product by age for each of the next 20 years
  • sum the results to obtain the total demand for each year

Describe the advantages and disadvantages of this strategy. [5]

A

Advantages

The model is simple to understand and to communicate.

The model takes account of one major source of variation in consumption rates, specifically age.

The model is easy and cheap to implement.

The past data on consumption rates by age are likely to be fairly accurate.

The model can be adapted easily to different projected populations OR takes into account future changes in the population.

Disadvantages

Past trends in consumption by age may not be a good guide to future trends.

Extrapolation of past age-specific consumption rates may be complex or difficult and can be done in different ways.

Consumption of chocolate may be affected by the state of the economy, e.g. whether there is a recession.

Factors other than age may be important in determining consumption, e.g. expenditure on advertising.

Consumption may be sensitive to pricing, which may change in the future.

A rapid increase in consumption rates is unlikely to be sustained for a long period, as there is likely to be an upper limit to the amount of Scrummy Bars a person can eat.

The projections of the future population by age may not be accurate, as they depend on future fertility, mortality and migration rates.

The proposed strategy does not include any testing of the sensitivity of total demand to changes in the projected population, or variations in future consumption trends from that used in the model.

Unforeseen events such as competitors launching new products, or the nation becoming increasingly health-aware, may affect future consumption.

The consumption of Scrummy Bars may vary with cohort rather than age, and the model does not capture cohort effects.
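
The three steps of the proposed strategy (extrapolate, apply to the projected population, sum) can be sketched as below. This is an illustration only: the straight-line extrapolation is an assumption and just one of the different ways the extrapolation could be done, and the age bands, rates and populations are made-up numbers, not data from the question.

```python
def linear_extrapolate(history, years_ahead):
    """Fit a least-squares straight line to history and project it
    years_ahead steps beyond the last observation."""
    n = len(history)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(history) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history))
             / sum((x - x_mean) ** 2 for x in xs))
    return y_mean + slope * (n - 1 + years_ahead - x_mean)

def forecast_total_demand(rate_history, projected_population, years_ahead):
    """Step 1: extrapolate each age-specific rate.
    Step 2: apply it to the projected population at that age.
    Step 3: sum over ages."""
    total = 0.0
    for age_band, history in rate_history.items():
        rate = linear_extrapolate(history, years_ahead)
        total += rate * projected_population[age_band]
    return total

# Hypothetical inputs: bars per person per year, and projected head counts.
rate_history = {"0-19": [1.0, 2.0, 3.0], "20+": [2.0, 2.0, 2.0]}
projected_population = {"0-19": 100, "20+": 200}

print(forecast_total_demand(rate_history, projected_population, years_ahead=1))
```

A straight line never levels off, which is exactly the weakness noted above: a rapid past increase is projected forward indefinitely.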

3
Q

A certain profession admits new members to the status of student. Students may qualify as fellows of the profession by virtue of passing a series of examinations.
Normally student members sit the examinations whilst working for an employer. There are two sessions of the examinations each year.

An employer provides study support to student members of the profession. It wishes to assess the cost of providing this study support and therefore wishes to know the average time it can expect to take for its students to qualify.

The employer has maintained records for 23 of its students who all sat their first examination in the first session of 2003. The students’ progress has been recorded up to and including the last session of 2009. The following data records the number of sessions which had been held before the specified event occurred for a student in this cohort:

Qualified 6, 8, 8, 9, 9, 9, 11, 11, 13, 13, 13
Stopped studying 4, 5, 8, 11, 14

The remaining seven students were still studying for the examinations at the end of 2009.

(i) Determine the median number of sessions taken to qualify for those students who qualified during the period of observation. [2]

(ii) Calculate the Kaplan-Meier estimate of the survival function, S(t), for the “hazard” of qualifying, where t is the number of sessions of examinations since 1 January 2003. [5]

(iii) Hence estimate the median number of sessions to qualify for the students of this employer. [2]

(iv) Explain the difference between the results in (i) and (iii) above. [2]

A

(i) 11 students qualified during the period of observation, so the median is the number of sessions taken to qualify by the sixth student to qualify.

This is 9 sessions.

(ii) Define t as the number of sessions which have taken place since 1 Jan 2003.

Students who stopped studying are treated as censored immediately after the session number reported.

tj    Nj    Dj    Cj    Dj/Nj    1 − Dj/Nj
 0    23     0     2      –         1
 6    21     1     0     1/21     20/21
 8    20     2     1     2/20     18/20
 9    17     3     0     3/17     14/17
11    14     2     1     2/14     12/14
13    11     3     0     3/11      8/11

The Kaplan-Meier estimate is given by the product of the factors 1 − Dj/Nj over all tj ≤ t.

Then the Kaplan-Meier estimate of the survival function is

t              S(t)
0 ≤ t < 6      1
6 ≤ t < 8      0.9524
8 ≤ t < 9      0.8571
9 ≤ t < 11     0.7059
11 ≤ t < 13    0.6050
13 ≤ t < 14    0.4400

(iii) The median time to qualify as estimated by the Kaplan-Meier estimate is the first time at which S(t) is below 0.5.

Therefore the estimate is 13 sessions.

(iv) The estimate based on students qualifying during the period is a biased estimate because it does not contain information about students still studying at the end of the period, or about those who dropped out (stopped studying without qualifying).

The students still studying at the end of 2009 have (by definition) a longer period to qualification than those who qualified in the period.

Hence the Kaplan-Meier estimate is higher than the median using only students who qualified during the period.
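
The calculations in (i) to (iii) can be reproduced with a short script. This is a minimal sketch, assuming (as in the solution) that a student censored at session t is still at risk for any qualification at session t, and that the seven students still studying are censored at session 14. The function name `kaplan_meier` is illustrative.

```python
from statistics import median

# Data from the question (session numbers).
qualified = [6, 8, 8, 9, 9, 9, 11, 11, 13, 13, 13]
stopped = [4, 5, 8, 11, 14]
still_studying = [14] * 7          # censored at the end of 2009 (assumption: session 14)
censored = stopped + still_studying

def kaplan_meier(event_times, censor_times):
    """Return (t, at_risk, events, S(t)) for each distinct event time t."""
    survival = 1.0
    steps = []
    for t in sorted(set(event_times)):
        # Everyone whose event or censoring time is >= t is still at risk at t.
        at_risk = sum(1 for x in event_times + censor_times if x >= t)
        events = event_times.count(t)
        survival *= 1 - events / at_risk
        steps.append((t, at_risk, events, survival))
    return steps

steps = kaplan_meier(qualified, censored)
for t, n, d, s in steps:
    print(f"t = {t:2d}  N = {n:2d}  D = {d}  S(t) = {s:.4f}")

crude_median = median(qualified)                         # part (i)
km_median = next(t for t, _, _, s in steps if s < 0.5)   # part (iii)
print(crude_median, km_median)
```

The gap between the two medians is the effect described in (iv): the crude median ignores everyone who had not yet qualified.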

4
Q

(i) Write down the hazard function for the Cox proportional hazards model, defining all the terms that you use. [2]

A farmer is concerned that he is losing a lot of his birds to a predator, so he decides to build a new enclosure using taller fencing. This fencing is expensive and he cannot afford to build a large enough area for all his birds. He therefore decides to put half his birds in the new enclosure and leave the others in the existing enclosure. He is convinced that the new enclosure is an improvement, but has asked an actuarial student to determine whether the new enclosure will result in an increase in the life expectancy of his birds. The student has fitted a Cox proportional hazards model to data on the duration until a bird is killed by a predator and calculated the following figures relating to the regression parameters:

                 Parameter estimate    Variance

Bird
    Chicken            0                0
    Duck              −0.210            0.002
    Goose              0.075            0.004

Enclosure
    New                0.125            0.0015
    Old                0                0

Sex
    Male               0.2              0.0026
    Female             0                0

(ii) State the features of the bird to which the baseline hazard applies. [1]
(iii) For each regression parameter:
(a) Define the associated covariate.
(b) Calculate the 95% confidence interval based on the standard error. [3]

(iv) Comment on the farmer’s belief that the new enclosure will result in an increase in his birds’ life expectancy. [2]

(v) Calculate, using this model, the probability that a female duck in the new enclosure has been killed by a predator at the end of six months, given that the probability that a male goose in the old enclosure has been killed at the end of the same period is 0.1 (all other decrements can be ignored). [4]

A

(i) h(z, t) = h0(t) exp(βzᵀ)

h(z, t) is the hazard at time t (or just h(t) is OK)
h0(t) is the baseline hazard

z is the vector of covariates zi
β is a vector of regression parameters

(ii) The baseline hazard refers to a female chicken in the old enclosure.

(iii) The 95 per cent confidence interval for a parameter β is given by the formula

β ± 1.96 × SE[β] = β ± 1.96 × √Var(β),

where SE[β] is the standard error of the parameter β.

Thus, for the covariate z1 (= 1 if Duck, 0 otherwise), we have

95 per cent confidence interval = −0.210 ± 1.96 × √0.002 = −0.210 ± 0.088 = (−0.298, −0.122).

Covariate                                 95% C.I. for parameter
z1 = 1 if Duck, 0 otherwise               β1: (−0.298, −0.122)
z2 = 1 if Goose, 0 otherwise              β2: (−0.049, 0.199)
z3 = 1 if New enclosure, 0 otherwise      β3: (0.049, 0.201)
z4 = 1 if Male, 0 otherwise               β4: (0.100, 0.300)
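
The four intervals can be checked with a few lines of Python. This is a minimal sketch using the estimates and variances from the question; the dictionary layout and the function name `confidence_interval` are illustrative.

```python
from math import sqrt

# (parameter estimate, variance) for each non-baseline covariate, from the question.
params = {
    "Duck": (-0.210, 0.002),
    "Goose": (0.075, 0.004),
    "New enclosure": (0.125, 0.0015),
    "Male": (0.2, 0.0026),
}

def confidence_interval(estimate, variance, z=1.96):
    """Approximate 95% interval: estimate +/- z * standard error."""
    half_width = z * sqrt(variance)
    return estimate - half_width, estimate + half_width

intervals = {name: confidence_interval(est, var) for name, (est, var) in params.items()}
for name, (low, high) in intervals.items():
    print(f"{name:14s} ({low:.3f}, {high:.3f})")
```

Only the Goose interval contains zero, so it is the only parameter not significantly different from the baseline at the 95% level.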

(iv) The parameter for the new enclosure is 0.125, so the ratio of the hazard in the new enclosure to that in the old, for two otherwise identical birds, is exp(0.125) = 1.133.

So the hazard appears to have got worse.

The 95% confidence interval is entirely positive OR does not include zero, so at the 95% level the deterioration is statistically significant.

A higher hazard implies a shorter expected lifetime, so the model does not support the farmer’s belief.
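
Part (v) can be approached by noting that, under the Cox model, the survival function for covariates z is S0(t) raised to the power exp(βzᵀ), so two birds' survival probabilities are linked through the ratio of their hazards. The following is a sketch of that calculation, assuming no other decrements (as the question states); the variable names are illustrative.

```python
from math import exp

# Parameter estimates from the question (baseline: female chicken, old enclosure).
beta = {"duck": -0.210, "goose": 0.075, "new": 0.125, "male": 0.2}

# Linear predictors for the two birds being compared.
lp_male_goose_old = beta["goose"] + beta["male"]     # 0.275
lp_female_duck_new = beta["duck"] + beta["new"]      # -0.085

# Given: a male goose in the old enclosure has been killed by six months with probability 0.1.
surv_male_goose_old = 1 - 0.1

# S_a(t) = S0(t) ** exp(lp_a), hence S_a(t) = S_b(t) ** exp(lp_a - lp_b).
hazard_ratio = exp(lp_female_duck_new - lp_male_goose_old)
surv_female_duck_new = surv_male_goose_old ** hazard_ratio

prob_killed = 1 - surv_female_duck_new
print(f"hazard ratio = {hazard_ratio:.4f}, P(killed) = {prob_killed:.4f}")
```

Under these assumptions the probability comes out at roughly 0.071, lower than the goose's 0.1 because the female duck's hazard is about 70% of the male goose's.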

5
Q

Explain why it is important to sub-divide data when carrying out mortality investigations.

A

The models of mortality we use assume that we can observe a group of lives with the same mortality characteristics. This is not possible in practice.

However, data can be subdivided according to certain characteristics that we know to have a significant effect on mortality.

This will reduce the heterogeneity of each group, so that we can at least observe groups with similar, but not the same, characteristics.

6
Q

Describe the problems that can arise with sub-dividing life assurance data.

A

Sub-dividing data using many factors can result in the number in each class being too low.

It is necessary to strike a balance between homogeneity of the group and retaining a large enough group to make statistical analysis possible.

Sufficient data may not be collected to allow sub-division.

This may be because marketing pressures mean proposal forms are kept to a minimum.

7
Q

Factors used to subdivide life assurance data

A
Sex
Age
Type of policy
Smoker/Non-smoker status
Level of underwriting
Duration in force
Sales channel
Policy size
Occupation (or social class) of policyholder
Known impairments
Geographical region
8
Q

Principle of correspondence

A

The principle of correspondence states that a life alive at time t should be included in the exposure at age x at time t if and only if, were that life to die immediately, he or she would be counted in the deaths data θx at age x.

9
Q

Describe the difference between the central exposed to risk and the initial exposed to risk.

A

The central exposed to risk at age x is the observed waiting time in a multiple-state or a Poisson model. It is the sum of the times spent under observation by each life at age x.

The initial exposed to risk requires an adjustment for those lives who die: we continue to observe them until the end of the rate interval.

Initial exposed to risk may be approximated as….
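
The two definitions can be illustrated with exact calculations. The sketch below computes each quantity directly from individual records rather than by any approximation. The three lives are hypothetical, with entry and exit times measured in years within a single rate interval running from 0 to 1; the function names are illustrative.

```python
def central_exposed_to_risk(lives):
    """Sum of the time each life actually spends under observation."""
    return sum(exit_time - entry for entry, exit_time, _ in lives)

def initial_exposed_to_risk(lives, interval_end=1.0):
    """As the central exposed to risk, but lives who die are treated as
    observed until the end of the rate interval."""
    total = 0.0
    for entry, exit_time, died in lives:
        end = interval_end if died else exit_time
        total += end - entry
    return total

# Hypothetical records: (entry time, exit time, died?).
lives = [
    (0.0, 1.0, False),   # observed for the whole interval
    (0.25, 0.75, True),  # enters at 0.25, dies at 0.75
    (0.5, 1.0, False),   # enters half-way through
]

print(central_exposed_to_risk(lives), initial_exposed_to_risk(lives))
```

The two results differ only through the life who dies, whose remaining quarter of a year is counted in the initial exposed to risk but not in the central one.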
