Section 7 - Tutorial R and In-Class Questions R Flashcards

Question 1

Q

Q10 A researcher is studying survival for patients with a rare autoimmune disease. The probability of death due to the disease is given by 𝑝, which is assumed to have a beta distribution with parameters 𝑎 = 5 and 𝑏 = 8.
In order to estimate 𝑝, the researcher analyses a sample of 𝑛 = 20 people.
The Bayesian estimate for 𝑝 under quadratic loss, given 𝑥 deaths in a sample of size 𝑛, is:
$$\frac{x + a}{a + b + n}$$
i) Obtain 1,000 simulations of the posterior probability of death, based on
1,000 samples each of 20 people. Hence, obtain an average empirical estimate for 𝑝 under quadratic loss.
Use the function set.seed(31) so that your answer is reproducible.
The Bayesian estimate for 𝑝 under all-or-nothing loss, given 𝑥 deaths in a sample of size 𝑛, is:
$$\frac{x + a - 1}{a + b + n - 2}$$
ii) Obtain 1,000 simulations of the posterior probability of death, based on
1,000 samples each of 20 people. Hence, obtain an average empirical estimate for 𝑝 under all-or-nothing loss.
Use the function set.seed(31) so that your answer is reproducible.
The Bayesian estimate for 𝑝 under absolute loss, given 𝑥 deaths in a sample size of 𝑛 is the median of the posterior distribution 𝐵𝑒𝑡𝑎(𝑥 + 𝑎, 𝑛 − 𝑥 + 𝑏).
iii) Obtain 1,000 simulations of the posterior probability of death, based on
1,000 samples each of 20 people. Hence, obtain an average empirical estimate for 𝑝 under absolute loss.
Use the function set.seed(31) so that your answer is reproducible.

Answer

A

Q10
i) 0.39118
ii) 0.38416
iii) 0.38896
Or, using alternative method:
i) 0.38518
ii) 0.37777
iii) 0.38284
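
The three estimators come from the conjugate update of the Beta(𝑎, 𝑏) prior with a binomial likelihood for 𝑥 deaths out of 𝑛. As a brief sketch of that step (standard results, written in the question's notation):

```latex
\begin{aligned}
f(p \mid x) &\propto p^{x}(1-p)^{n-x} \cdot p^{a-1}(1-p)^{b-1}
             = p^{x+a-1}(1-p)^{n-x+b-1}, \\
p \mid x &\sim \mathrm{Beta}(x+a,\; n-x+b), \\
\text{posterior mean} &= \frac{x+a}{a+b+n}, \qquad
\text{posterior mode} = \frac{x+a-1}{a+b+n-2}.
\end{aligned}
```

The posterior median has no closed form, which is why part (iii) evaluates it numerically with qbeta(0.5, x+a, n-x+b).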

#######################################################################
####### Section 7 tutorial #######
#######################################################################

# Question 10

# define parameters

# sample size n=20
# prior distribution beta(a,b) = beta(5,8)

n <- 20
a <- 5
b <- 8

# (i) Quadratic loss

postmean <- rep(0,1000)

set.seed(31)

for (i in 1:1000)
{p <- rbeta(1,a,b)
x <- rbinom(1,n,p)
postmean[i] <- (x+a)/(a+b+n)}

mean(postmean)

# (ii) 0-1 loss (all-or-nothing loss)

postmode <- rep(0,1000)

set.seed(31)

for (i in 1:1000)
{p <- rbeta(1,a,b)
x <- rbinom(1,n,p)
postmode[i] <- (x+a-1)/(a+b+n-2)}

mean(postmode)

# (iii) absolute loss

postmed <- rep(0,1000)

set.seed(31)

for (i in 1:1000)
{p <- rbeta(1,a,b)
x <- rbinom(1,n,p)
postmed[i] <- qbeta(0.5,x+a,n-x+b)}

mean(postmed)

###########################################################################

# Alternative method
### using rbinom(n,1,p) and sum(x)

# (i) Quadratic loss

altpostmean <- rep(0,1000)

set.seed(31)

for (i in 1:1000)
{p <- rbeta(1,a,b)
x <- rbinom(n,1,p)
altpostmean[i] <- (sum(x)+a)/(a+b+n)}

mean(altpostmean)

# (ii) 0-1 loss (all-or-nothing loss)

altpostmode <- rep(0,1000)

set.seed(31)

for (i in 1:1000)
{p <- rbeta(1,a,b)
x <- rbinom(n,1,p)
altpostmode[i] <- (sum(x)+a-1)/(a+b+n-2)}

mean(altpostmode)

# (iii) absolute loss

altpostmed <- rep(0,1000)

set.seed(31)

for (i in 1:1000)
{p <- rbeta(1,a,b)
x <- rbinom(n,1,p)
altpostmed[i] <- qbeta(0.5,sum(x)+a,n-sum(x)+b)}

mean(altpostmed)
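
The two methods give different numbers from the same seed because rbinom(1,n,p) and rbinom(n,1,p) consume the random number stream differently, not because they target different quantities. A minimal vectorised sketch of the same simulation is given below; the object names (n_sims, p_sim, x_sim, vec_*) are illustrative and not part of the tutorial solution, and its output will match neither set of answers above exactly because the draws are again taken in a different order. It also compares the quadratic-loss average with the prior mean 𝑎/(𝑎 + 𝑏), which is the value the average of posterior means should be close to.

```r
# Vectorised sketch of parts (i)-(iii); names are illustrative assumptions.
n <- 20
a <- 5
b <- 8
n_sims <- 1000

set.seed(31)
p_sim <- rbeta(n_sims, a, b)        # one prior draw of p per simulation
x_sim <- rbinom(n_sims, n, p_sim)   # deaths among n people for each draw

vec_postmean <- (x_sim + a) / (a + b + n)              # quadratic loss
vec_postmode <- (x_sim + a - 1) / (a + b + n - 2)      # all-or-nothing loss
vec_postmed  <- qbeta(0.5, x_sim + a, n - x_sim + b)   # absolute loss

prior_mean <- a / (a + b)

c(quadratic  = mean(vec_postmean),
  zero_one   = mean(vec_postmode),
  absolute   = mean(vec_postmed),
  prior_mean = prior_mean)
```

Drawing all 1,000 values of p first and then all 1,000 values of x avoids the explicit loop, at the cost of not reproducing the loop-based answers digit for digit.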

Question 2

Q

Exam standard question: CS1 B April 2019 Q2
Q11 Consider the 𝑛 = 30 independent and identically distributed observations (𝑦1, 𝑦2, … , 𝑦𝑛) given below from a random variable 𝑌 with probability distribution function $f(y, \theta) = \theta^{y} e^{-\theta} / y!$.
You can enter the 𝑦 values into R by using:
y=c(5,5,6,2,4,10,2,5,5,2,5,3,7,4,4,5,4,6,7,2,8,4,6,4,3,6,6,6,5,7)
By assuming a prior distribution proportional to $e^{-\alpha\theta}$, we can show that the posterior distribution of 𝜃 is:
$$f(\theta \mid y_1, y_2, \ldots, y_n) \propto \theta^{\sum_{i=1}^{n} y_i} e^{-(n+\alpha)\theta}$$
We can observe that the posterior distribution of 𝜃 is Gamma with parameters $\sum_{i=1}^{n} y_i - 1$ and $n + \alpha$.
i)
a) Plot the posterior probability density function of 𝜃 for values of 𝜃 in
the interval [3.2, 6.8] and assuming 𝛼 = 0.01.
Hint: Consult your FIN2017 notes for a reminder on plotting density
functions.
The range of values of 𝜃 can be obtained in R by seq(3.2, 6.8,by=0.01).
b) Carry out a simulation of 𝑁 = 5,000 posterior samples for the parameter 𝜃.
ii) Plot the histogram of the posterior distribution of 𝜃 using your simulated
values.
iii) Calculate the mean, median and standard deviation of your simulated
values of 𝜃.
Two possible values for the true value of the parameter 𝜃 are 𝜃 = 15 and 𝜃 = 5.
iv) Comment on these two values based on your answers in parts (ii) and (iii).

Answer

A

Q11
iii) Mean: 4.8988
Median: 4.8814
Standard deviation: 0.4065
Note: no seed is set, so answers will vary.
iv) 15 is quite far away from the range of samples obtained for the posterior distribution of θ. [1]
On the other hand, 5 is more likely to be the true value. [1]
Unless there has been a calculation error, θ = 15 is very unlikely to be the true value.
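
No R code accompanies this answer, so the following is a hedged sketch of one way to carry out parts (i) to (iii). It uses the Gamma parameters exactly as stated in the question (shape ∑𝑦𝑖 − 1, rate 𝑛 + 𝛼), which is consistent with the quoted mean of about 4.9; note that under the usual shape/rate parameterisation the kernel displayed in the question would correspond to a shape of ∑𝑦𝑖 + 1. The seed and the object names (post_shape, post_rate, theta_sim) are assumptions added for reproducibility, so the output will not match the figures above exactly.

```r
# Sketch for Q11; seed and object names are assumptions, not from the source.
y <- c(5,5,6,2,4,10,2,5,5,2,5,3,7,4,4,5,4,6,7,2,8,4,6,4,3,6,6,6,5,7)
n <- length(y)
alpha <- 0.01

post_shape <- sum(y) - 1   # shape as stated in the question
post_rate  <- n + alpha    # rate as stated in the question

# (i)(a) posterior density on [3.2, 6.8]
theta <- seq(3.2, 6.8, by = 0.01)
plot(theta, dgamma(theta, shape = post_shape, rate = post_rate),
     type = "l", xlab = "theta", ylab = "Posterior density",
     main = "Posterior density of theta")

# (i)(b) N = 5,000 posterior samples
set.seed(2019)             # arbitrary seed; the original answer sets none
N <- 5000
theta_sim <- rgamma(N, shape = post_shape, rate = post_rate)

# (ii) histogram of the simulated values
hist(theta_sim, main = "Histogram of posterior samples", xlab = "theta")

# (iii) summary statistics
mean(theta_sim)
median(theta_sim)
sd(theta_sim)

# (iv) how the two candidate values sit relative to the samples
range(theta_sim)
```

The range of the simulated values makes the comparison in part (iv) concrete: 5 lies well inside it, while 15 lies far outside.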

Question 3

Q

Exam standard question – IFoA Curriculum 2019 CS1 Sample Paper
Q12 A Bayesian credibility model is used to model annual claim numbers, denoted by 𝑋, for the coming year. These are assumed to have a Poisson distribution with mean 𝜆, where 𝜆 itself is modelled by a gamma distribution with parameters 𝛼 = 100 and 𝛽 = 1.
(i)
a) Implement 𝑀 = 1000 Monte Carlo repetitions of a credibility analysis to estimate the distribution of the posterior mean of parameter 𝜆 using the credibility factor 𝑍 = 1/( 𝛽 + 1), in the case where the number of past claims 𝑥 is known only for the last one year.
b) Provide the histogram of the 1000 Monte Carlo posterior mean estimates calculated in part (i)(a).
[15]
(ii)
a) Calculate the mean and variance of the Monte Carlo posterior mean estimates from part (i).
b) Compare the Monte Carlo mean and variance obtained in part (ii)(a) with those obtained from samples of size 1,000 drawn from a 𝐺𝑎𝑚𝑚𝑎(𝛼 + 𝑥, 𝛽 + 1) distribution. Round your results to three decimal places.
[12]
(iii) Comment on your findings in parts (i) and (ii).
[3]
[Total 30]

Answer

A

#######################################################################
####### Section 7 tutorial #######
#######################################################################

# Question 12 - IFoA Solution

# (i)
# (a)

M <- 1000
alpha <- 100
beta <- 1

Z <- 1/(beta+1)

pm <- X <- numeric(M)

for(m in 1:M){
lam = rgamma(1, shape=alpha, rate=beta)
x = rpois(1, lam)
X[m] = x
pm[m] = Z*x + (1-Z)*alpha/beta
}
help(rgamma)
# (b)

hist(pm, main="Histogram of posterior means",
xlab="Posterior mean", ylab="Frequency")

# (ii)

# (a)

round(mean(pm),3)
round(var(pm),3)

# (b)

mG <- numeric(M)

for(m in 1:M){
lam = rgamma(1, shape=alpha, rate=beta)
x = rpois(1, lam)
y = rgamma(1000, shape = alpha+x, rate = beta+1)
mG[m]=mean(y)
}

round(mean(mG), 3)
round(var(mG),3)

(ii) MC mean and variance of posterior mean estimates: 99.904, 51.167
MC mean and variance of posterior from Gamma samples: 99.900, 51.238
Note that this solution to part (ii) uses a new set of Monte Carlo
repetitions. This is not necessary, and full credit can be given for
combining parts (i) and (ii) in a single exercise. Clearly the precise
numerical values for the means and variances will differ from
implementation to implementation.
(iii) The similarity of the Monte Carlo estimates of the mean and variance and
those from the Gamma(α + 𝑥, β + 1) sample demonstrates that the
posterior distribution for the Poisson/Gamma credibility model is
Gamma(α + 𝑥, β + 1).
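
The agreement in part (ii) is expected: with 𝑍 = 1/(𝛽 + 1), the credibility estimate 𝑍𝑥 + (1 − 𝑍)𝛼/𝛽 simplifies algebraically to (𝛼 + 𝑥)/(𝛽 + 1), which is the mean of Gamma(𝛼 + 𝑥, 𝛽 + 1). The sketch below checks this identity numerically and compares the Monte Carlo variance with 𝑍² Var(𝑋), where Var(𝑋) = 𝛼/𝛽 + 𝛼/𝛽² for the Poisson/gamma mixture. It is a vectorised illustration rather than the IFoA solution; the seed and the names pm_cred, pm_gamma and theory_var are assumptions.

```r
# Sketch checking the credibility identity; seed and names are assumptions.
M <- 1000
alpha <- 100
beta <- 1
Z <- 1 / (beta + 1)

set.seed(1)                                   # arbitrary seed
lam <- rgamma(M, shape = alpha, rate = beta)  # prior draws of lambda
x <- rpois(M, lam)                            # one year of claims per draw

pm_cred  <- Z * x + (1 - Z) * alpha / beta    # credibility formula
pm_gamma <- (alpha + x) / (beta + 1)          # mean of Gamma(alpha + x, beta + 1)

all.equal(pm_cred, pm_gamma)                  # the two expressions coincide

# Theoretical variance of the posterior mean: Z^2 * Var(X)
theory_var <- Z^2 * (alpha / beta + alpha / beta^2)

round(c(mc_mean = mean(pm_cred),
        mc_var = var(pm_cred),
        theory_var = theory_var), 3)
```

With these parameters the theoretical mean and variance of the posterior mean are 100 and 50, in line with the values of roughly 99.9 and 51.2 quoted in part (ii).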
