All Notes Flashcards
What two words sum up Monte Carlo?
Random Simulation
What is Monte Carlo?
Studying the behaviour of a random system by stimulating the outcome many times rather than by applying math theory.
What is a fundamental building block for everything in Monte Carlo?
Assuming we have an inexchaustible supply of independent random values which are uniformly distributed on (0,1)
What are three properties of the uniformly distrubtued independent random values we use in Monte Carlo?

What is the c.d.f of U?

Draw the p.d.f of U

Draw the c.d.f of U

What is the algortithm for simulating a fair coin toss?
- Take one U
- The coin toss outcome is
- Head if U ≤ ½
- Tails if U > ½
Describe the Bernoulli distribution.
The Bernoulli distribution with probability p of “success” has two possible outcomes 0 and 1, and probability of 1 is p ∈ [0,1].
What is the algorithm for simulating the Bernoulli distribution?
- Take one U
- Set B =
- 1 if u ≤ p
- 0 if u > p
Prove that the algorithm for simulating the Bernoulli distribution is the following.
- Take one U
- Set B =
- 1 if u ≤ p
- 0 if u > p

How do you simulate from any discrete distribution?
* Let Y be a discrete random variable with possible values x1, …, xm (possibly no finite m)
- ℙ[Y = xi] = pi with pi ≥ 0 and ∑pi = 1
- Set Qj = ∑i=1jpi for j = 1, 2, …., m such that
- Qo ≤ Q1 ≤ Q2 ≤ ….
- Define Q0 = 0, and note than pi = Qi - Qi-1 for all i and that Qm = 1 when m is finite
- And use the following algorithm:
- Take one U
- Set Y٭= xi if Qj-1 < u < Qi
What is the algorithm for simulating any discrete distribution?
- Take one U
- Set Y٭= xi if Qj-1 < u < Qi
Prove that the algorithm for simulating any discrete distribution is:
- Take one U
- Set Y٭= xi if Qj-1 < u < Qi

Why do we distinguish Y and Y٭?
- Y٭ is the value from MC sampling
- Y may be a different “real world” quantity
What are the two ways to simulatie the binomial distribution?
- Binomal is discrete so apply the discrete algorithm
- Each trial corresponds to n Bernoulli random varibles so use the Bernoulli
Describe how you can use the Bernoulli random variable to simulate a Binomial trial.
- Use the Bernoulli algortithm (bern(p)) to generate B1, …, Bn
- Set Y = B1 + … + Bn (couting how many 1’s there are)
- Y is the number of successes in n independent trials with probability p of sucess in each trial
What are three properties of a c.d.f F(x)?

What is the cdf for a discrete distribution with possible values x1 < x2 < … < xm with corresponding probabilites p1, …, pm?

What is the c.d.f for an absolutely continuous distribution?

What is the definition of the generalised inverse of a c.d.f.?
For any c.d.f F, define F-1:(0,1) ➝ ℝ by
F-1(u) = min {x∈ℝ: F(x) ≥ u}
“The lowest value of x for which F(x) is at least u”
Why do we need a generalsied definition for the inverse of cdf?
Because the true inverse does not always exist
Finish this theorm: For any c.d.f F, any u ∈ (0,1) and any y ∈ ℝ:
F-1(u) ≤ y ⟺ … ?
F-1(u) ≤ y ⟺ F(y) ≥ u
Prove this theorem: For any c.d.f F, any u ∈ (0,1) and any y ∈ ℝ:
F-1(u) ≤ y ⟺ F(y) ≥ u.

What is the algorithm to sample from X with cdf FX using the inverse transform method?
- Take one U
- Set x٭ = FX-1(u)
Finish this theorm about the inverse transform method: The cdf of X٭ is … ?
FX
Prove the following theorem: The cdf of X٭ is FX.

What is the algorithm for sampling from Exp(λ) using the inverse transform method?
- Take one U
- Set X = -1/λ log(u)
How do you find the inverse transform function?
- Change x in the cdf to x٭
- Set the cdf equal to U
- Rearrange to make equal to x٭
Give two reasons why you might not want to use the inverse transform method?
- Difficult
- Computationally expensive to apply
In what scenario do yo use the acceptance-rejecting?
We want to sample from a pdf f(x) and we have another pdf h(x) for which we already know how to sample from h and ∃ c ∈ ℝ such that ch(x) ≥ f(x) ∀ x ∈ ℝ.
What is the algorithm for sampling using the acceptance-rejction method?

Draw a picture of the two pdfs for the acceptance-rejection method.

Finish this theorem: the pdf of x from the acceptance-rejection algorithm is .. ?
f
Prove the following theorem: The pdf of x from the acceptance-rejection algorithm is f.

In the acceptane-rejection sampling what is 1/c known as?
The “acceptance rate”.
What s the P[accept X∼] in acceptance-rejection sampling?

How can we approximate the 𝔼[Y] of a random variable Y?
- Take a sample Y1, …., YN of random values of Y
- Estimate 𝔼[Y] by Ȳ = (Y1 + … + YN)/N
What is the standard deviation of Ȳ?
σ2/N where σ2 = Var[Y]
What is the estimation of the standard deviation of Ȳ known as?
The standard error
What is the formula for the standard error of Ȳ?

What is an approximate 95% confidence interval for 𝔼[Y]?

What is meant by a 95% confidence interval?
There is a 95% chance that the interval will contain 𝔼[Y]
What do you do the find 𝔼[h(Y)] where h is any function of Y?
- Compute the values h(Y1), …., h(YN)
- Use the same formulas as with Y1, … , YN to then calculate 𝔼[h(Y)] and any other statistics
What is the indicator function?

If h(Y) is an indicator function of some set A os possible values of Y. What is the 𝔼[Y] equal to?
𝔼[Y] = ℙ[Y ∈ A]
What is a shorthand way to write ℙ[Y ∈ A]?
pA
If h(Y) is an indicator function what is Var[h(Y)]?
Var[h(y)] = pA(1 - pA)
What is the following equal to?


What is the standard error of


What is the 95% confidence interval for


Given the matrix below how do we find a column vector Y?


What is the pth quantile of continuous Y?
F-1(p)
How does the basic Monte Carlo method enable us to estiamte F(x) for any value of x?

How do we estiamte quantiles within samples?
- Sort the sample Y1, …., YN so that the values are increasing
- Label the results as Y(1), …., Y(N)

What are the two steps in conditional specification?
- specify distribution for x1
- specify conditonal distribution for x2|x1 = x
How do you simulate when the two variables x1 and x2 are dependent?
- generate a value for x1
- once x1 is known we have a distribution for x2 and we sample from it
Name one example of when you use a conditional simulation.
Customers arriving in a shop, spending different amounts of time in there. Then the toal time spent by customers in day.
What is a bivariate normal distribution?
When you let z1 and z2 be iid N(0,1), then you have the following. And X is said to have a bivariate normal distribution.

Given the following, what is the distribution of X1 and X2?

X1 ∼ N(μ1, a112 + a122)
X2 ∼ N(μ2, a212 + a222)
Given the following, prove what the covariance of X1 and X2 is?

Cov[X1, X2] = a11a21Var[Z1] + a12a22Var[Z2] + (a11a22 + a12a21)Cov[Z1, Z2] = a11a22 + a12a21 = Cov[X2, X1]
What symbol represents the variance matrix?
Σ
What does Σ represent?
The variance matrix
Define the variance matrix, Σ.

What is the theorem about the pdf of X when it has a bivariate normal distribution?

Finish the following Lemma.


Prove the following Lemma.


Prove the following theorem.


What are the two main steps in sampling from a bivariate normal with mean μ and variance Σ.
- Find A such that Σ = AAT (easy to find lower triangular A)
- Then carry out the Algorithm
What is the algorithm for sampling from the bivariate normal distribution?
- Sample z1, z2 independently from N(0,1)
- Set X = μ + Az
How does Cov[X1, X2] and correlation relate?
- Cov[X1, X2] = σ1σ2ρ
- Where σ1 is the standard deviation of X1
- σ2 is the standard deviation of X2
- ρ the correlation between X1 and X2
Define a copula.
A copula is a joint distribution where the marginal distribution of each random variable is U(0,1).
Define a probability integral transform.
Let X be a random variable with c.d.f. F. Then the r.v. F(x) is called the probability integral transform (P.I.T.) of x.
What is the theorem about the probability integral transform?
If X is a continuous r.v. with cdf F, then the PIT F(x) has a U(0,1) distribution.
Prove the following theorem.
If X is a continuous r.v. with cdf F, then the PIT F(x) has a U(0,1) distribution.

How do you calculate correlation, ρ from a variance matrix?
ρ = Σ12/ (Σ11Σ12)1/2
What do you set Ũ1 and Ũ2 equal to in the Gaussian coupula? And how are they distributed?

What is the algorithm for sampling from a copula?
- sample (Ũ1, Ũ2)
- Set Y1٭ = F1(Ũ1), Y2٭ = F2(Ũ2)
What is a good use of copulas?
To sample from two different distributions but where the value of one is dependent on the other.
When is time dependence easy to handle?
If time is discrete
Why do we use Markov models for time dependence models?
Potential complexity grows as t increases, so by making what happens at time t depend only on the value of Y(t-1) and not on Y(t-2) it makes the problem easier
In a Markov model for time dependece what do you have to specifcy?
Y(t + 1)|Y(t) = y(t) for all t of interest and all values of y(t)
If Y(t + 1)|Y(t) = y in a Markov model, what is the model called? And what does it mean?
Stationary Markov chain - only depends on y and not t
Give two examples of a ‘stopping rule’?
- Stop at some fixed time T
- Or based on the value of Y(t)
How do you simulate one monte carlo sample that is time dependent?
- Generate value for Y(0) from its distribution
- For t = 1, 2, … until stopping rules applies generate Y(t + 1) from distribution Y(t + 1)|Y(t), …, Y(0)
- Rpeat steps (1) and (2) N times and apply basic mc principle to the result
What two types of time are they when you look at time dependence?
- Discrete
- Continuous
If we are looking at a model with contnuous time what two things are we interested in?
If 0 ≤ T1 ≤ T2 ≤ … then we are interested in the
- Interarrival times: Xj = Tj - Tj-1
- The number of events N(t) before some time t if Tj ≤ t ≤ Tj+1 then N(t) = j
What is the name of the two models that model continuous time dependence?
- Renewal Process
- Poisson Process
Describe the renewal process.
Each X1, X2, … (the inter-event times) are iid with some cdf F. So simulation is easy as long as we can sample from F. Then we just sample X1, X2, … independently from F and compute T1 = X1, T2 = T1 + X2, T3 = T2 + X3, …
Describe the homogeneous Poisson process with rate λ > 0.

How is N(t) distributed in the homogeneous Poisson process?
N(t) ∼ Poisson(λt)
What are the three kind of events in a single server, single arrival queue system?
- Arrivals - time between sucessive arrivals is Exp(λ)
- Service starts - happens when there is one or more in the queue and the server is “idle”
- Service ends - happens at a random time drawn from Fs, after a service start event - in between start and end the server is “busy”
What is the mathematical model for the arrival times?
Poisson process
If we don’t use the Poisson process to model the arrival times what can we use?
Could repalce with any cdf FA and sample X ∼ FA insread of X∼Exp(λ)