All Notes Flashcards

1
Q

What two words sum up Monte Carlo?

A

Random Simulation

2
Q

What is Monte Carlo?

A

Studying the behaviour of a random system by simulating the outcome many times rather than by applying mathematical theory.

3
Q

What is a fundamental building block for everything in Monte Carlo?

A

Assuming we have an inexhaustible supply of independent random values which are uniformly distributed on (0,1)

4
Q

What are three properties of the uniformly distributed independent random values we use in Monte Carlo?

A
5
Q

What is the c.d.f of U?

A

F(u) = 0 for u < 0, F(u) = u for 0 ≤ u ≤ 1, and F(u) = 1 for u > 1.
6
Q

Draw the p.d.f of U

A
7
Q

Draw the c.d.f of U

A
8
Q

What is the algorithm for simulating a fair coin toss?

A
  1. Take one U
  2. The coin toss outcome is
    1. Heads if U ≤ ½
    2. Tails if U > ½
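The two-step algorithm above can be sketched in Python; the helper name `coin_toss` and the empirical frequency check are illustrative, not from the notes:

```python
import random

def coin_toss():
    """One fair coin toss from a single U ~ U(0,1): Heads if U <= 1/2."""
    u = random.random()
    return "H" if u <= 0.5 else "T"

tosses = [coin_toss() for _ in range(10_000)]
frac_heads = tosses.count("H") / len(tosses)  # should be near 1/2
```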
9
Q

Describe the Bernoulli distribution.

A

The Bernoulli distribution with probability p of “success” has two possible outcomes 0 and 1, and probability of 1 is p ∈ [0,1].

10
Q

What is the algorithm for simulating the Bernoulli distribution?

A
  1. Take one U
  2. Set B =
    1. 1 if U ≤ p
    2. 0 if U > p
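A minimal Python version of the Bern(p) step; the p = 0.3 example and the mean check are illustrative:

```python
import random

def bernoulli(p):
    """One Bernoulli(p) draw: 1 if U <= p, else 0."""
    u = random.random()
    return 1 if u <= p else 0

sample = [bernoulli(0.3) for _ in range(20_000)]
mean = sum(sample) / len(sample)  # should be near p = 0.3
```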
11
Q

Prove that the algorithm for simulating the Bernoulli distribution is the following.

  1. Take one U
  2. Set B =
    1. 1 if U ≤ p
    2. 0 if U > p
A
12
Q

How do you simulate from any discrete distribution?

A

* Let Y be a discrete random variable with possible values x1, …, xm (possibly with no finite m)

  • ℙ[Y = xi] = pi with pi ≥ 0 and ∑pi = 1
  • Set Qj = p1 + ⋯ + pj for j = 1, 2, …, m, so that
  • Q0 ≤ Q1 ≤ Q2 ≤ …
  • Define Q0 = 0, and note that pi = Qi − Qi−1 for all i and that Qm = 1 when m is finite
  • And use the following algorithm:
  1. Take one U
  2. Set Y٭ = xi if Qi−1 < U ≤ Qi
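The cumulative-sum algorithm can be sketched in Python; the example distribution on {"a", "b", "c"} is purely illustrative:

```python
import random

def sample_discrete(values, probs):
    """Return x_i where Q_{i-1} < U <= Q_i, with Q_i = p_1 + ... + p_i."""
    u = random.random()
    q = 0.0
    for x, p in zip(values, probs):
        q += p                      # running cumulative probability Q_i
        if u <= q:
            return x
    return values[-1]               # guard against floating-point round-off

draws = [sample_discrete(["a", "b", "c"], [0.2, 0.5, 0.3]) for _ in range(30_000)]
freq_b = draws.count("b") / len(draws)  # should be near p_b = 0.5
```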
13
Q

What is the algorithm for simulating any discrete distribution?

A
  1. Take one U
  2. Set Y٭ = xi if Qi−1 < U ≤ Qi
14
Q

Prove that the algorithm for simulating any discrete distribution is:

  1. Take one U
  2. Set Y٭ = xi if Qi−1 < U ≤ Qi
A
15
Q

Why do we distinguish Y and Y٭?

A
  • Y٭ is the value from MC sampling
  • Y may be a different “real world” quantity
16
Q

What are the two ways to simulate the binomial distribution?

A
  1. Binomial is discrete, so apply the discrete algorithm
  2. A Binomial trial is the sum of n Bernoulli random variables, so use the Bernoulli algorithm
17
Q

Describe how you can use the Bernoulli random variable to simulate a Binomial trial.

A
  1. Use the Bernoulli algorithm (Bern(p)) to generate B1, …, Bn
  2. Set Y = B1 + … + Bn (counting how many 1’s there are)
  3. Y is the number of successes in n independent trials with probability p of success in each trial
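A sketch of the Bernoulli-sum construction in Python; the n = 10, p = 0.5 example is illustrative:

```python
import random

def binomial(n, p):
    """Binomial(n, p) as a sum of n independent Bernoulli(p) draws."""
    return sum(1 if random.random() <= p else 0 for _ in range(n))

ys = [binomial(10, 0.5) for _ in range(20_000)]
mean = sum(ys) / len(ys)  # should be near n * p = 5
```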
18
Q

What are three properties of a c.d.f F(x)?

A

  1. F is non-decreasing
  2. F is right-continuous
  3. F(x) → 0 as x → −∞ and F(x) → 1 as x → ∞
19
Q

What is the cdf for a discrete distribution with possible values x1 < x2 < … < xm and corresponding probabilities p1, …, pm?

A

F(x) = ∑ pi over all i with xi ≤ x, i.e. a step function that jumps by pi at each xi.
20
Q

What is the c.d.f for an absolutely continuous distribution?

A

F(x) = ∫ f(t) dt over t ∈ (−∞, x], where f is the p.d.f.
21
Q

What is the definition of the generalised inverse of a c.d.f.?

A

For any c.d.f F, define F⁻¹: (0,1) → ℝ by

F⁻¹(u) = min {x ∈ ℝ: F(x) ≥ u}

“The lowest value of x for which F(x) is at least u”

22
Q

Why do we need a generalised definition of the inverse of a c.d.f.?

A

Because the true inverse does not always exist

23
Q

Finish this theorem: For any c.d.f F, any u ∈ (0,1) and any y ∈ ℝ:

F⁻¹(u) ≤ y ⟺ … ?

A

F⁻¹(u) ≤ y ⟺ F(y) ≥ u

24
Q

Prove this theorem: For any c.d.f F, any u ∈ (0,1) and any y ∈ ℝ:

F⁻¹(u) ≤ y ⟺ F(y) ≥ u.

A
25
What is the algorithm to sample from X with cdf FX using the inverse transform method?
  1. Take one U
  2. Set X٭ = FX⁻¹(U)
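A Python sketch of the inverse transform step; the toy cdf F(x) = x² on (0,1), whose inverse is √u, is an assumed example, not from the notes:

```python
import math
import random

def inverse_transform(inv_cdf, n):
    """Draw n samples of X by setting X* = F^{-1}(U) for each U ~ U(0,1)."""
    return [inv_cdf(random.random()) for _ in range(n)]

# toy example: F(x) = x^2 on (0,1), so F^{-1}(u) = sqrt(u); here E[X] = 2/3
xs = inverse_transform(math.sqrt, 10_000)
mean = sum(xs) / len(xs)
```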
26
Finish this theorem about the inverse transform method: The cdf of X٭ is … ?
FX
27
Prove the following theorem: The cdf of X٭ is FX.
28
What is the algorithm for sampling from Exp(λ) using the inverse transform method?
  1. Take one U
  2. Set X = −(1/λ) log(U)
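In Python this might look like the following; it uses −log(1 − U)/λ, which has the same distribution as −log(U)/λ while avoiding log(0):

```python
import math
import random

def sample_exp(lam):
    """Exp(lam) via inverse transform: X = -(1/lam) * log(U)."""
    u = 1.0 - random.random()       # lies in (0, 1], so log never sees 0
    return -math.log(u) / lam

xs = [sample_exp(2.0) for _ in range(50_000)]
mean = sum(xs) / len(xs)            # E[X] = 1/lam = 0.5
```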
29
How do you find the inverse transform function?
  1. Change x in the cdf to x٭
  2. Set the cdf equal to U
  3. Rearrange to solve for x٭
30
Give two reasons why you might not want to use the inverse transform method.
  1. F⁻¹ may be difficult to find
  2. It may be computationally expensive to apply
31
In what scenario do you use acceptance-rejection?
We want to sample from a pdf f(x), we have another pdf h(x) from which we already know how to sample, and ∃ c ∈ ℝ such that ch(x) ≥ f(x) ∀ x ∈ ℝ.
32
What is the algorithm for sampling using the acceptance-rejection method?
  1. Sample X from h
  2. Take one U
  3. If U ≤ f(X)/(c h(X)), accept X; otherwise reject and return to step 1
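A minimal Python sketch of acceptance-rejection; the target f(x) = 2x on (0,1) with proposal h = U(0,1) and c = 2 is an illustrative choice, not from the notes:

```python
import random

def accept_reject(f, sample_h, h, c):
    """Sample X ~ h and U; accept X when U <= f(X) / (c * h(X)), else retry."""
    while True:
        x = sample_h()
        u = random.random()
        if u <= f(x) / (c * h(x)):
            return x

# illustrative target f(x) = 2x on (0,1); proposal h = U(0,1); c = 2 so c*h >= f
xs = [accept_reject(lambda x: 2.0 * x, random.random, lambda x: 1.0, 2.0)
      for _ in range(10_000)]
mean = sum(xs) / len(xs)            # E[X] = 2/3 for this target
```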
33
Draw a picture of the two pdfs for the acceptance-rejection method.
34
Finish this theorem: the pdf of X from the acceptance-rejection algorithm is … ?
f
35
Prove the following theorem: The pdf of X from the acceptance-rejection algorithm is f.
36
In acceptance-rejection sampling, what is 1/c known as?
The "acceptance rate".
37
What is ℙ[accept X] in acceptance-rejection sampling?
1/c
38
How can we approximate the 𝔼[Y] of a random variable Y?
  1. Take a sample Y1, …, YN of random values of Y
  2. Estimate 𝔼[Y] by Ȳ = (Y1 + … + YN)/N
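The basic principle, with the sample standard error and an approximate 95% interval, can be sketched in Python; the U(0,1) choice for Y is illustrative:

```python
import math
import random

def mc_estimate(sample_y, n):
    """Basic Monte Carlo: sample mean, standard error, approximate 95% CI."""
    ys = [sample_y() for _ in range(n)]
    ybar = sum(ys) / n
    s2 = sum((y - ybar) ** 2 for y in ys) / (n - 1)   # sample variance
    se = math.sqrt(s2 / n)                            # standard error of Ybar
    return ybar, se, (ybar - 1.96 * se, ybar + 1.96 * se)

# illustrative Y ~ U(0,1), so E[Y] = 0.5
ybar, se, (lo, hi) = mc_estimate(random.random, 40_000)
```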
39
What is the standard deviation of Ȳ?
σ/√N, where σ² = Var[Y] (so the variance of Ȳ is σ²/N)
40
What is the estimation of the standard deviation of Ȳ known as?
The standard error
41
What is the formula for the standard error of Ȳ?
s/√N, where s is the sample standard deviation of Y1, …, YN
42
What is an approximate 95% confidence interval for 𝔼[Y]?
Ȳ ± 1.96 × s/√N (the sample mean plus or minus 1.96 standard errors)
43
What is meant by a 95% confidence interval?
There is a 95% chance that the interval will contain 𝔼[Y]
44
What do you do to find 𝔼[h(Y)], where h is any function of Y?
  1. Compute the values h(Y1), …, h(YN)
  2. Use the same formulas as with Y1, …, YN to calculate 𝔼[h(Y)] and any other statistics
45
What is the indicator function?
For a set A, h(y) = 1 if y ∈ A and h(y) = 0 otherwise.
46
If h(Y) is an indicator function of some set A of possible values of Y, what is 𝔼[h(Y)] equal to?
𝔼[h(Y)] = ℙ[Y ∈ A]
47
What is a shorthand way to write ℙ[Y ∈ A]?
pA
48
If h(Y) is an indicator function what is Var[h(Y)]?
Var[h(Y)] = pA(1 − pA)
49
What is the following equal to?
50
What is the standard error of
51
What is the 95% confidence interval for
52
Given the matrix below how do we find a column vector Y?
53
What is the pth quantile of continuous Y?
F⁻¹(p)
54
How does the basic Monte Carlo method enable us to estimate F(x) for any value of x?
F(x) = ℙ[Y ≤ x], so estimate it by the proportion of the sample Y1, …, YN that is ≤ x.
55
How do we estimate quantiles within samples?
  1. Sort the sample Y1, …, YN so that the values are increasing
  2. Label the results as Y(1), …, Y(N)
  3. Estimate the pth quantile by the order statistic Y(⌈pN⌉)
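A Python sketch of sample quantile estimation; taking the ⌈pN⌉-th order statistic is one standard convention, assumed here for illustration:

```python
import math
import random

def sample_quantile(ys, p):
    """Sort the sample and return the order statistic Y_(ceil(p*N))."""
    ordered = sorted(ys)                     # Y_(1) <= ... <= Y_(N)
    k = max(1, math.ceil(p * len(ordered)))
    return ordered[k - 1]

ys = [random.random() for _ in range(20_000)]
q90 = sample_quantile(ys, 0.9)               # true 0.9-quantile of U(0,1) is 0.9
```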
56
What are the two steps in conditional specification?
  1. Specify a distribution for x1
  2. Specify a conditional distribution for x2 | x1 = x
57
How do you simulate when the two variables x1 and x2 are dependent?
  1. Generate a value for x1
  2. Once x1 is known, we have a distribution for x2 and we sample from it
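A sketch of conditional simulation in Python; the particular pair (X1 uniform on {1, …, 6}, X2 | X1 = n a sum of n uniforms) is invented for illustration:

```python
import random

def conditional_sample():
    """Draw X1 first, then X2 from its conditional distribution given X1."""
    x1 = random.randint(1, 6)                      # X1 uniform on {1,...,6}
    x2 = sum(random.random() for _ in range(x1))   # X2 | X1 = n: sum of n U(0,1)
    return x1, x2

pairs = [conditional_sample() for _ in range(10_000)]
mean_x2 = sum(p[1] for p in pairs) / len(pairs)    # E[X2] = E[X1]/2 = 1.75
```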
58
Name one example of when you use a conditional simulation.
Customers arriving in a shop, each spending a different amount of time there; then the total time spent by customers in a day.
59
What is a bivariate normal distribution?
Let Z1 and Z2 be iid N(0,1); then you have the following, and _X_ is said to have a bivariate normal distribution.
60
Given the following, what is the distribution of X1 and X2?
X1 ∼ N(μ1, a11² + a12²), X2 ∼ N(μ2, a21² + a22²)
61
Given the following, prove what the covariance of X1 and X2 is?
Cov[X1, X2] = a11a21Var[Z1] + a12a22Var[Z2] + (a11a22 + a12a21)Cov[Z1, Z2] = a11a21 + a12a22 = Cov[X2, X1], since Var[Z1] = Var[Z2] = 1 and Cov[Z1, Z2] = 0.
62
What symbol represents the variance matrix?
Σ
63
What does Σ represent?
The variance matrix
64
Define the variance matrix, Σ.
Σ is the matrix with entries Σij = Cov[Xi, Xj], so the diagonal entries are the variances Var[Xi].
65
What is the theorem about the pdf of _X_ when it has a bivariate normal distribution?
66
Finish the following Lemma.
67
Prove the following Lemma.
68
Prove the following theorem.
69
What are the two main steps in sampling from a bivariate normal with mean _μ_ and variance matrix Σ?
  1. Find A such that Σ = AAᵀ (it is easy to find a lower triangular A)
  2. Then carry out the algorithm
70
What is the algorithm for sampling from the bivariate normal distribution?
  1. Sample Z1, Z2 independently from N(0,1)
  2. Set _X_ = _μ_ + A_z_
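A Python sketch of the _X_ = _μ_ + A_z_ step; the specific mean and lower triangular A below are illustrative:

```python
import random

def sample_bivariate_normal(mu, a):
    """X = mu + A z with z1, z2 iid N(0,1), so that Var[X] = A A^T."""
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    x1 = mu[0] + a[0][0] * z1 + a[0][1] * z2
    x2 = mu[1] + a[1][0] * z1 + a[1][1] * z2
    return x1, x2

# illustrative mean and lower triangular A (a Cholesky-style factor of Sigma)
mu = (1.0, -2.0)
a = [[2.0, 0.0], [0.6, 0.8]]
xs = [sample_bivariate_normal(mu, a) for _ in range(40_000)]
mean1 = sum(x for x, _ in xs) / len(xs)   # should be near mu[0]
mean2 = sum(y for _, y in xs) / len(xs)   # should be near mu[1]
```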
71
How are Cov[X1, X2] and correlation related?
  • Cov[X1, X2] = σ1σ2ρ
  • where σ1 is the standard deviation of X1
  • σ2 is the standard deviation of X2
  • and ρ is the correlation between X1 and X2
72
Define a **copula**.
A **copula** is a joint distribution where the marginal distribution of each random variable is U(0,1).
73
Define a **probability integral transform.**
Let X be a random variable with c.d.f. F. Then the r.v. F(X) is called the **probability integral transform** (P.I.T.) of X.
74
What is the theorem about the probability integral transform?
If X is a continuous r.v. with cdf F, then the PIT F(X) has a U(0,1) distribution.
75
Prove the following theorem. If X is a continuous r.v. with cdf F, then the PIT F(X) has a U(0,1) distribution.
76
How do you calculate the correlation ρ from a variance matrix?
ρ = Σ12/√(Σ11Σ22)
77
What do you set Ũ1 and Ũ2 equal to in the Gaussian copula? And how are they distributed?
Ũ1 = Φ(Z1) and Ũ2 = Φ(Z2), where (Z1, Z2) is a (standardised) bivariate normal pair and Φ is the N(0,1) cdf; by the P.I.T. each Ũi ∼ U(0,1), but they are dependent.
78
What is the algorithm for sampling from a copula?
  1. Sample (Ũ1, Ũ2)
  2. Set Y1٭ = F1⁻¹(Ũ1), Y2٭ = F2⁻¹(Ũ2)
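A Python sketch of Gaussian-copula sampling; the correlation ρ = 0.8 and the Exp(1)/U(0,1) marginals are illustrative choices, not from the notes:

```python
import math
import random

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gaussian_copula_pair(rho):
    """(U~1, U~2): a correlated N(0,1) pair pushed through the normal cdf."""
    z1 = random.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0)
    return phi(z1), phi(z2)

def sample_copula(rho, inv_cdf1, inv_cdf2):
    """Y1* = F1^{-1}(U~1), Y2* = F2^{-1}(U~2): dependent draws, chosen marginals."""
    u1, u2 = gaussian_copula_pair(rho)
    return inv_cdf1(u1), inv_cdf2(u2)

# illustrative marginals: Exp(1) (mean 1) and U(0,1)
pairs = [sample_copula(0.8, lambda u: -math.log1p(-u), lambda u: u)
         for _ in range(20_000)]
mean_y1 = sum(p[0] for p in pairs) / len(pairs)
```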
79
What is a good use of copulas?
To sample from two different distributions but where the value of one is dependent on the other.
80
When is time dependence easy to handle?
If time is discrete
81
Why do we use Markov models for time dependence models?
The potential complexity grows as t increases, so making what happens at time t depend only on the value of Y(t−1), and not on Y(t−2), …, makes the problem easier.
82
In a Markov model for time dependence, what do you have to specify?
Y(t + 1)|Y(t) = y(t) for all t of interest and all values of y(t)
83
If the distribution of Y(t + 1)|Y(t) = y depends only on y and not on t in a Markov model, what is the model called? And what does it mean?
Stationary Markov chain - the conditional distribution depends only on y and not on t
84
Give two examples of a 'stopping rule'?
  1. Stop at some fixed time T
  2. Stop based on the value of Y(t)
85
How do you simulate one Monte Carlo sample that is time dependent?
  1. Generate a value for Y(0) from its distribution
  2. For t = 1, 2, … until the stopping rule applies, generate Y(t + 1) from the distribution Y(t + 1)|Y(t), …, Y(0)
  3. Repeat steps (1) and (2) N times and apply the basic MC principle to the result
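A Python sketch of one such simulation for a stationary Markov chain; the ±1 random walk and the fixed stopping time T = 50 are illustrative:

```python
import random

def simulate_path(y0, step, t_max):
    """One time-dependent sample path, stopping at the fixed time t_max."""
    path = [y0]
    for _ in range(t_max):
        path.append(step(path[-1]))   # stationary Markov: depends only on Y(t)
    return path

# illustrative chain: Y(t+1) | Y(t) = y is y + 1 or y - 1 with equal probability
step = lambda y: y + (1 if random.random() <= 0.5 else -1)
paths = [simulate_path(0, step, 50) for _ in range(5_000)]
mean_final = sum(p[-1] for p in paths) / len(paths)   # E[Y(50)] = 0
```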
86
What two types of time are there when you look at time dependence?
  1. Discrete
  2. Continuous
87
If we are looking at a model with continuous time, what two things are we interested in?
If 0 ≤ T1 ≤ T2 ≤ … then we are interested in
  1. The interarrival times: Xj = Tj − Tj−1
  2. The number of events N(t) before some time t: if Tj ≤ t < Tj+1 then N(t) = j
88
What are the names of the two models for continuous time dependence?
  1. Renewal process
  2. Poisson process
89
Describe the renewal process.
The inter-event times X1, X2, … are iid with some cdf F, so simulation is easy as long as we can sample from F: sample X1, X2, … independently from F and compute T1 = X1, T2 = T1 + X2, T3 = T2 + X3, …
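A Python sketch of the renewal process; the Exp(0.5) inter-event cdf F is an illustrative choice:

```python
import math
import random

def renewal_times(sample_x, n_events):
    """Event times T_j = X_1 + ... + X_j from iid inter-event times X_j ~ F."""
    times, t = [], 0.0
    for _ in range(n_events):
        t += sample_x()             # one draw from F
        times.append(t)
    return times

# illustrative F: Exp(0.5) inter-event times, mean 2, so E[T_1000] = 2000
ts = renewal_times(lambda: -math.log(1.0 - random.random()) / 0.5, 1_000)
```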
90
Describe the **homogeneous Poisson process** with rate λ > 0.
The renewal process in which the inter-event times X1, X2, … are iid Exp(λ).
91
How is N(t) distributed in the homogeneous Poisson process?
N(t) ∼ Poisson(λt)
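Since the interarrival times are iid Exp(λ), N(t) can be simulated and checked against its Poisson(λt) mean; the λ = 3, t = 2 values are illustrative:

```python
import math
import random

def poisson_count(lam, t):
    """N(t): number of Poisson-process events before time t, via Exp(lam) gaps."""
    n, arrival = 0, 0.0
    while True:
        arrival += -math.log(1.0 - random.random()) / lam   # next interarrival
        if arrival > t:
            return n
        n += 1

counts = [poisson_count(3.0, 2.0) for _ in range(10_000)]
mean_n = sum(counts) / len(counts)   # E[N(t)] = lam * t = 6
```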
92
What are the three kinds of events in a single server, single arrival queue system?
  1. Arrivals: the time between successive arrivals is Exp(λ)
  2. Service starts: happens when there is one or more in the queue and the server is "idle"
  3. Service ends: happens at a random time drawn from Fs after a service start event; in between start and end the server is "busy"
93
What is the mathematical model for the arrival times?
Poisson process
94
If we don't use the Poisson process to model the arrival times what can we use?
We could replace it with any cdf FA and sample X ∼ FA instead of X ∼ Exp(λ).