Probability Flashcards
What is a sample space and how do you write it?
The set of all possible outcomes, e.g.
throwing two dice: Ω = {(i, j) : 1 ≤ i, j ≤ 6}
tossing a coin: Ω = {H, T}
What is a subset of Ω (sample space) called?
An event
When are two events disjoint?
A ∩ B = ∅
When they cannot both occur
What is Stirling’s formula for the approximation of n!?
n! ∼ √(2π) n^(n + 1/2) e^(−n)
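A quick numerical sanity check (a Python sketch, not part of the notes): the ratio n!/(√(2π) n^(n+1/2) e^(−n)) should tend to 1.

```python
import math

# Ratio of n! to Stirling's approximation; it should approach 1 as n grows.
def stirling(n):
    return math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)

for n in (5, 10, 20):
    print(n, math.factorial(n) / stirling(n))
```

The ratio is already within about 2% at n = 5, and the relative error shrinks like 1/(12n).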
What is the formula for the number of the arrangements of n objects, with repeats?
E.g.
a₁, …, a₁, a₂, …, a₂, …, aₖ, …, aₖ
where a₁ is repeated m₁ times etc.
n!/(m₁!m₂!…mₖ!)
What is the multinomial coefficient?
The coefficient of a₁ᵐ¹ … aₖᵐᵏ
in (a₁ + … + aₖ)ⁿ where m₁ + … + mₖ = n
nC(m₁m₂…mₖ)
- How many distinct non-negative integer-valued solutions of the equation
x₁ + x₂ + · · · + xₘ = n
are there?
(n+m-1)Cn
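A brute-force check of the stars-and-bars count for small m and n (illustrative Python, not from the notes):

```python
import math
from itertools import product

# Count non-negative integer solutions of x1 + ... + xm = n directly,
# and compare with the stars-and-bars answer C(n+m-1, n).
def count_solutions(m, n):
    return sum(1 for xs in product(range(n + 1), repeat=m) if sum(xs) == n)

m, n = 3, 5
assert count_solutions(m, n) == math.comb(n + m - 1, n)  # both equal 21
```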
What is Vandermonde’s identity?
For k, m, n ≥ 0
(m+n)Ck = ᵏΣⱼ₌₀(mCj)(nC(k-j))
with the convention mCj = 0 for j > m
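A direct numerical check of the identity (Python sketch, not from the notes); conveniently, `math.comb` already returns 0 when the lower index exceeds the upper one, matching the convention above.

```python
import math

# Vandermonde's identity: C(m+n, k) = sum over j of C(m, j) * C(n, k-j).
m, n, k = 5, 7, 6
lhs = math.comb(m + n, k)
rhs = sum(math.comb(m, j) * math.comb(n, k - j) for j in range(k + 1))
assert lhs == rhs  # C(12, 6) = 924
```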
Prove Vandermonde’s identity
Suppose we choose a committee consisting of k people from a group of m men and n women.
There are (m+n)Ck
ways of doing this which is the left-hand side.
Now the number of men in the committee is some j ∈ {0, 1, . . . , k} and then it contains k − j women.
The number of ways of choosing the j men is mCj
and for each such choice there are nC(k-j)
choices for
the women who make up the rest of the committee. So there are mCj * nC(k-j)
committees with exactly j
men and summing over j we get that the total number of committees is given by the right-hand side
A probability space is a triple (Ω, F, P)
(Fancy F and P).
What do these symbols mean?
- Ω is the sample space
- F is a collection of subsets of Ω, called events, satisfying axioms F1–F3
- P is a probability measure, which is a function P : F → [0, 1] satisfying axioms P1–P3
What is the probability of the union of two disjoint events?
eg, P(A ∪ B)
P(A ∪ B) = P (A) + P (B)
What are the axioms on F (a collection of subsets of Ω)?
F1: ∅ ∈ F.
F2: If A ∈ F, then also Aᶜ ∈ F.
F3: If {Aᵢ, i ∈ I} is a finite or countably infinite collection of members of F, then ∪ᵢ∈I Aᵢ ∈ F
What are the axioms of P, where P is a function from F to R?
P1: For all A ∈ F, P(A) ≥ 0.
P2: P(Ω) = 1.
P3: If {Aᵢ, i ∈ I} is a finite or countably infinite collection of members of F, and Aᵢ ∩ Aⱼ = ∅ for i ≠ j, then P(∪ᵢ∈I Aᵢ) = Σᵢ∈I P(Aᵢ)
When Ω is finite or countably infinite, what do we usually take F to be?
We normally
take F to be the set of all subsets of Ω (the power set of Ω)
Suppose that (Ω, F, P) is a probability space and that A, B ∈ F. If A ⊆ B, what can we say about P(A) and P(B)?
If A ⊆ B then P(A) ≤ P(B)
Prove that P (A’) = 1 − P (A) using the probability axioms
Since A ∪ A’ = Ω and A ∩ A’ = ∅, by P3, P (Ω) = P (A) + P (A’). By P2, P (Ω) = 1 and so P(A) + P (A’) = 1, which entails the required result
Prove A ⊆ B then P (A) ≤ P (B) using the probability axioms
Since A ⊆ B, we have B = A ∪ (B ∩ A’). Since B ∩ A’ ⊆ A’, it must be disjoint from A. So by P3, P(B) = P(A) + P(B ∩ A’). Since by P1, P(B ∩ A’) ≥ 0, we thus have P(B) ≥ P(A)
Conditional Probability
What is the probability of A given B?
P(A|B) = P(A ∩ B)/P(B)
Let (Ω, F, P) be a probability space and let B ∈ F satisfy P(B) > 0. Define a new function Q : F → R by Q(A) = P(A|B)
Is (Ω, F, Q) a probability space?
Prove your result
Yes
Proof pg 12
When are events A and B independent?
Events A and B are independent if P(A ∩ B) = P(A)P(B)
More generally, a family of events A = {Aᵢ : i ∈ I} is independent if…
P(∩ᵢ∈J Aᵢ) = Πᵢ∈J P(Aᵢ)
for all finite subsets J of I
When is a family of events pairwise independent?
A family A of events is pairwise independent if P(Aᵢ ∩ Aⱼ ) = P(Aᵢ)P(Aⱼ ) whenever i ≠ j.
Does Pairwise Independence imply independence?
NO!!!!
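The standard counterexample (two fair coin tosses; illustrative code, not from the notes): A = "first toss is H", B = "second toss is H", C = "both tosses agree". These are pairwise independent but not independent as a family.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair coin tosses, each outcome with probability 1/4.
omega = list(product("HT", repeat=2))
p = Fraction(1, len(omega))
P = lambda E: p * len(E)

A = {w for w in omega if w[0] == "H"}   # first toss heads
B = {w for w in omega if w[1] == "H"}   # second toss heads
C = {w for w in omega if w[0] == w[1]}  # both tosses agree

# pairwise independence holds...
assert P(A & B) == P(A) * P(B)
assert P(A & C) == P(A) * P(C)
assert P(B & C) == P(B) * P(C)
# ...but the triple intersection fails: 1/4 vs 1/8
assert P(A & B & C) != P(A) * P(B) * P(C)
```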
Given A and B are independent, are A and B’, and A’ and B’ independent?
Yes: A and B’ are independent, and A’ and B’ are independent
Prove that A and B’ are independent given A and B are independent
We have A = (A ∩ B) ∪ (A ∩ B’), where A ∩ B and A ∩ B’ are disjoint, so using the
independence of A and B, P(A ∩ B’) = P (A) − P(A ∩ B) = P(A) − P(A) P(B) = P (A) (1 − P(B)) = P(A)P(B’)
When is a family of events {B1, B2, . . .} a partition of Ω?
if
- Ω = ∪ᵢ≥₁ Bᵢ (so that at least one Bi must happen), and
- Bᵢ ∩ Bⱼ = ∅ whenever i ≠ j (so that no two can happen together)
What is the law of total probability/partition theorem?
Suppose {B1, B2, . . .} is a partition of Ω by sets from F,
such that P (Bᵢ) > 0 for all i ≥ 1. Then for any A ∈ F
P(A) = ᵢ≥₁ΣP(A|Bᵢ)P(Bᵢ)
Prove the partition theorem
P(A) = P(A ∩ (∪ᵢ≥₁Bᵢ)), since ∪ᵢ≥₁Bᵢ = Ω
= P(∪ᵢ≥₁(A ∩ Bᵢ))
= ᵢ≥₁Σ P (A ∩ Bᵢ) by axiom P3, since A ∩ Bᵢ, i ≥ 1 are disjoint
= ᵢ≥₁Σ P (A|Bᵢ)P(Bᵢ)
What is Bayes’ Theorem?
Suppose that {B1, B2, . . .} is a partition of Ω by sets from F such that P (Bi) > 0 for all i ≥ 1. Then for any A ∈ F such that P (A) > 0
P(Bₖ|A) = P(A|Bₖ)P(Bₖ)/(ᵢ≥₁Σ P (A|Bᵢ)P(Bᵢ))
Prove Bayes’ theorem
We have P(Bₖ|A) = P(Bₖ ∩ A)/P(A)
= P(A|Bₖ)P(Bₖ)/P(A)
Now substitute for P(A) using the law of total probability
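A worked example with hypothetical numbers (not from the notes): a screening test with 1% prevalence, 99% sensitivity and a 5% false-positive rate, using the partition B₁ = diseased, B₂ = healthy.

```python
# Hypothetical screening test, illustrating Bayes' theorem numerically.
p_d = 0.01              # P(diseased)
p_pos_given_d = 0.99    # P(positive | diseased)
p_pos_given_h = 0.05    # P(positive | healthy)

# law of total probability for P(positive)
p_pos = p_pos_given_d * p_d + p_pos_given_h * (1 - p_d)
# Bayes' theorem for P(diseased | positive)
p_d_given_pos = p_pos_given_d * p_d / p_pos

print(round(p_d_given_pos, 3))  # → 0.167
```

Even with a very accurate test, a positive result here only means about a 1-in-6 chance of disease, because the prior is small.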
What is Simpson’s paradox?
it consists of the fact that for events E,
F, G, we can have
P(E|F ∩ G) > P(E|F’ ∩ G)
P(E|F ∩ G’) > P(E|F’ ∩ G’)
and yet
P(E|F) < P(E|F’).
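The classic kidney-stone numbers make this concrete (illustrative data, not from the notes): E = treatment success, F = treatment A, G = one stone-size group. The code checks all three inequalities exactly.

```python
from fractions import Fraction

# (successes, total) success rates for two treatments in two subgroups.
def rate(successes, total):
    return Fraction(successes, total)

a_g, b_g = rate(81, 87), rate(234, 270)        # group G:  A vs B
a_g2, b_g2 = rate(192, 263), rate(55, 80)      # group G': A vs B
a_all = rate(81 + 192, 87 + 263)               # A pooled
b_all = rate(234 + 55, 270 + 80)               # B pooled

assert a_g > b_g        # A better within G
assert a_g2 > b_g2      # A better within G'
assert a_all < b_all    # ...yet A worse overall: Simpson's paradox
```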
What is the multiplication rule?
Eg, P(A ∩ B) = …
P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A)
What is the generalisation of the multiplication rule for n events?
P (A1 ∩ A2 ∩ . . . ∩ An) = P(A1) P(A2|A1) . . . P(An|A1 ∩ A2 ∩ . . . ∩ An−1)
inclusion-exclusion formula
P (A1 ∪ A2 ∪ . . . ∪ An) = ⁿΣᵢ₌₁ P(Aᵢ) - ….
P (A1 ∪ A2 ∪ . . . ∪ An) = ⁿΣᵢ₌₁ P(Aᵢ) - Σᵢ>ⱼ P(Ai ∩ Aj) + … + (-1)ⁿ⁺¹P(A1 ∩ A2 ∩ . . . ∩ An)
What is a discrete random variable?
A discrete random variable X on a probability space (Ω, F, P) is a function X : Ω → R such that (a) {ω ∈ Ω : X(ω) = x} ∈ F for each x ∈ R, (b) ImX := {X(ω) : ω ∈ Ω} is a finite or countable subset of R
What is the more common/shorter way of writing P({ω ∈ Ω : X(ω) = x})?
P(X = x)
How is the probability mass function defined?
The probability mass function (p.m.f.) of X is the function pₓ : R → [0, 1] defined by
pₓ(x) = P(X = x)
What is the pmf when x ∉ ImX?
If x ∉ ImX (that is, X(ω) never equals x) then pₓ(x) = P ({ω : X(ω) = x}) = P (∅) = 0.
What does Σₓ∈ᵢₘₓ pₓ(x) = ?
why?
ₓ∈ᵢₘₓΣ pₓ(x) = ₓ∈ᵢₘₓΣ P ({ω : X(ω) = x})
= P(∪ₓ∈ᵢₘₓ {ω : X(ω) = x}) since the events are disjoint
= P (Ω) since every ω ∈ Ω gets mapped somewhere in ImX
= 1
X has the Bernoulli distribution with parameter p (where 0 ≤ p ≤ 1) if…
P(X = 0) = 1 − p, P(X = 1) = p
X has a binomial distribution with parameters n and p (where n
is a positive integer and p ∈ [0, 1]) if…
P (X = k) = nCk p^k (1-p)^(n-k), k = 0, 1, …, n
If X has the Bernoulli distribution, how do we write this?
X ∼ Ber(p)
If X has the binomial distribution, how do we write this?
X ∼ Bin(n, p)
If X has the geometric distribution, how do we write this?
X ∼ Geom(p)
If X has the Poisson distribution, how do we write this?
X ∼ Po(λ)
X has a geometric distribution with parameter p if….
P(X = k) = p(1 − p)^(k-1), k = 1, 2, ....
What can the geometric distribution model?
We can use X to model the number of independent trials needed until we see the first success,
where p is the probability of success on a single trial
If you want to use the geometric distribution to model the number of failures before the first success, which formula do you use?
P (Y = k) = p(1 − p)^k,
k = 0, 1, …
X has the Poisson distribution with parameter λ ≥ 0 if…
P (X = k) = ( λ^k e^-λ) /k!, k = 0, 1, …
Define the expectation of X
The expectation (or expected value or mean) of X is E[X] = ₓ∈ᵢₘₓΣ xP(X=x) provided that ₓ∈ᵢₘₓΣ |x|P(X=x) < ∞
What is the expectation of the Poisson distribution?
λ
What is the expectation of the Geometric distribution?
1/p
What is the expectation of the Binomial distribution?
np
What is the expectation of the Bernoulli distribution?
p
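The four means above can be checked numerically (Python sketch, not from the notes; the Poisson and geometric sums are truncated at K = 100, which is an approximation with a negligible tail for these parameters):

```python
import math

lam, p, n = 3.0, 0.25, 10
K = 100  # truncation point for the infinite sums

poisson_mean = sum(k * lam**k * math.exp(-lam) / math.factorial(k)
                   for k in range(K))
geom_mean = sum(k * p * (1 - p) ** (k - 1) for k in range(1, K))
binom_mean = sum(k * math.comb(n, k) * p**k * (1 - p) ** (n - k)
                 for k in range(n + 1))

assert abs(poisson_mean - lam) < 1e-9      # E = lambda
assert abs(geom_mean - 1 / p) < 1e-9       # E = 1/p
assert abs(binom_mean - n * p) < 1e-9      # E = np
```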
Let h : R → R
If X is a discrete random variable, is Y = h(X) also a discrete random variable?
Yes
If h : R → R, then
E [h(X)] = ….
E [h(X)] = ₓ∈ᵢₘₓΣ h(x)P (X = x)
provided that ₓ∈ᵢₘₓΣ |h(x)|P (X = x) < ∞.
Prove the theorem that
E [h(X)] = ₓ∈ᵢₘₓΣ h(x)P (X = x)
Let A = {y : y = h(x) for some x ∈ ImX}
Start from the rhs. Write it as two sums, one over y∈A, the other over x∈ImX:h(x)=y
pg22
Take h(x) = x^k What is E[X^k] called?
The kth moment of X, when it exists
Let X be a discrete random variable such that E [X] exists.
Describe the expectation when X is non-negative
Prove it
If X is non-negative then E [X] ≥ 0
We have ImX ⊆ [0, ∞) and so
E [X] = ₓ∈ᵢₘₓΣ xP (X = x) is a sum whose terms are all non-negative and so must itself be non-negative.
Let X be a discrete random variable such that E [X] exists.
If a, b ∈ R then E [aX + b] = …
Prove it
E [aX + b] = aE [X] + b
For a discrete random variable X, define the variance
For a discrete random variable X, the variance of X is defined by var (X) = E[(X − E[X])² ] = E[X²] - (E[X])² provided that this quantity exists.
What is variance a measure of?
The variance is a measure of how much the distribution of X is spread out about its mean: the more
the distribution is spread out, the larger the variance.
Is Var(X) always ≥ 0? Why?
Yes
since (X − E[X])² is a non-negative random variable, var (X) ≥ 0
How are standard deviation and variance related?
Standard deviation^2 = var (X)
Suppose that X is a discrete random variable whose variance exists. Then if a and b
are (finite) fixed real numbers, then the variance of the discrete random variable Y = aX + b is given by ….
Prove it
var (Y ) = var (aX + b) = a² var (X)
Suppose that B is an event such that P (B) > 0. Then the conditional distribution of
X given B is…
P(X = x|B) =
P(X = x|B) = P({X = x} ∩ B) / P(B), for x ∈ R
Suppose that B is an event such that P (B) > 0,
The conditional expectation of X given B is…
ₓΣxP(X = x|B),
whenever the sum converges absolutely
We write pₓ|ᵦ(x) = P(X=x|B)
What is the Partition theorem for expectations?
If {B1, B2, . . .} is a partition of Ω such that
P (Bi) > 0 for all i ≥ 1 then
E [X] = ᵢ≥₁ΣE [X | Bᵢ] P(Bᵢ),
whenever E [X] exists.
Prove the Partition theorem for expectations
Use the total law of probability to split into two sums, one over x, one over i.
pg24
Given two random variables X and Y their joint distribution (or joint probability
mass function) is
pₓ,ᵧ (x, y) =
pₓ,ᵧ (x, y) = P ({X = x} ∩ {Y = y})
= P(X = x, Y = y)
x, y ∈ R
Is pₓ,ᵧ (x, y) always non-negative?
Yes, pₓ,ᵧ (x, y) ≥ 0 (though it can equal 0)
What does ₓΣᵧΣpₓ,ᵧ (x, y) = ??
ₓΣᵧΣpₓ,ᵧ (x, y) = 1
Joint distributions:
What is the marginal distribution of X?
pₓ(x) = ᵧΣpₓ,ᵧ (x, y)
Joint distributions:
marginal distribution of Y?
pᵧ(y) = ₓΣpₓ,ᵧ (x, y)
Whenever pX(x) > 0 for some x ∈ R, we can also write down the conditional distribution of Y given that X = x: pᵧ|ₓ₌ₓ(y) =
pᵧ|ₓ₌ₓ(y) = P (Y = y|X = x)
= pₓ,ᵧ(x,y)/pₓ(x) for y ∈ R
The conditional expectation of Y given that X = x is
E [Y |X = x] = …
E [Y |X = x] = ᵧΣypᵧ|ₓ₌ₓ(y)
whenever the sum converges absolutely
When are Discrete random variables X and Y independent?
P(X = x, Y = y) = P(X = x)P(Y = y) for all x, y ∈ R.
In other words, X and Y are independent if and only if the events {X = x} and {Y = y} are independent
for all choices of x and y. We can also write this as
pₓ,ᵧ (x, y) = pₓ(x)pᵧ(y) for all x, y ∈ R
In the same way as we defined expectation for a single discrete random variable, so in the bivariate case
we can define expectation of any function of the random variables X and Y . Let h : R² → R. Then
h(X, Y ) is itself a random variable, and
E[h(X, Y )] =
E[h(X, Y )] = ₓΣᵧΣ h(x, y)P(X = x, Y = y)
= ₓΣᵧΣ h(x, y)pₓ,ᵧ (x, y)
provided the sum converges absolutely.
Suppose X and Y are discrete random variables and a, b ∈ R are constants. Then
E[aX + bY ] =
Prove it
E[aX + bY ] = aE[X] + bE[Y ]
provided that both E [X] and E [Y ] exist.
Prove it pg28
What does E[aX + bY ] = aE[X] + bE[Y ] tell us about expectation?
expectation is linear
E[a₁X₁ + · · · + aₙXₙ] =
E[a₁X₁ + · · · + aₙXₙ] = a₁E[X₁] + · · · + aₙE[Xₙ]
If X and Y are independent discrete random variables whose expectations exist, then
E[XY ] =
Prove it
E[XY] = E[X]E[Y ]
Proof pg28
What is the covariance of X and Y?
cov (X, Y ) = E[(X − E [X])(Y − E [Y ])]
What is cov(X,X) = ?
cov (X, X) = var (X)
Does cov (X, Y ) = 0 imply that X and Y are independent?
NO!!!!
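The standard counterexample (illustrative code, not from the notes): X uniform on {−1, 0, 1} and Y = X². Their covariance is 0, yet Y is a function of X.

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}, Y = X**2.
xs = [-1, 0, 1]
p = Fraction(1, 3)

EX = sum(p * x for x in xs)
EY = sum(p * x**2 for x in xs)
EXY = sum(p * x * x**2 for x in xs)  # E[X * Y] = E[X**3]
cov = EXY - EX * EY
assert cov == 0

# yet X and Y are not independent: X = 0 forces Y = 0
P_X0_Y0 = p          # P(X = 0, Y = 0)
P_X0, P_Y0 = p, p    # Y = 0 only when X = 0
assert P_X0_Y0 != P_X0 * P_Y0  # 1/3 vs 1/9
```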
multivariate distributions:
pX₁,X₂,…,Xₙ
(x₁, x₂, . . . , xₙ) =
pX₁,X₂,…,Xₙ
(x₁, x₂, . . . , xₙ) = P(X₁ = x₁, X₂ = x₂, …, Xₙ = xₙ)
for x₁, x₂, …,xₙ ∈ R
A family {Xᵢ
: i ∈ I} of discrete random variables are independent if ….
A family {Xᵢ : i ∈ I} of discrete random variables are independent if for all finite
sets J ⊆ I and all collections {Aᵢ : i ∈ J} of subsets of R,
P(ᵢ∈ⱼ∩{Xᵢ ∈ Aᵢ}) = ᵢ∈ⱼΠP(Xᵢ ∈ Aᵢ)
Suppose that X1, X2, . . . are independent random variables which all have the same distribution, what do we call them?
Independent and identically distributed (i.i.d)
A kth order linear recurrence relation (or difference equation) has the form….
ᵏΣⱼ₌₀ aⱼ uₙ₊ⱼ = f(n)
with a₀ ≠ 0 and aₖ ≠ 0, where a₀, …, aₖ are constants independent of n
A solution to such a difference
equation is a sequence (uₙ)ₙ ≥ ₀ satisfying the equation for all n ≥ 0.
The general solution (uₙ)ₙ ≥ ₀ (i.e. if the boundary conditions are not specified) of ᵏΣⱼ₌₀ aⱼ uₙ₊ⱼ = f(n) can be written as …
Prove this
uₙ = vₙ +wₙ where (vₙ)ₙ ≥ ₀ is a particular solution to the equation and (wₙ)ₙ ≥ ₀ solves
the homogeneous equation ᵏΣⱼ₌₀ aⱼ wₙ₊ⱼ = 0
proof pg31
How would you solve the second order linear difference equation:
uₙ₊₁ + auₙ + buₙ₋₁ = f(n) ?
Substitute wₙ = Aλⁿ in wₙ₊₁ + awₙ + bwₙ₋₁ = 0
then divide by Aλⁿ⁻¹ to get the quadratic: λ² + aλ + b = 0 (Aux Eqn)
General Soln = wₙ = A₁λ₁ⁿ + A₂λ₂ⁿ
or if λ₁ = λ₂ = λ then wₙ = (A + Bn)λⁿ
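A sketch of the auxiliary-equation method with illustrative coefficients (not from the notes), checked against direct iteration of the recurrence:

```python
import cmath

# Homogeneous equation w(n+1) + a*w(n) + b*w(n-1) = 0 with a = -5, b = 6:
# auxiliary equation l**2 - 5*l + 6 = 0, roots 2 and 3.
a, b = -5, 6
disc = cmath.sqrt(a * a - 4 * b)
l1, l2 = (-a + disc) / 2, (-a - disc) / 2

A1, A2 = 1.0, 2.0  # arbitrary constants; fixed by boundary conditions
w = lambda n: A1 * l1**n + A2 * l2**n  # general solution

# direct iteration from w(0), w(1) must reproduce the closed form
u_prev, u_cur = w(0), w(1)
for n in range(1, 10):
    u_prev, u_cur = u_cur, -a * u_cur - b * u_prev
assert abs(u_cur - w(10)) < 1e-6
```

Using `cmath.sqrt` keeps the same code working when the auxiliary equation has complex roots.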
Consider a random walk on the integers Z, started from some n > 0,
which at each step increases by 1 with probability p, and decreases by 1 with probability q = 1 − p. Then
the probability uₙ that the walk ever hits 0 is given by…..
Prove it
uₙ = { (q/p)ⁿ if p>q
1 if p ≤ q
Proof pg 38
Let X be a non-negative integer-valued random variable. Let
S := { s ∈ R : ∞Σₖ₌₀ |s|ᵏ P(X = k) < ∞ }
Then the probability generating function (p.g.f.) of X is Gₓ : S → R defined by ….
Gₓ(s) = E[sˣ] = ∞Σₖ₌₀ sᵏP(X=k)
pₓ(k) = pₖ = …
pₓ(k) = pₖ = P(X=k)
Is the distribution of X uniquely determined by its probability generating function, Gₓ?
Yes
What is the probability generating function of the Bernoulli distribution?
Gₓ(s) = ₖΣpₖsᵏ = qs⁰ + ps¹ = q + ps, where q = 1 − p
for all s ∈ R
What is the probability generating function of the Binomial distribution?
Gₓ(s) = ⁿΣₖ₌₀ sᵏ ⁿCₖ pᵏ (1-p)ⁿ⁻ᵏ = ⁿΣₖ₌₀ ⁿCₖ (ps)ᵏ (1-p)ⁿ⁻ᵏ = (1 - p + ps)ⁿ
by the binomial theorem. This is valid for all s ∈ R
What is the probability generating function of the Poisson distribution?
Gₓ(s) = ∞Σₖ₌₀ sᵏ λᵏe^-λ/k! = e^-λ ∞Σₖ₌₀ (sλ)ᵏ/k! = e^λ(s-1)
for all s ∈ R
What is the probability generating function of the Geometric distribution with parameter p?
Gₓ(s) = ps/(1-(1-p)s)
provided that |s| < 1/(1−p)
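A truncated-series check of the geometric p.g.f. (Python sketch, not from the notes; the truncation is an approximation, but the tail is geometrically small):

```python
# Compare the series definition of the geometric p.g.f. with its closed form.
p, s = 0.3, 0.9  # note |s| = 0.9 < 1/(1-p) ≈ 1.43

series = sum(s**k * p * (1 - p) ** (k - 1) for k in range(1, 500))
closed = p * s / (1 - (1 - p) * s)
assert abs(series - closed) < 1e-12
```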
If X and Y are independent, then Gₓ₊ᵧ(s) = …
Gₓ₊ᵧ(s) = Gₓ(s)Gᵧ(s)
Prove that Gₓ₊ᵧ(s) = Gₓ(s)Gᵧ(s) if X and Y are independent
Gₓ₊ᵧ(s) = E[sˣ⁺ʸ] = E[sˣsʸ]
Since X and Y are independent, sˣ and sʸ are independent.
So this equals E[sˣ]E[sʸ] = Gₓ(s)Gᵧ(s)
Suppose that X₁, X₂, …, Xₙ are independent Ber(p) random variables and let Y = X₁ + … + Xₙ. How is Y distributed?
Y ∼ Bin(n, p)
Prove that Y ∼ Bin(n, p), if Y = X₁ + … + Xₙ and X₁, X₂, …, Xₙ are independent Ber(p) random variables
Gᵧ(s) = E[sʸ] = E[s^(X₁ + … + Xₙ)] = E[s^X₁] … E[s^Xₙ] = (1 - p + ps)ⁿ
As Y has the same p.g.f. as a Bin(n, p) random variable, we deduce that Y ∼ Bin(n, p).
Suppose that X₁, X₂, …, Xₙ are independent random variables such that Xᵢ ∼ Po(λᵢ)
Then ⁿΣᵢ₌₁ Xᵢ ∼ ….
In particular, what happens when λᵢ = λ for all 1 ≤ i ≤ n
Prove all of this
ⁿΣᵢ₌₁ Xᵢ ∼ Po(ⁿΣᵢ₌₁ λᵢ)
λᵢ = λ for all 1 ≤ i ≤ n:
ⁿΣᵢ₌₁ Xᵢ ∼ Po(nλ)
Proof pg41
Show that G’ₓ(1) = E[X]
G'ₓ(s) = d/ds E[sˣ] = d/ds ∞Σₖ₌₀ sᵏ P(X=k)
= ∞Σₖ₌₀ d/ds sᵏ P(X=k) = ∞Σₖ₌₀ ksᵏ⁻¹P(X=k) = E[Xsˣ⁻¹]
Setting s = 1 gives G'ₓ(1) = E[X]
G’‘ₓ(1) = …
G’‘ₓ(1) = E[X(X − 1)] = E[X²] − E[X],
Write the variance of X in terms of Gₓ(1) and its derivatives
var(X) = G’‘ₓ(1) + G’ₓ(1) - (G’ₓ(1))²
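A check of this variance formula for X ∼ Bin(n, p) (illustrative Python, not from the notes), using the p.g.f. G(s) = (1 − p + ps)ⁿ, whose derivatives at 1 are G′(1) = np and G″(1) = n(n−1)p²:

```python
# var(X) = G''(1) + G'(1) - G'(1)**2 should recover np(1-p) for Bin(n, p).
n, p = 12, 0.4
G1 = n * p                 # G'(1)
G2 = n * (n - 1) * p**2    # G''(1)
var = G2 + G1 - G1**2
assert abs(var - n * p * (1 - p)) < 1e-9  # np(1-p) = 2.88
```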
dᵏ/dsᵏ Gₓ(s) |ₛ₌₁ = …
dᵏ/dsᵏ Gₓ(s) |ₛ₌₁ = E[X(X-1) … (X - k + 1)]
Let X₁, X₂, . . . be i.i.d. non-negative integer-valued random variables with p.g.f. Gₓ(s).
Let N be another non-negative integer-valued random variable, independent of X₁, X₂, . . . and with p.g.f.
Gₙ(s). Then the p.g.f. of ᵢ₌₁Σᴺ Xᵢ is ……
Prove it
The pgf of ᵢ₌₁Σᴺ Xᵢ is Gₙ(Gₓ(s))
Note that the sum ᵢ₌₁Σᴺ Xᵢ has a random number of terms. We interpret it as 0 if N = 0.
Proof pg 44
Suppose that X₁, X₂, … are independent and identically distributed Ber(p) random variables and that N ∼ Po(λ), independently of X₁, X₂, … Then ᵢ₌₁Σᴺ Xᵢ ∼
ᵢ₌₁Σᴺ Xᵢ ∼ Po(λp)
Prove that:
Suppose that X₁, X₂, … are independent and identically distributed Ber(p) random variables and that N ∼ Po(λ), independently of X₁, X₂, … Then ᵢ₌₁Σᴺ Xᵢ ∼ Po(λp)
Gₓ(s) = 1 - p + ps and Gₙ(s) = exp(λ(s − 1)) and so
E[s^( ᵢ₌₁Σᴺ Xᵢ)] = Gₙ(Gₓ(s)) = exp(λ(1 - p + ps - 1)) = exp(λp(s-1))
Since this is the p.g.f. of Po(λp) and p.g.f.’s uniquely determine distributions, the result follows
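A Monte Carlo sanity check of this "Poisson thinning" result (a simulation sketch, not a proof; the Poisson sampler below is Knuth's multiplication method, an assumption of this example, fine for small λ):

```python
import math
import random

random.seed(0)
lam, p, trials = 6.0, 0.5, 20000

def poisson(l):
    # Knuth's multiplication method for sampling Po(l).
    L, k, prod = math.exp(-l), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

totals = []
for _ in range(trials):
    n = poisson(lam)
    # sum of n independent Ber(p) variables
    totals.append(sum(1 for _ in range(n) if random.random() < p))

mean = sum(totals) / trials
assert abs(mean - lam * p) < 0.1  # theoretical mean is lam*p = 3
```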
What is the offspring distribution?
Suppose we have a population (say of bacteria). Each individual in the population lives a unit time and,
just before dying, gives birth to a random number of children in the next generation. This number of
children has probability mass function p(i), i ≥ 0, called the offspring distribution
Let Xₙ be the size of the population in generation n, so that X₀ = 1. Let Cᵢ⁽ⁿ⁾ be the number of children
of the ith individual in generation n ≥ 0, so that we may write Xₙ₊₁ = …
Xₙ₊₁ = C₁⁽ⁿ⁾ + C₂⁽ⁿ⁾ + … + Cₓₙ⁽ⁿ⁾
We interpret this sum as 0 if Xₙ = 0
Note that C₁⁽ⁿ⁾, C₂⁽ⁿ⁾, …. are independent and identically distributed.
Let Xₙ be the size of the population in generation n, so that X₀ = 1. Let Cᵢ⁽ⁿ⁾ be the number of children
of the ith individual in generation n ≥ 0, so that we may write Xₙ₊₁ = C₁⁽ⁿ⁾ + C₂⁽ⁿ⁾ + … + Cₓₙ⁽ⁿ⁾
What is G(s)? and Gₙ(s)
G(s) = ∞Σᵢ₌₀ p(i)sⁱ and Gₙ(s) = E[sˣⁿ] (that's X subscript n)
For n ≥ 0
Gₙ₊₁(s) = …
Prove it
Gₙ₊₁(s) = Gₙ(G(s)) = G(G(…G(s)…)) = G(Gₙ(s))
where G is composed with itself n + 1 times
Proof pg 45
Suppose that the mean number of children of a single individual is µ i.e. ∞Σᵢ₌₁ ip(i) = µ
E[Xₙ] = ….
Prove it
E[Xₙ] = µⁿ
Proof pg 46
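A simulation sanity check of E[Xₙ] = µⁿ (not the pg 46 proof; the offspring law p(0) = 0.2, p(1) = 0.4, p(2) = 0.4 with µ = 1.2 is an illustrative assumption):

```python
import random

random.seed(1)

def offspring():
    # p(0) = 0.2, p(1) = 0.4, p(2) = 0.4, so mu = 0.4 + 0.8 = 1.2
    u = random.random()
    return 0 if u < 0.2 else (1 if u < 0.6 else 2)

def generation_size(n):
    x = 1  # X_0 = 1
    for _ in range(n):
        x = sum(offspring() for _ in range(x))  # sum is 0 if x = 0
    return x

trials, n = 20000, 3
mean = sum(generation_size(n) for _ in range(trials)) / trials
assert abs(mean - 1.2**3) < 0.1  # mu**n = 1.728
```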
Branching processes, what is the probability that the population dies out?
P(population dies out) = P(∞∪ₙ₌₀ {Xₙ = 0}) ≥ P (X₁ = 0) = p(0) > 0
Extinction Probability (non-examinable) pg 47-48
A random variable X defined on a probability space (Ω, F, P) is a function X: [ ] such that { w: [ ]} ∈ F for each x ∈ R.
A random variable X defined on a probability space (Ω, F, P) is a function X : Ω → R
such that {ω : X(ω) ≤ x} ∈ F for each x ∈ R.
What is the cumulative distribution function of a random variable X?
is the function
Fₓ : R → [0, 1] defined by
Fₓ(x) = P (X ≤ x)
Continuous distributions
The cdf = Fₓ(x)
Is Fₓ decreasing?
Prove
No, it’s non-decreasing
Proof pg 51
Continuous distributions
The cdf = Fₓ(x)
P (a < X ≤ b) = ???
Prove
P (a < X ≤ b) = Fₓ(b) − Fₓ(a) for a < b
Proof pg 51
Continuous distributions
The cdf = Fₓ(x)
As x → −∞, Fₓ(x) → ???
Prove
x → −∞, Fₓ(x) → 0
Proof pg 51/52
Continuous distributions
The cdf = Fₓ(x)
As x → ∞, Fₓ(x) → ???
Prove
x → ∞, Fₓ(x) → 1
Proof pg 51/52
Continuous distributions Any function satisfying: Fₓ is non-decreasing; P (a < X ≤ b) = Fₓ(b) − Fₓ(a) for a < b; Fₓ(x) → 0 as x → −∞; Fₓ(x) → 1 as x → ∞; and [ ] is the cumulative distribution function of some random variable defined on some probability space
Right Continuity
A continuous random variable X is a random variable whose c.d.f. satisfies Fₓ(x) = P[ ] = ∫ [ ] where fₓ : R → R is a function such that a) fₓ(u) [ ] 0 for all u ∈ R b) −∞ ∫ ∞ fₓ(u) du =
Fₓ(x) = P (X ≤ x) = −∞ ∫ˣ fₓ(u) du
Bounds on the integral -∞ → x
where fₓ : R → R is a function such that
a) fₓ(u) ≥ 0 for all u ∈ R
b) −∞ ∫ ∞ fₓ(u) du = 1
Continuous distributions
What is fₓ called?
fₓ is called the probability density function (p.d.f.) of X or, sometimes, just its density.
The Fundamental Theorem of Calculus tells us that Fₓ of the form given in the definition is differentiable with dFₓ(x)/dx = [ ]
dFₓ(x)/dx = fₓ(x)
at any point x such that fₓ(x) is continuous.
Is fₓ(x) a probability??
No!!!!!
Therefore it can exceed 1
If X is a continuous random variable with p.d.f fₓ then
P(X=x) = [ ]
P(a ≤ X ≤ b) = [ ]
P(X=x) = 0 for all x ∈ R
P(a ≤ X ≤ b) = ₐ∫ᵇ fₓ(x) dx
What is the p.d.f. of the Uniform distribution?
fₓ(x) = { 1/(b−a) for a ≤ x ≤ b,
{ 0 otherwise
What’s the notation for X is distributed Uniformally?
X ∼ U[a, b]
What is the p.d.f. of the exponential distribution?
fₓ(x) = λe^(-λx), x ≥ 0
What is the p.d.f. of the gamma distribution?
α > 0 and λ > 0
fₓ(x) = ((λ^α)/Γ(α)) x^(α-1)e^(-λx), x ≥ 0
Here, Γ(α) is the so-called gamma function, which is defined by
Γ(α) = ∞∫₀ u^(α-1)e⁻ᵘ du for α > 0
For most values of α this integral does not have a closed form. However, for a strictly
positive integer n, we have Γ(n) = (n − 1)!.
What is the p.d.f. of the normal (or Gaussian) distribution?
µ ∈ R
and σ²> 0
fₓ(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)), x ∈ R
What’s the notation for when X is gamma distributed?
X ∼ Gamma(α, λ)
What’s the notation for X is distributed normally?
X ∼ N(µ, σ²)
What is the standard normal distribution?
N(0, 1)
P (x ≤ X ≤ x + δ) ≈ [ ]
P (x ≤ X ≤ x + δ) ≈ fₓ(x) δ for small δ > 0
P (nδ ≤ X ≤ (n + 1)δ) ≈ [ ]
P (nδ ≤ X ≤ (n + 1)δ) ≈ fₓ(nδ)δ
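This approximation can be checked against an exact c.d.f. (illustrative Python, not from the notes), e.g. with X ∼ Exp(λ), where F(x) = 1 − e^(−λx):

```python
import math

# Exact probability of a small interval vs the density approximation f(x)*d.
lam, x, d = 2.0, 0.5, 1e-4
exact = (1 - math.exp(-lam * (x + d))) - (1 - math.exp(-lam * x))
approx = lam * math.exp(-lam * x) * d

# relative error is about lam*d/2, i.e. tiny for small d
assert abs(exact - approx) / exact < 1e-3
```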
Let X be a continuous random variable with probability density function fₓ.
The expectation or mean of X is defined to be …
E [X] = −∞ ∫ ∞ xfₓ(x) dx
whenever −∞ ∫ ∞ |x|fₓ(x) dx < ∞
Let X be a continuous random variable with probability density function fₓ
and let h be a function from R to R. Then
E [h(X)] = ???
E [h(X)] = −∞ ∫ ∞ h(x)fₓ(x) dx
whenever −∞ ∫ ∞ |h(x)|fₓ(x) dx < ∞
Suppose X is a continuous random variable with p.d.f. fₓ.
Then if a, b ∈ R then
E [aX + b] = ???
and var (aX + b)
Prove it
E [aX + b] = aE [X] + b
var (aX + b) = a²var (X)
Proof pg 58
Does E[1/X] = 1/E[X]?
No!!!!
Suppose that X is a continuous random variable with density fₓ and that h : R → R
is a differentiable function which is strictly increasing.
Then Y = h(X) is a
continuous random variable with p.d.f.
fᵧ(y) =
Prove
fᵧ(y) = fₓ(h⁻¹(y))d/dy h⁻¹(y)
where h⁻¹ is the inverse function of h
Proof pg60
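A check of the change-of-variables formula on a hypothetical example (not from the notes): X ∼ U[0, 1] and h(x) = x³, which is strictly increasing there, so h⁻¹(y) = y^(1/3) and fᵧ(y) = fₓ(y^(1/3)) · (1/3)y^(−2/3) = (1/3)y^(−2/3) on (0, 1].

```python
# Compare the formula's density with a numerical derivative of the c.d.f.
# F_Y(y) = P(X <= y**(1/3)) = y**(1/3) for y in (0, 1).
y = 0.4
f_Y = (1 / 3) * y ** (-2 / 3)  # density from the change-of-variables formula

eps = 1e-6
F = lambda t: t ** (1 / 3)
numeric = (F(y + eps) - F(y - eps)) / (2 * eps)  # central difference
assert abs(f_Y - numeric) < 1e-6
```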
joint cumulative distribution function, Fₓ,ᵧ : R² → [0, 1],
given by
Fₓ,ᵧ (x, y) =
Fₓ,ᵧ (x, y) = P (X ≤ x, Y ≤ y)
joint cumulative distribution
Is Fₓ,ᵧ non-decreasing?
Yes
joint cumulative distribution
What does Fₓ,ᵧ(x, y) tend to as x and y → ∞?
Fₓ,ᵧ(x, y) → 1
joint cumulative distribution
What does Fₓ,ᵧ(x, y) tend to as x and y → −∞?
Fₓ,ᵧ(x, y) → 0
Let X and Y be random variables such that
Fₓ,ᵧ(x, y) = −∞∫ʸ −∞∫ˣ fₓ,ᵧ(u, v) dudv
for some function fₓ,ᵧ : R² → R such that
a) fₓ,ᵧ(u, v) [ ] 0 for all u, v ∈ R
b) −∞∫∞ −∞∫∞ fₓ,ᵧ(u, v) dudv = [ ]
a) fₓ,ᵧ(u, v) ≥ 0 for all u, v ∈ R
b) −∞∫∞ −∞∫∞ fₓ,ᵧ(u, v) dudv = 1
If X and Y are jointly continuous, what is fₓ,ᵧ ??
their joint density function.
What is fₓ,ᵧ in terms of Fₓ,ᵧ(x,y)?
fₓ,ᵧ(x, y) = ∂²/∂x∂y Fₓ,ᵧ(x,y)
For a single continuous random variable X, it turns out that the probability that it lies in some nice set
A ⊆ R can be obtained by integrating its density over A
P (X ∈ A) = ???
P (X ∈ A) = ₐ∫ fₓ(x) dx
For a pair of jointly continuous random variables X and Y
for nice sets B ⊆ R² we obtain the probability that the pair (X, Y ) lies in B by integrating
the joint density over the set B
P ((X, Y ) ∈ B) = ??
P ((X, Y ) ∈ B) = ∫∫₍ₓ,ᵧ₎∈ᵦ fₓ,ᵧ(x, y) dxdy
For a pair of jointly continuous random variables X and Y , we have
P (a < X ≤ b, c < Y ≤ d) = …
Prove
P (a < X ≤ b, c < Y ≤ d) = 𝒸∫ᵈ ₐ∫ᵇ fₓ,ᵧ(x, y) dxdy
for a < b and c < d
Proof pg62
Suppose X and Y are jointly continuous with joint density fₓ,ᵧ. Then X is a continuous random variable with density
fₓ(x) =
-∞∫∞ fₓ,ᵧ(x, y) dy
Suppose X and Y are jointly continuous with joint density fₓ,ᵧ. Then Y is a continuous random variable with density
fᵧ(y) =
Prove
-∞∫∞ fₓ,ᵧ(x, y) dx
Proof pg 63
the one-dimensional densities fₓ and fᵧ of the joint
distribution with density fₓ,ᵧ, are called what?
The marginal densities (of the marginal distributions of X and Y)
When are Jointly continuous random variables X and Y with joint density fₓ,ᵧ independent?
fₓ,ᵧ(x, y) = fₓ(x) fᵧ(y)
for all x, y ∈ R
jointly continuous random variables X₁, X₂, . . . , Xₙ with joint density
fₓ₁,ₓ₂,…,ₓₙ are independent if…
fₓ₁,ₓ₂,…,ₓₙ(x₁, x₂, . . . , xₙ) = fₓ₁(x₁)fₓ₂(x₂) … fₓₙ(xₙ)
for all x₁, x₂, . . . , xₙ∈ R
if X and Y are independent then it follows easily that Fₓ,ᵧ (x, y) = …
Fₓ,ᵧ (x, y) = Fₓ(x)Fᵧ(y)
for all x, y ∈ R.
Write E [h(X, Y )] in terms of a double integral
E [h(X, Y )] = -∞∫∞ -∞∫∞ h(x, y) fₓ,ᵧ(x, y) dxdy
What is the cov(X, Y)?
cov (X, Y ) = E [(X − E [X])(Y − E [Y ])] = E [XY ] − E [X] E [Y ]
Let X₁, X₂, . . . , Xₙ denote i.i.d. random variables. Then these random variables are
said to constitute a [ ] from the distribution
random sample of size n
What is the sample mean defined to be?
X̄ₙ = (1/n) ᵢ₌₁Σⁿ Xᵢ
What is var(X+Y)?? For random variables X and Y
var (X + Y ) = var (X) + var (Y ) + 2cov (X, Y )
What is var(ᵢ₌₁Σⁿ Xᵢ)?? For random variables X₁, …, Xₙ
var(ᵢ₌₁Σⁿ Xᵢ) = ᵢ₌₁Σⁿ var(Xᵢ) + ᵢ≠ⱼΣ cov(Xᵢ, Xⱼ)
= ᵢ₌₁Σⁿ var(Xᵢ) + 2 ᵢ<ⱼΣ cov(Xᵢ, Xⱼ)
Suppose that X₁, X₂, . . . , Xₙ form a random sample from a distribution with mean µ
and variance σ². Then the expectation and variance of the sample mean are …
Prove it
E[X̄ₙ] = µ and var(X̄ₙ) = σ²/n
Proof pg 67
Let X₁, X₂, . . . , Xₙ be a random sample from a Bernoulli distribution with parameter p.
What do E[Xᵢ], var(Xᵢ), E[X̄ₙ] and var(X̄ₙ) equal??
E[Xᵢ] = p and var(Xᵢ) = p(1−p) for all 1 ≤ i ≤ n. Hence E[X̄ₙ] = p and var(X̄ₙ) = p(1−p)/n
Suppose that A is an event with probability P (A) and write p = P (A). Let X be the indicator function
of the event A i.e. the random variable defined by
X(ω) = 1ₐ(ω) = {1 if ω ∈ A
{0 if ω ∉ A
Then X ∼ [ ] and E[X] = [ ]
X ∼ Ber(p) and E [X] = p
State the weak law of large numbers ….
Prove it
Suppose that X₁, X₂, . . . . are independent and identically
distributed random variables with mean µ. Then for any fixed ε > 0
As n → ∞
P(|1/n ᵢ₌₁Σⁿ Xᵢ − µ| > ε)→0
Proof pg 68
Weak law of large numbers:
P(|1/n ᵢ₌₁Σⁿ Xᵢ − µ| ≤ ε)→???
As n → ∞
P(|1/n ᵢ₌₁Σⁿ Xᵢ − µ| ≤ ε)→1
What is Markov’s inequality?
Prove it
Suppose that Y is a non-negative random variable whose expectation exists. Then
P(Y ≥ t) ≤ E[Y]/t for all t > 0.
Proof pg68
What is Chebyshev’s inequality?
Prove it
Suppose that Z is a random variable with a finite variance. Then for any t > 0,
P (|Z − E[Z]| ≥ t) ≤ var(Z)/t²
Proof: Note that P (|Z − E [Z]| ≥ t) = P((Z − E [Z])² ≥ t²)
and then apply Markov’s inequality to the
non-negative random variable Y = (Z − E [Z])²
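Both inequalities can be checked exactly on a small discrete example (illustrative Python, not from the notes): Y uniform on {0, 1, 2, 3, 4}, so E[Y] = 2 and var(Y) = 2.

```python
from fractions import Fraction

# Exact check of Markov's and Chebyshev's inequalities.
vals = [0, 1, 2, 3, 4]
p = Fraction(1, 5)
EY = sum(p * v for v in vals)               # = 2
var = sum(p * (v - EY) ** 2 for v in vals)  # = 2

t = 3
markov_bound = EY / t                       # E[Y]/t = 2/3
assert sum(p for v in vals if v >= t) <= markov_bound       # 2/5 <= 2/3

cheb_bound = var / t**2                     # var(Y)/t**2 = 2/9
assert sum(p for v in vals if abs(v - EY) >= t) <= cheb_bound  # 0 <= 2/9
```

Both bounds hold, and the example shows they can be far from tight.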