Stats Flashcards

1
Q

What is a UNIFORM Probability Space?

A

A uniform probability space is defined as the triplet (Ω, F, P),
where
* Ω (the sample space) is a non-empty finite set of all possible outcomes of the experiment;
* F (the event space) is the collection of all events, given by the power set of Ω (the set of all subsets of Ω);
* P : F → [0, 1] is a map from the event space to [0, 1], satisfying:
− P(∅) = 0, P(Ω) = 1
− P(A ∪ B) = P(A) + P(B), for every A, B ∈ F such that A ∩ B = ∅
(finite additivity)
− P({ω}) = P({ω′}) for all ω, ω′ ∈ Ω (uniform)

2
Q

How do we calculate probability in a uniform probability space?

A

Let (Ω, F, P) be a uniform probability space. Then for all ω ∈ Ω,
P({ω}) = 1/|Ω|
and for all A ⊆ Ω (A ∈ F),
P(A) = |A|/|Ω|.

Proof.
Since P({ω1}) = P({ω2}) for all ω1, ω2 ∈ Ω, let p ∈ [0, 1] be this common value, i.e.
p := P({ω}) ∀ ω ∈ Ω.
Since P is a probability measure,
1 = P(Ω) = Σω∈Ω P({ω}) by finite additivity
= Σω∈Ω p
= pΣω∈Ω 1
= p|Ω|.
Therefore
p = 1/|Ω|,
which proves the first formula. The second formula also follows from finite additivity, since
P(A) = Σω∈A P({ω})
= Σω∈A p
= pΣω∈A 1
= p|A| = |A|/|Ω|.
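A quick numeric check in Python (a minimal sketch; the fair-die sample space and the event "even outcome" are illustrative choices, not part of the source):

# Uniform probability space: a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}               # sample space, |omega| = 6
A = {x for x in omega if x % 2 == 0}     # event "the outcome is even"
p_single = 1 / len(omega)                # P({w}) = 1/|omega|
p_A = len(A) / len(omega)                # P(A) = |A|/|omega|
print(p_single, p_A)                     # 0.1666..., 0.5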

3
Q

What are Sequences of a set?

A

Let A be a finite set of cardinality n ∈ N. A sequence of length
k ∈ N of elements of A is an ordered k-tuple (a1, . . . , ak) s.t. ai ∈ A, i = 1, 2, . . . k.
We denote by Sn,k(A) the set of all sequences of length k of elements of A.

By “ordered” we mean that the order of the sequence matters; for example, if a1 /= a2 then
(a1, a2) /= (a2, a1). Note also that repetitions of elements are allowed; for example,
(1, 1) is a sequence of length 2 of elements of {1}.

4
Q

What is the Cardinality of Sn,k?

A

Let A be a finite set of cardinality n ∈ N. The set Sn,k(A) of
all sequences of length k ∈ N of elements of A has cardinality n^k, i.e.
|Sn,k(A)| = n^k

Proof,
To construct an arbitrary element (a1, a2, . . . , ak) of Sn,k(A), we perform the following steps
(a) choose the first value a1. There are n1 = |A| = n ways to do this.
(b) choose the second value a2. There are n2 = |A| = n ways to do this.
.
.
.
(k) choose the k-th value ak. There are nk = |A| = n ways to do this.
Thus we find
|Sn,k(A)| = n1 · n2 · · · · · nk = n · n · · · · · n = n^k.
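A quick check of |Sn,k(A)| = n^k in Python (a minimal sketch; the set A = {0, 1, 2} and k = 4 are illustrative):

import itertools

A = [0, 1, 2]                                      # n = 3
k = 4
sequences = list(itertools.product(A, repeat=k))   # all length-k sequences, repetitions allowed
print(len(sequences), len(A) ** k)                 # both are 81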

5
Q

What is an Ordering of a set?

A

Let A be a finite set of cardinality n ∈ N and let k ∈ N such
that k <= n. An ordering of length k of elements of A is a sequence of length k of elements of A with no repetitions. We denote the set of orderings of length k of elements of A by On,k(A). Thus we have
On,k(A) = {(a1, . . . , ak) : ai ∈ A ∀i = 1, . . . , k, ai /= aj ∀ i /= j}.

6
Q

What is the Cardinality of On,k?

A

Let A be a finite set of cardinality n ∈ N and k <= n. Then
|On,k(A)| = n(n − 1). . .(n − k + 1).

Proof. We determine an element of On,k(A) by the following steps:
(a) We choose a1. There are n1 = |A| = n choices for this.
(b) We choose a2 such that a2 /= a1. There are n2 = n − 1 choices for this.
(c) We choose a3 such that a3 /= a2 and a3 /= a1. There are n3 = n − 2 choices
for this.
.
.
.
(k) We choose ak such that ak /= ai ∀ i = 1, 2, . . . , k − 1. There are nk = n − (k − 1) = n − k + 1 choices for this.
By the fundamental counting principle
|On,k(A)| = n1n2 . . . nk = n(n − 1). . .(n − k + 1).
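A quick check of |On,k(A)| = n(n − 1)···(n − k + 1) in Python (a minimal sketch; the 5-element set and k = 3 are illustrative):

import itertools, math

A = ['a', 'b', 'c', 'd', 'e']                      # n = 5
k = 3
orderings = list(itertools.permutations(A, k))     # length-k sequences without repetitions
print(len(orderings), 5 * 4 * 3, math.perm(5, 3))  # all three are 60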

7
Q

What is a Combination of a set?

A

Let A be a finite set of cardinality n ∈ N. A combination of k
elements of A is a subset of A with k elements. We denote by Cn,k(A) the set of combinations of k elements of A.

8
Q

What is the Cardinality of Cn,k?

A

Let A be a finite set of cardinality n ∈ N and k <= n. Then
|Cn,k(A)| = nCk = n!/[k!(n − k)!].

Proof. Notice that an ordering of length k of elements of A can be obtained uniquely by the following steps
(a) Choose a combination of k elements of A. There are n1 = |Cn,k(A)| choices for this.
(b) Choose an ordering (permutation) of these k chosen elements. By the permutation count n! (applied with n = k) there are n2 = k! choices for this.
By the fundamental counting principle,
|On,k(A)| = |Cn,k(A)| · k!.
The above equation can be solved for the only unknown term, giving
|Cn,k(A)| = |On,k(A)|/k! = n!/[(n − k)! k!]
= nCk.
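A quick check of |Cn,k(A)| = nCk in Python (a minimal sketch; n = 6 and k = 2 are illustrative):

import itertools, math

A = range(6)                                   # n = 6
k = 2
combos = list(itertools.combinations(A, k))    # all 2-element subsets
print(len(combos),
      math.comb(6, 2),
      math.factorial(6) // (math.factorial(2) * math.factorial(4)))   # all three are 15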

9
Q

What is a Permutation and how many permutations of a set are there?

A

Let A be a finite set of cardinality n ∈ N. An ordering of length
n of elements of A is called a permutation of A.

Let A be a finite set of cardinality n ∈ N. Then the number of
permutations of the elements of A is n(n − 1)(n − 2). . . 1 = n!.

Proof: take k = n in the formula for the number of orderings, giving |On,n(A)| = n(n − 1) · · · 1 = n!.

10
Q

What is a Partition of a set?

A

Let A be a set of cardinality n ∈ N and r ∈ N such that r <= n.
A partition of A into r subsets is a family {A1, . . . , Ar} of subsets of A such that
(a) Every subset in the family is non-empty: Ai /= ∅ ∀ i = 1, 2, . . . , r.
(b) The subsets in the family are pairwise disjoint: Ai ∩ Aj = ∅ ∀ i /= j, i, j = 1, 2, . . . , r.
(c) The union of all the subsets in the family is equal to A:
Ui=1,r (Ai) = A.

11
Q

How many partitions of a set are there?

A

Let A be a finite set with |A| = n. Let r ∈ N, r <= n, and let k1, . . . , kr ∈ N with k1 + k2 + · · · + kr = n. Then
the number of partitions of A into r subsets {A1, . . . , Ar} such that |A1| = k1, |A2| = k2, . . . , |Ar| = kr is given by:
n!/(k1! k2! . . . kr!).

Proof. Every partition of A satisfying the assumptions can be uniquely determined via the following steps:
(a) Choose A1 ⊆ A such that |A1| = k1. There are nCk1 choices for this step.
(b) Choose A2 ⊆ A such that |A2| = k2 and A1 ∩ A2 = ∅ (which implies that A2 ⊆ A \ A1). There are n−k1Ck2 choices for this step.
(c) Choose A3 ⊆ A such that |A3| = k3, A3 ∩ A2 = ∅ and A3 ∩ A1 = ∅ (which implies that A3 ⊆ A \ (A1 ∪ A2)). There are n−k1−k2Ck3 choices for this step.
.
.
.
r) Finally choose the remaining set Ar ⊆ A such that |Ar| = kr and Ar ∩ Ai = ∅ for all i = 1, 2, . . . , r − 1 (which implies that Ar ⊆ A \ (A1 ∪ · · · ∪ Ar−1)). There are (n − (k1 + k2 + · · · + kr−1))Ckr choices for this step.
Thus, by the fundamental counting principle the number of partitions of A into r subsets {A1, A2, . . . , Ar} such that |A1| = k1, . . . , |Ar| = kr with k1 + k2 + · · · + kr = n is
nCk1 * ((n − k1)Ck2) * . . . * ((n − (k1 + · · · + kr−1))Ckr)
= n!/[k1!(n − k1)!] * (n − k1)!/[k2!(n − k1 − k2)!] * . . . * (n − (k1 + · · · + kr−1))!/[kr!(n − (k1 + · · · + kr))!]
= n!/(k1! k2! . . . kr!),
where the last equality comes by cancellation of the fractions and noticing that
(n − (k1 + · · · + kr))! = (n − n)! = 0! = 1.
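A quick check of the partition count in Python (a minimal sketch; n = 6 and (k1, k2, k3) = (3, 2, 1) are illustrative):

import math

n, ks = 6, [3, 2, 1]                               # k1 + k2 + k3 = n
count = math.factorial(n)
for k in ks:
    count //= math.factorial(k)                    # n!/(k1! k2! k3!)
# The same count obtained step by step, as in the proof:
stepwise = math.comb(6, 3) * math.comb(3, 2) * math.comb(1, 1)
print(count, stepwise)                             # both are 60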

12
Q

What is a Sample Space?

A

A sample space Ω is the set of all possible outcomes of a random process (or experiment). It can be any set.

13
Q

What is an Event Space and what are the known Properties of an Event Space?

A

Let Ω be the sample space and F be a collection of subsets of
Ω. F is an event space (also called σ-algebra) if it satisfies
(i) Ω ∈ F.
(ii) if A ∈ F, then Ac ∈ F (F is closed under complements).
(iii) if {An : n ∈ N} is such that An ∈ F ∀ n then
Un=1,∞ (An) ∈ F . (F is closed under countable unions.)

Let F be an event space on Ω. Then
(a) F is closed under finite unions.
(b) F is closed under finite intersections.
(c) F is closed under countable intersections.

Proof.
(a) Let A1, . . . , An ∈ F. Set Aj := ∅ for every j > n; since ∅ = Ω^c ∈ F, this gives Aj ∈ F for all j ∈ N. The empty sets do not contribute to the union, so
Uj=1,n Aj = Uj=1,∞ Aj ∈ F,
as F is closed under countable unions.
(b) Let A1, . . . , An ∈ F. We want to show that ∩j=1,n Aj ∈ F. By De Morgan’s law (both sides consist exactly of the elements that belong to every Aj),
∩j=1,n Aj = (Uj=1,n Aj^c)^c.
Since the event space F is closed under taking complements, Aj^c ∈ F, for every j = 1, . . . , n. Since it is closed under finite unions (statement 1 of the proposition, shown above),
Uj=1,n Aj^c ∈ F
By taking complement once more, it follows that ∩j=1,n Aj ∈ F.
(c) The proof is similar to that of statement 2 above, noting that De Morgan’s law also holds for countable unions and intersections. That is, we can write
∩j=1,∞ Aj = (Uj=1,∞ Aj^c)^c.

14
Q

What is a Probability Measure?

A

Given a sample space Ω and an event space F, a function P : F → R is called a probability measure if it satisfies
(i) P(B) ∈ [0, 1] for every B ∈ F;
(ii) P(Ω) = 1;
(iii) (Countable additivity) For every sequence An ∈ F, n >= 1, of pairwise disjoint events,
P(Un=1,∞ (An)) =Σn=1,∞ (P(An)).

15
Q

What is a Probability Space?

A

A probability space is defined as the triplet (Ω, F, P), where
* Ω (the sample space) is the set of all possible outcomes of the experiment
(we always assume that it is not empty);
* F is an event space of subsets of Ω.
* P is a probability measure on F.

16
Q

What are the 3 known properties of a probability space?

A

Let (Ω, F, P) be a probability space. Then, P has the following
properties
(a) If A, B ∈ F such that A ⊆ B, then
P(B − A) = P(B) − P(A).
Note that B − A = B ∩ A^c and is to be interpreted as ‘all elements of B that are not in A’.
(b) For every A ∈ F,
P(A^c) = 1 − P(A).
(c) P(∅) = 0.

Proof.
(a) We write B as the disjoint union of the elements of B that are not in A and those that are: B = (B − A) ∪ A, where (B − A) ∩ A = ∅. By finite additivity
P(B) = P((B − A) ∪ A) = P(B − A) + P(A), which proves the claim.
(b) Using the above property,
P(A^c) = P(Ω − A) = P(Ω) − P(A) = 1 − P(A).
(c) P(∅) = P(Ω^c) = 1 − P(Ω) = 1 − 1 = 0.

17
Q

What is the Inclusion - Exclusion Formula?

A

Let (Ω, F, P) be a probability space. Then, for any finite collection A1, . . . , An of events in F, we have
P(Uk=1,n Ak) = Σk=1,n (−1)^(k−1) Σ(1<=i1<···<ik<=n) P(Ai1 ∩ · · · ∩ Aik).

The inner sum runs over all choices of k indices 1 <= i1 < · · · < ik <= n, i.e. over all k-element subsets of {1, . . . , n}.

Proof. We will prove the result only for n = 2 (the proof of the induction step in the general case is similar, but messier!). We write
A1 ∪ A2 = (A1 − B) ∪ B ∪ (A2 − B),
where B = A1 ∩ A2. The sets A1 − B, B, A2 − B are all disjoint, so we can write
P(A1 ∪ A2) = P ((A1 − B) ∪ B ∪ (A2 − B)) = P(A1 − B) + P(B) + P(A2 − B)
using finite additivity. By property (a) of a probability space (P(B − A) = P(B) − P(A) for A ⊆ B, see the earlier card), P(A1 − B) = P(A1) − P(B) and similarly P(A2 − B) = P(A2) − P(B). Substituting these into the formula above, we get
P(A1 ∪ A2) = (P(A1)−P(B))+P(B)+ (P(A2)−P(B)) = P(A1)+P(A2)−P(B), which proves the claim.
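A numeric check of the n = 3 case in Python (a minimal sketch; the uniform space Ω = {1, ..., 12} and the three events are illustrative):

omega = set(range(1, 13))                          # uniform space, P(A) = |A|/12
A1 = {x for x in omega if x % 2 == 0}              # multiples of 2
A2 = {x for x in omega if x % 3 == 0}              # multiples of 3
A3 = {x for x in omega if x % 4 == 0}              # multiples of 4

def P(A):
    return len(A) / len(omega)

lhs = P(A1 | A2 | A3)
rhs = (P(A1) + P(A2) + P(A3)
       - P(A1 & A2) - P(A1 & A3) - P(A2 & A3)
       + P(A1 & A2 & A3))
print(lhs, rhs)                                    # both are 0.6666...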

18
Q

What is Boole’s Inequality?

A

Let (Ω, F, P) be a probability space. If A1, . . . , An ∈ F, then
P(Ui=1,n (Ai)) <= Σi=1,n P(Ai) (*)

Proof. We proceed by induction. For n = 2, notice that
P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2) <= P(A1) + P(A2),
so that (*) holds for n = 2. Assume now that (*) holds for n events; we need to prove it holds for n + 1 events. Let A1, . . . , An+1 ∈ F, then arguing as above we have
P(Uj=1,n+1 Aj) = P((Uj=1,n Aj) ∪ An+1)
= P(Uj=1,n Aj) + P(An+1) − P((Uj=1,n Aj) ∩ An+1)
<= P(Uj=1,n Aj) + P(An+1)
<= Σj=1,n P(Aj) + P(An+1) = Σj=1,n+1 P(Aj).

19
Q

What is Conditional Probability?

A

Let (Ω, F, P) be a probability space and B ∈ F such that
P(B) > 0. For A ∈ F, the conditional probability of A given B is denoted by P(A|B) and is defined as
P(A|B) = P(A ∩ B)/P(B)

20
Q

What is the Multiplication Rule of conditional probability?

A

(Multiplication Rule). Let (Ω, F, P) be a probability space and
A1, . . . , An ∈ F s.t. P(A1 ∩ · · · ∩ An−1) > 0. Then,
P(A1 ∩ · · · ∩ An) = P(A1)P(A2|A1)P(A3|A1 ∩ A2). . . P(An|A1 ∩ · · · ∩ An−1).

Proof.
Notice that for k = 1, . . . , n − 1, A1 ∩ · · · ∩ Ak ⊇ A1 ∩ · · · ∩ An−1. Hence, by monotonicity of P and by assumption,
P(A1 ∩ A2 ∩ · · · ∩ Ak) >= P(A1 ∩ · · · ∩ An−1) > 0.
This ensures that all the conditional probabilities on the right-hand side are well-defined. The result follows by a direct application of the definition of conditional probability on the right-hand side:
P(A1)P(A2|A1)P(A3|A1 ∩ A2). . . P(An|A1 ∩ · · · ∩ An−1)
= P(A1) * P(A1 ∩ A2)/P(A1) * P(A3 ∩ A1 ∩ A2)/P(A1 ∩ A2) * . . .
* P(A1 ∩ · · · ∩ An)/P(A1 ∩ · · · ∩ An−1)
= P(A1 ∩ · · · ∩ An).

21
Q

What is the Law of Total Probability?

A

Let (Ω, F, P) be a probability space and let {Bn : n = 1, . . . , N}, where N is finite or infinite, be a partition of Ω
such that P(Bn) > 0, ∀ n = 1, . . . , N. Then, for all A ∈ F
P(A) = Σn=1,N [P(A|Bn)P(Bn)].

Proof. Notice that since {Bn : n = 1, . . . , N} forms a partition of Ω, we have A = A ∩ Ω = A ∩ Un=1,N Bn
= Un=1,N (A ∩ Bn).
Further, since Bn’s are disjoint so are {A ∩ Bn : n = 1 . . . N}, therefore by finite/countable additivity, we have
P(A) = P(Un=1,N (A ∩ Bn))
= Σn=1,N P(A ∩ Bn)
= Σn=1,N P(A|Bn)P(Bn).
In the last equality we use the definition of conditional probability with the assumption that P(Bn) > 0 ∀ n = 1, . . . , N.
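A small numeric illustration in Python (a minimal sketch; the two-box numbers are made up for the example):

# Partition: B1 = "box 1 chosen", B2 = "box 2 chosen"; event A = "a red ball is drawn".
P_B = {1: 0.5, 2: 0.5}                             # P(B1), P(B2)
P_A_given_B = {1: 0.3, 2: 0.8}                     # P(A|B1), P(A|B2)
P_A = sum(P_A_given_B[n] * P_B[n] for n in P_B)    # law of total probability
print(P_A)                                         # 0.55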

22
Q

What is Bayes’ Theorem?

A

Let (Ω, F, P) be a probability space and let {Bn :
n = 1, . . . , N}, where N is finite or infinite, be a partition of Ω such that P(Bn) > 0 ∀ n = 1, . . . , N. Then for A ∈ F such that P(A) > 0
P(Bn|A) = P(A|Bn)P(Bn)/Σj=1,N P(A|Bj )P(Bj ) ∀ n = 1, . . . , N .

Proof. Since P(A) > 0, the left-hand side is well-defined; by the definition of conditional probability and the law of total probability:
P(Bn|A) = P(Bn ∩ A)/P(A)
= P(A|Bn)P(Bn)/P(A)
= P(A|Bn)P(Bn)/Σj=1,N P(A|Bj )P(Bj )
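The same made-up two-box example from the previous card, now inverted with Bayes' theorem (again a sketch, not from the source):

P_B = {1: 0.5, 2: 0.5}                             # P(B1), P(B2)
P_A_given_B = {1: 0.3, 2: 0.8}                     # P(A|B1), P(A|B2)
P_A = sum(P_A_given_B[n] * P_B[n] for n in P_B)    # denominator from the law of total probability
P_B_given_A = {n: P_A_given_B[n] * P_B[n] / P_A for n in P_B}
print(P_B_given_A)                                 # {1: 0.2727..., 2: 0.7272...}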

23
Q

When are two events Independent?

A

Let (Ω, F, P) be a probability space. We say that events A and
B are independent if P(A ∩ B) = P(A) · P(B).

24
Q

What does it mean for events to be pairwise and mutually independent?

A

Let (Ω, F, P) be a probability space and A1, A2, . . . , An be
events. We say that the events A1, . . . , An are pairwise independent if Aj and Ak are independent for every choice of distinct j and k. We say that the events A1, . . . , An are mutually independent if
P(Aj1 ∩ Aj2 ∩ · · · ∩ Ajk) = P(Aj1)P(Aj2) · · · P(Ajk)
for every k = 2, . . . , n and every choice of indices 1 <= j1 < j2 < · · · < jk <= n.

25
Q

What is a Random Variable?

A

Let (Ω, F, P) be a probability space. A random variable is a function X : Ω → R such that {ω ∈ Ω : X(ω) <= a} ∈ F for every a ∈ R.

26
Q

What is the Distribution of a random variable?

A

The distribution of a random variable X is the probability measure on R denoted PX and given by
PX(B) = P({ω ∈ Ω : X(ω) ∈ B})
for subsets B in some event space on the set of real numbers.

We also have that PX is a probability measure on R.

27
Q

What is a Probability Mass Function?

A

Let (Ω, F, P) be a probability space and X be a discrete random variable. We define the probability mass function of X as the function pX : R → [0, 1] given by
pX(x) = PX({x}).

28
Q

What is the Discrete Support?

A

Let (Ω, F, P) be a probability space and X be a discrete random variable. We define the discrete support of X, or, more precisely, the discrete support of its distribution PX, as the set
{x ∈ R : pX(x) > 0}.

29
Q

What is a Bernoulli Distribution?

A

We say that a discrete random variable X has Bernoulli distribution with parameter p ∈ [0, 1], denoted X ∼ Bernoulli(p), if its probability mass function is
pX(x) = p, x = 1,
1 − p, x = 0,
0, otherwise.

30
Q

What is a Geometric Distribution?

A

We say that a discrete random variable X has geometric distribution with parameter p ∈ (0, 1], denoted X ∼ Geom(p), if its probability mass function is
pX(x) = p · (1 − p)^(x−1), x ∈ N,
0, otherwise.

31
Q

What is a Binomial Distribution?

A

We say that a discrete random variable X has binomial distribution with parameters n ∈ N0 and p ∈ [0, 1], denoted X ∼ Binom(n, p), if its probability mass function is
pX(x) = (nCx)(p^x)(1 − p)^(n−x), x ∈ {0, 1, . . . , n},
0, otherwise
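A quick sanity check of the Binom(n, p) mass function in Python (a minimal sketch; n = 5 and p = 0.3 are illustrative):

import math

n, p = 5, 0.3
pmf = {x: math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}
print(sum(pmf.values()))                           # 1.0 (up to floating point)
print(pmf[2])                                      # P(X = 2) = 10 * 0.09 * 0.343 = 0.3087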

32
Q

What is a Poisson Distribution?

A

We say that a discrete random variable X has Poisson distribution with parameter λ > 0, denoted by X ∼ Poisson(λ), if its probability mass function is
pX(x) = (e^−λ)(λ^x)/x!, x ∈ N0,
0, otherwise.

33
Q

What is Expectation?

A

Let X be a discrete random variable. We define the expectation of X, denoted E[X], as the real number given by
E[X] = Σx:P(X=x)>0 [x · P(X = x)],
as long as this sum converges absolutely, otherwise E[X] is not defined.

34
Q

When is a random variable Integrable?

A

We say that a discrete random variable X is integrable if its expectation is defined, i.e. if the sum defining E[X] converges absolutely.

35
Q

How does expectation sum?

A

If X and Y are integrable discrete random variables defined on the same probability space, then E[X + Y ] = E[X] + E[Y].

36
Q

What are the properties of expectation?

A

Let X and Y be integrable discrete random variables defined on the same probability space (Ω, F, P). Then:
(1) E[1A] = P(A) for every A ∈ F,
(2) If Z is a discrete random variable with 0 <= Z(ω) <= X(ω) for all ω ∈ Ω, then 0 <= E[Z] <= E[X],
(3) E[aX + bY ] = aE[X] + bE[Y ].

37
Q

What is a Function of a random variable?

A

Let X be a discrete random variable, and let g : R → R be
any function. Then
E[g(X)] = Σx∈DX [g(x) · P(X = x)], where DX is the discrete support of X (provided the sum converges absolutely).

38
Q

When is a random variable Square Integrable?

A

We say that a discrete random variable X is square-integrable if X^2 is integrable, which means that the
sum
Σx:P(X=x)>0 [x^2 · P(X = x)]
is convergent. Note that square-integrable random variables are automatically integrable, because |x| <= 1 + x^2.

39
Q

What is the Variance of a random variable?

A

Let X be a square-integrable discrete random variable and denote µ = E[X]. We define the variance of X as
Var(X) = E[(X − µ)^2].

also,
Var(X) = E[X^2] − (E[X])^2
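A quick numeric check of the two variance formulas on a small pmf (a minimal sketch; the pmf below is made up):

pmf = {0: 0.2, 1: 0.5, 3: 0.3}                               # a made-up discrete distribution
mu = sum(x * p for x, p in pmf.items())                      # E[X]
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())     # E[(X - mu)^2]
var_alt = sum(x * x * p for x, p in pmf.items()) - mu ** 2   # E[X^2] - (E[X])^2
print(mu, var_def, var_alt)                                  # 1.4, 1.24, 1.24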

40
Q

What is the Standard Deviation of a random variable?

A

Let X be a square-integrable discrete random variable. We define the standard deviation of X as
σ(X) = sqrt[Var(X)].
Unlike the variance, the standard deviation satisfies σ(aX) = |a| · σ(X).

41
Q

What is the Joint Probability Mass Function?

A

Given two discrete random variables X and Y , we define the joint probability mass function of X and Y, denoted pX,Y : R^2 → R, and given by
pX,Y (x, y) = P(X = x, Y = y),
or, more formally, P({ω ∈ Ω : X(ω) = x, Y (ω) = y}).

42
Q

What is the Marginal Probability Mass Function?

A

pX(x) = Σy∈DY [pX,Y (x, y)], where DY is the discrete support of Y. Similarly, pY(y) = Σx∈DX [pX,Y (x, y)].

43
Q

What is Expectation in the Bivariate case?

A

Let X and Y be discrete random variables, and let g : R^2 → R be any function. Then
E[g(X, Y )] = Σx∈DX Σy∈DY [g(x, y) · P(X = x, Y = y)]

44
Q

When are two random variables Independent?

A

Two discrete random variables X and Y are independent if
pX,Y (x, y) = pX(x) · pY(y)
for every x, y ∈ R.

45
Q

When is a collection of random variables Pairwise Independent?

A

We say that a collection of discrete random variables X1, X2, X3, . . . is pairwise independent if Xj and Xk are
independent for every j /= k.

46
Q

When is a collection of random variables Mutually Independent?

A

We say that a collection of discrete random variables X1, X2, X3, . . . is mutually independent if, for every k and every x1, x2, . . . , xk, we have
P(X1 = x1, X2 = x2, . . . , Xk = xk) = P(X1 = x1) · P(X2 = x2)· · · P(Xk = xk).

47
Q

How does the expectation of two independent random variables multiply?

A

If X and Y are independent integrable discrete random variables, then XY is integrable and
E[XY ] = E[X] · E[Y ]

Proof
E[XY ] = Σx∈DX Σy∈DY [xy · P(X = x, Y = y)]
= Σx∈DX x · ( Σy∈DY [y · P(X = x)P(Y = y)])
= (Σx∈DX [x · P(X = x)]) * (Σy∈DY [y · P(Y = y)])
= E[X] · E[Y ].
In order to use the bivariate expectation formula we should have known that XY was integrable in the first place. This can be checked by running exactly the same computation with |x| instead of x and |y| instead of y. This proves the theorem.

48
Q

What is the Law of Averages?

A

Let X1, X2, X3, . . . be pairwise independent square-integrable discrete random variables with the same mean µ = E[X1] and the same variance σ^2. Then, for every a > 0 and n ∈ N,
P(µ − a <= (X1 + · · · + Xn)/n <= µ + a) >= 1 − σ^2/(a^2 n).

A closely related statement (Chebyshev's inequality applied to the sample mean) is
P(|(1/n) Σi=1,n Xi − µ| >= a) <= σ^2/(n a^2) = Var(X1)/(n a^2).
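A small simulation illustrating the bound (a minimal sketch; the fair-coin summands, n and a are illustrative choices):

import random

random.seed(0)
n, a, trials = 1000, 0.05, 2000
mu, var = 0.5, 0.25                        # mean and variance of a single fair-coin indicator
bad = 0
for _ in range(trials):
    mean = sum(random.random() < 0.5 for _ in range(n)) / n
    if abs(mean - mu) >= a:
        bad += 1
print(bad / trials, var / (n * a ** 2))    # empirical frequency vs. the bound 0.1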

49
Q

What is Covariance?

A

Suppose X and Y are square-integrable discrete random variables. We define the covariance of X and Y as
Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])].

also,
Cov(X, Y ) = E[XY ] − E[X] · E[Y ].

50
Q

What are the properties of covariance?

A

Cov(X, Y ) = Cov(Y, X)
Cov(X, X) = Var(X)
Cov(aX + bY, Z) = a Cov(X, Z) + b Cov(Y, Z)

Proof of last. Just expand and group:
Cov(aX + bY, Z) = E[((aX + bY ) − E[aX + bY ])(Z − E[Z])]
= E[(aX − E[aX] + bY − E[bY ])(Z − E[Z])]
= E[(aX − E[aX])(Z − E[Z]) + (bY − E[bY ])(Z − E[Z])]
= a E[(X − E[X])(Z − E[Z])] + b E[(Y − E[Y ])(Z − E[Z])]
= a Cov(X, Z) + b Cov(Y, Z), which proves the claim.

51
Q

When are random variables Uncorrelated?

A

We say that a collection of square-integrable discrete random variables X1, X2, X3, . . . is uncorrelated if Cov(Xj , Xk) = 0 for every j /= k.

52
Q

What is Markov’s Inequality?

A

Let X be an integrable non-negative discrete random variable. Then,
P(X >= x) <= E[X]/x
for every x > 0.

Proof. Fix x > 0. Define the random variable
Y := x if X >= x; 0 otherwise.
We have that X >= Y, because:
* if X >= x, then Y = x, so X >= Y;
* if X ∈ [0, x), then Y = 0, so X >= Y.
This also gives E[X] >= E[Y]. Next, note that Y is a discrete random variable (it
only attains the values 0 and x) with
pY (x) = P(X >= x), pY (0) = P(X < x).
Hence,
E[X] >= E[Y ] = 0 · pY (0) + x · pY (x) = x · pY (x) = x · P(X >= x).
Rearranging this, we obtain the desired inequality.

53
Q

What is Chebyshev’s Inequality?

A

Let X be a square-integrable discrete random variable. Then,
P(|X − E[X]| >= a) <= Var(X)/a^2
for every a > 0.

Proof. Let a > 0. Define Y := (X − E[X])^2. Then, Y is non-negative and
E[Y ] = E[(X − E[X])^2] = Var(X).
In particular, Y is integrable. Next, note that the following events are the same:
{|X − E[X]| >= a} = {(X − E[X])^2 >= a^2} = {Y >= a^2}.
Hence, by Markov’s inequality we have
P(|X − E[X]| >= a) = P(Y >= a^2)
<= E[Y ]/a^2
= Var(X)/a^2.
This concludes the proof of the theorem.
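A quick empirical check of Chebyshev's inequality in Python (a minimal sketch; the sum-of-ten-dice distribution and threshold a = 10 are illustrative):

import random

random.seed(1)
samples = [sum(random.randint(1, 6) for _ in range(10)) for _ in range(50_000)]
mu = 35.0                                  # E[X] for the sum of 10 fair dice
var = 10 * 35 / 12                         # Var(X) = 10 * Var(one die)
a = 10
freq = sum(abs(s - mu) >= a for s in samples) / len(samples)
print(freq, var / a ** 2)                  # empirical frequency (about 0.06) <= bound 0.2916...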

54
Q

What is the Correlation Coefficient and how is it linear?

A

Let X and Y be square-integrable discrete random variables with positive variance. The correlation coefficient between X
and Y is defined as
ρ(X, Y ) = Cov(X, Y )/[σ(X) · σ(Y )].

We also have invariance under affine transformations with a, c > 0:
ρ(aX + b, cY + d) = ρ(X, Y).
(For general a, c /= 0 one gets ρ(aX + b, cY + d) = sign(ac) · ρ(X, Y), since σ(aX + b) = |a| · σ(X).)

Proof. We first note that the covariance between a constant and any other random variable is equal to zero. Indeed,
Cov(b, Y ) = E[bY ] − E[b] · E[Y ] = b · E[Y ] − b · E[Y ] = 0.
Therefore,
Cov(aX + b, cY + d) = ac · Cov(X, Y ).
Substituting this into the formula for the correlation coefficient,
ρ(aX + b, cY + d) = Cov(aX + b, cY + d)/[σ(aX + b) · σ(cY + d)]
= ac · Cov(X, Y )/[a · σ(X) · c · σ(Y )] (using σ(aX + b) = a σ(X) for a > 0)
= Cov(X, Y )/ σ(X) · σ(Y )
= ρ(X, Y ).

55
Q

What is the Central Limit Theorem?

A

Let X1, X2, X3, . . . be mutually independent square-integrable discrete random variables with the same distribution. Denote their mean by µ and variance by σ^2 > 0. Then, for every a < b
P(a <= (X1 + · · · + Xn − n·µ)/(σ√n) <= b) ≈ ∫a,b [1/√(2π)] · e^(−x^2/2) dx,
with the approximation becoming exact in the limit n → ∞.
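A small simulation comparing the standardized sum with the standard normal (a minimal sketch; fair-coin summands, n = 400 and the interval [−1, 1] are illustrative):

import random, statistics

random.seed(2)
n, trials = 400, 20_000
mu, sigma = 0.5, 0.5                       # mean and standard deviation of one fair-coin indicator

def standardized_sum():
    s = sum(random.random() < 0.5 for _ in range(n))
    return (s - n * mu) / (sigma * n ** 0.5)

a, b = -1.0, 1.0
freq = sum(a <= standardized_sum() <= b for _ in range(trials)) / trials
z = statistics.NormalDist()
print(freq, z.cdf(b) - z.cdf(a))           # both close to 0.6827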

56
Q

What is a Continuous Random Variable?

A

We say that a random variable X is continuous with probability density function fX : R → R if
P(a <= X <= b) = ∫a,b fX(x) dx
for every a < b ∈ R.

57
Q

What is the Uniform Continuous Distribution?

A

Let a, b ∈ R with a < b. A random variable X has the (continuous) uniform distribution on (a, b) if it has
fX(x) = 1/(b − a) if a < x < b,
0 otherwise.
We write X ∼ U(a, b).

58
Q

What is the Normal Distribution?

A

Let µ ∈ R and σ > 0. A random variable X has normal (or Gaussian) distribution with parameters µ and σ^2
if it has probability density function given by
fX(x) = [1/√(2πσ^2)] · e^(−(x−µ)^2/(2σ^2)) for every x ∈ R. We write X ∼ N (µ, σ^2).

59
Q

What is the Exponential Distribution?

A

Let λ > 0. A random variable X has the exponential distribution with parameter λ if
fX(x) = λ e^(−λx) if x > 0,
0 if x <= 0.
We write X ∼ Exp(λ).

60
Q

What is the Expectation of a Continuous Random Variable?

A

Let X be a continuous random variable with density fX. We define the expectation of X, denoted E[X], as the real number
given by
E[X] = ∫-∞,+∞ x fX(x) dx,
provided the integral converges absolutely.

The variance is defined as in the discrete case: Var(X) = E[(X − µ)^2] = ∫-∞,+∞ (x − µ)^2 fX(x) dx, where µ = E[X].

61
Q

What is the Cumulative Distribution Function for a continuous random variable?

A

Let X be a random variable. The cumulative distribution
function of X is the function FX : R → [0, 1] defined by
FX(x) = P(X <= x) for every x ∈ R.

62
Q

What is Joint Density?

A

Two random variables X and Y defined in the same probability space are called jointly continuous with joint probability density function fX,Y : R^2 → R+ if
P(a1 <= X <= a2, b1 <= Y <= b2) = ∫a1,a2 ∫b1,b2 [fX,Y (x, y)] dydx
for every a1 < a2 and b1 < b2.

63
Q

What is the Joint Cumulative Distribution Function?

A

Let X, Y be random variables. The joint cumulative distribution function of (X, Y ) is the function FX,Y : R^2 → [0, 1] given by
FX,Y (x, y) = P(X <= x, Y <= y)
for every x, y ∈ R.

64
Q

What is the Moment Generating Function?

A

Given a random variable X, we define the moment generating function of X as the function MX given by
MX(t) = E[e^tX]
for the values of t for which e^tX is integrable.

65
Q

What is a Moment?

A

We define the k-th moment of a random variable
X as E[X^k] if X^k is integrable.

66
Q

What is the Expectation for the Distributions?

A

Poisson - λ
Binomial - np
Bernoulli - p
Geometric - 1/p

67
Q

What is the Variance for the distributions?

A

Poisson - λ
Binomial - np(1 − p)
Bernoulli - p(1 − p)
Geometric - (1 − p)/p^2

68
Q

What is Theorem 17.1?

A

(Moment generating function determines the distribution).
Given two random variables X and Y , if there exists a > 0 such that MX(t) and MY (t) are finite and coincide for every t ∈ [−a, a], then X and Y have the same distribution.

69
Q

What is the Fundamental Counting Principle?

A

Suppose that the elements of a finite set E can be determined in k successive steps, with n1 possible choices for step 1, n2 possible choices for step 2, . . . , nk possible choices for step k. Suppose also that different choices lead to different elements. Then
|E| = n1 · n2 · · · · · nk .

70
Q

What is the Addition Rule?

A

If A1, . . . , An are pairwise disjoint subsets of some set then
|Ui=1,n Ai| = Σi=1,n |Ai| .

71
Q

When is the probability of an event less than the probability of another?

A

Let (Ω, F, P) be a probability space. If A, B ∈ F and A ⊆ B,
then
P(A) <= P(B).

Proof. Since A ⊆ B, it follows that P(B − A) = P(B) − P(A) or, equivalently,
P(A) = P(B) − P(B − A) <= P(B), where the inequality follows from the fact that probabilities are always non-negative.

72
Q

How do we find Expectation and Variance using moments?

A

We have that E[X^k] = M_X^(k)(0),
that is, the k-th moment is equal to the k-th derivative of the moment generating function, evaluated at 0.

We can then find E[X] = M_X'(0) and E[X^2] = M_X''(0), giving the expectation and the variance Var(X) = M_X''(0) − (M_X'(0))^2.
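A short symbolic check (a minimal sketch assuming the sympy package is available; the Poisson MGF MX(t) = exp(λ(e^t − 1)) is used as the example):

import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
M = sp.exp(lam * (sp.exp(t) - 1))                  # MGF of Poisson(lambda)
m1 = sp.diff(M, t).subs(t, 0)                      # E[X]   = M'(0)
m2 = sp.diff(M, t, 2).subs(t, 0)                   # E[X^2] = M''(0)
print(sp.simplify(m1), sp.simplify(m2 - m1**2))    # lambda, lambda (mean and variance)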