Stats Flashcards

1
Q

What is a UNIFORM Probability Space?

A

A uniform probability space is defined as the triplet (Ω, F, P),
where
* Ω (the sample space) is a non-empty finite set of all possible outcomes of the experiment;
* F (the event space) is the collection of all events, given by the power-set P(Ω) of Ω;
* P : F → [0, 1] is a map from the event space to [0, 1], satisfying:
− P(∅) = 0, P(Ω) = 1
− P(A ∪ B) = P(A) + P(B), for every A, B ∈ F such that A ∩ B = ∅
(finite additivity)
− P({ω}) = P({ω′}) for all ω, ω′ ∈ Ω (uniformity)

2
Q

How do we calculate probability in a uniform probability space?

A

Let (Ω, F, P) be a uniform probability space. Then for all ω ∈ Ω,
P({ω}) = 1/|Ω|
and for all A ⊆ Ω (A ∈ F),
P(A) = |A|/|Ω|.

Proof.
Since P({ω1}) = P({ω2}) ∀ ω1, ω2 ∈ Ω, let p ∈ [0, 1] be defined by
p := P({ω}) ∀ ω ∈ Ω.
Since P is a probability measure,
1 = P(Ω) = Σω∈Ω P({ω}) by finite additivity
= Σω∈Ω p
= pΣω∈Ω 1
= p|Ω|.
Therefore
p = 1/|Ω|,
which proves the first claim. The second claim also follows from finite additivity, as
P(A) = Σω∈A P({ω})
= Σω∈A p
= pΣω∈A 1
= p|A| = |A|/|Ω|.
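The formula P(A) = |A|/|Ω| can be checked on a small example. A minimal sketch, assuming a fair six-sided die as a hypothetical uniform probability space:

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die as a uniform probability space.
omega = {1, 2, 3, 4, 5, 6}   # sample space
A = {2, 4, 6}                # event "the roll is even"

# P(A) = |A| / |Omega| in a uniform probability space
p_A = Fraction(len(A), len(omega))
print(p_A)  # -> 1/2
```

Using `Fraction` keeps the probabilities exact rather than floating-point.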

3
Q

What are Sequences of a set?

A

Let A be a finite set of cardinality n ∈ N. A sequence of length
k ∈ N of elements of A is an ordered k-tuple (a1, . . . , ak) s.t. ai ∈ A, i = 1, 2, . . . k.
We denote by Sn,k(A) the set of all sequences of length k of elements of A.

By “ordered” we mean that the order of the sequence matters, for example:
(a1, a2) /= (a2, a1). Note also that repetitions of elements are allowed, for example
(1, 1) is a sequence of length 2 of elements of {1}.

4
Q

What is the Cardinality of Sn,k?

A

Let A be a finite set of cardinality n ∈ N. The set Sn,k(A) of
all sequences of length k ∈ N of elements of A has cardinality n^k, i.e.
|Sn,k(A)| = n^k

Proof.
To construct an arbitrary element (a1, a2, . . . , ak) of Sn,k(A), we perform the following steps
(a) choose the first value a1. There are n1 = |A| = n ways to do this.
(b) choose the second value a2. There are n2 = |A| = n ways to do this.
.
.
.
(k) choose the kth value ak. There are nk = |A| = n ways to do this.
Thus we find
|Sn,k(A)| = n1 · n2 · · · · · nk = n · n · · · · · n = n^k.
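The count |Sn,k(A)| = n^k can be confirmed by brute-force enumeration; a sketch assuming a hypothetical 3-letter alphabet:

```python
from itertools import product

# Sketch: all sequences of length k over a hypothetical alphabet A of size n,
# confirming |S_{n,k}(A)| = n**k.
A = ('a', 'b', 'c')                     # n = 3
k = 4
sequences = list(product(A, repeat=k))  # ordered k-tuples, repetition allowed
print(len(sequences), len(A) ** k)      # -> 81 81
```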

5
Q

What is an Ordering of a set?

A

Let A be a finite set of cardinality n ∈ N and let k ∈ N such
that k <= n. An ordering of length k of elements of A is a sequence of length k of elements of A with no repetitions. We denote the set of orderings of length k of elements of A by On,k(A). Thus we have
On,k(A) = {(a1, . . . , ak) : ai ∈ A ∀i = 1, . . . , k, ai /= aj ∀ i /= j}.

6
Q

What is the Cardinality of On,k?

A

Let A be a finite set of cardinality n ∈ N and k <= n. Then
|On,k(A)| = n(n − 1). . .(n − k + 1).

Proof. We determine an element of On,k(A) by the following steps:
(a) We choose a1. There are n1 = |A| = n choices for this.
(b) We choose a2 such that a2 /= a1. There are n2 = n − 1 choices for this.
(c) We choose a3 such that a3 /= a2 and a3 /= a1. There are n3 = n − 2 choices
for this.
.
.
.
(k) We choose ak such that ak /= ai ∀ i = 1, 2, . . . , k − 1. There are nk = n − (k − 1) choices for this.
By the fundamental counting principle
|On,k(A)| = n1n2 . . . nk = n(n − 1). . .(n − k + 1).
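The ordering count n(n − 1). . .(n − k + 1) = n!/(n − k)! can likewise be checked by enumeration; a sketch with a hypothetical 5-element set:

```python
from itertools import permutations
from math import factorial

# Sketch: orderings (sequences without repetition) of length k over a
# hypothetical 5-element set, counted two ways.
A = ('a', 'b', 'c', 'd', 'e')            # n = 5
k = 3
orderings = list(permutations(A, k))     # k-tuples with distinct entries
print(len(orderings))                    # -> 60, i.e. 5 * 4 * 3
print(factorial(5) // factorial(5 - 3))  # -> 60, i.e. n!/(n-k)!
```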

7
Q

What is a Combination of a set?

A

Let A be a finite set of cardinality n ∈ N. A combination of k
elements of A is a subset of A with k elements. We denote by Cn,k(A) the set of combinations of k elements of A.

8
Q

What is the Cardinality of Cn,k?

A

Let A be a finite set of cardinality n ∈ N and k <= n. Then
|Cn,k(A)| = nCk = n!/[k!(n − k)!].

Proof. Notice that an ordering of length k of elements of A can be obtained uniquely by the following steps
(a) Choose a combination of k elements of A. There are n1 = |Cn,k(A)| choices for this.
(b) Choose a permutation of these elements. Since a set of k elements has k! permutations, there are n2 = k! choices for this.
By the fundamental counting principle,
|On,k(A)| = |Cn,k(A)| · k!.
The above equation can be solved for the only unknown term, giving
|Cn,k(A)| = |On,k(A)|/k! = n!/[k!(n − k)!]
= nCk.
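The binomial coefficient is available directly in Python's standard library; a sketch assuming a hypothetical 6-element set:

```python
from itertools import combinations
from math import comb

# Sketch: combinations (k-element subsets) of a hypothetical 6-element set.
A = range(6)                        # n = 6
k = 2
subsets = list(combinations(A, k))  # unordered, no repetition
print(len(subsets))                 # -> 15
print(comb(6, 2))                   # -> 15, i.e. 6!/(2! * 4!)
```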

9
Q

What is a Permutation and how many permutations of a set are there?

A

Let A be a finite set of cardinality n ∈ N. An ordering of length
n of elements of A is called a permutation of A.

Let A be a finite set of cardinality n ∈ N. Then the number of
permutations of the elements of A is n(n − 1)(n − 2). . . 1 = n!.

Proof. Take k = n in the formula for the number of orderings: |On,n(A)| = n(n − 1). . . 1 = n!.

10
Q

What is a Partition of a set?

A

Let A be a set of cardinality n ∈ N and r ∈ N such that r <= n.
A partition of A into r subsets is a family {A1, . . . , Ar} of subsets of A such that
(a) Every subset in the family is non-empty: Ai /= ∅ ∀ i = 1, 2, . . . , r.
(b) The subsets in the family are pairwise disjoint: Ai ∩ Aj = ∅ ∀ i /= j, i, j = 1, 2, . . . , r.
(c) The union of all the subsets in the family is equal to A:
Ui=1,r (Ai) = A.

11
Q

How many partitions of a set are there?

A

Let A be a finite set with |A| = n. Let r ∈ N, r <= n. Then
the number of partitions of A into r subsets {A1, . . . , Ar} such that |A1| = k1, |A2| = k2 . . . , |Ar| = kr is given by:
n!/(k1! k2! . . . kr!).

Proof. Every partition of A satisfying the assumptions can be uniquely determined via the following steps:
(a) Choose A1 ⊆ A such that |A1| = k1. There are nCk1 choices for this step.
(b) Choose A2 ⊆ A such that |A2| = k2 and A1 ∩ A2 = ∅ (which implies that A2 ⊆ A \ A1). There are n−k1Ck2 choices for this step.
(c) Choose A3 ⊆ A such that |A3| = k3, A3 ∩ A2 = ∅ and A3 ∩ A1 = ∅ (which implies that A3 ⊆ A \ (A1 ∪ A2)). There are n−k1−k2Ck3 choices for this step.
.
.
.
(r) Finally choose the remaining set Ar ⊆ A such that |Ar| = kr and Ar ∩ Ai = ∅ for all i = 1, 2, . . . , r − 1 (so that Ar ⊆ A \ (A1 ∪ · · · ∪ Ar−1)). There are n−(k1+k2+···+kr−1)Ckr choices for this step.
Thus, by the fundamental counting principle the number of partitions of A into r subsets {A1, A2, . . . , Ar} such that |A1| = k1, . . . , |Ar| = kr with k1 + k2 + · · · + kr = n is
nCk1 * (n − k1)Ck2 * . . . * (n − (k1 + · · · + kr−1))Ckr
= n!/[k1!(n − k1)!] * (n − k1)!/[k2!(n − k1 − k2)!] * . . . * (n − (k1 + · · · + kr−1))!/[kr!(n − (k1 + · · · + kr))!]
= n!/(k1! k2! . . . kr!),
where the last equality comes by cancellation of the fractions and noticing that
(n − (k1 + · · · + kr))! = (n − n)! = 0! = 1.
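The multinomial count can be verified by brute force on a small case; a sketch assuming a hypothetical 4-element set split into parts of sizes (2, 1, 1):

```python
from itertools import combinations
from math import factorial

# Sketch: partitions of a 4-element set into parts of sizes (2, 1, 1),
# counted by brute force and by the formula n!/(k1! * k2! * ... * kr!).
S = set(range(4))
count = 0
for A1 in combinations(S, 2):          # choose A1, |A1| = 2
    rest = S - set(A1)
    for A2 in combinations(rest, 1):   # choose A2, |A2| = 1; A3 is forced
        count += 1

multinomial = factorial(4) // (factorial(2) * factorial(1) * factorial(1))
print(count, multinomial)  # -> 12 12
```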

12
Q

What is a Sample Space?

A

A sample space Ω is the set of all possible outcomes of a random process (or experiment). It can be any set.

13
Q

What is an Event Space and what are the known Properties of an Event Space?

A

Let Ω be the sample space and F be a collection of subsets of
Ω. F is an event space (also called σ-algebra) if it satisfies
(i) Ω ∈ F.
(ii) if A ⊆ Ω, A ∈ F ⇒ Ac ∈ F (F is closed under complements).
(iii) if {An : n ∈ N} is such that An ∈ F ∀ n then
Un=1,∞ (An) ∈ F . (F is closed under countable unions.)

Let F be a event space on Ω. Then
(a) F is closed under finite unions.
(b) F is closed under finite intersections.
(c) F is closed under countable intersections.

Proof.
(a) Let A1, . . . , An ∈ F. Set Aj = ∅ for every j > n; note that ∅ = Ω^c ∈ F, since Ω ∈ F and F is closed under complements. Since the empty sets do not contribute to the union, we can show that
Uj=1,n Aj = Uj=1,∞ Aj ∈ F,
as F is closed under countable unions.
(b) Let A1, . . . , An ∈ F. We want to show that ∩j=1,n Aj ∈ F. By De Morgan’s law (show that every element of one set needs to belong to the other set as well)
∩j=1,n Aj = (Uj=1,n Aj^c)^c.
Since the event space F is closed under taking complements, Aj^c ∈ F, for every j = 1, . . . , n. Since it is closed under finite unions (statement 1 of the proposition, shown above),
Uj=1,n Aj^c ∈ F
By taking complement once more, it follows that ∩j=1,n Aj ∈ F.
(c) The proof is similar to that of statement 2 above, noting that De Morgan’s law also holds for countable unions and intersections. That is, we can write
∩j=1,∞ Aj = (Uj=1,∞ Aj^c)^c.

14
Q

What is a Probability Measure?

A

Given a sample space Ω and an event space F, a function P : F → R is called a probability measure if it satisfies
(i) P(B) ∈ [0, 1] for every B ∈ F;
(ii) P(Ω) = 1;
(iii) (Countable additivity) For every sequence An ∈ F, n >= 1, of pairwise disjoint events,
P(Un=1,∞ (An)) =Σn=1,∞ (P(An)).

15
Q

What is a Probability Space?

A

A probability space is defined as the triplet (Ω, F, P), where
* Ω (the sample space) is the set of all possible outcomes of the experiment
(we always assume that it is not empty);
* F is an event space of subsets of Ω.
* P is a probability measure on F.

16
Q

What are the 3 known properties of a probability space?

A

Let (Ω, F, P) be a probability space. Then, P has the following
properties
(a) If A, B ∈ F such that A ⊆ B, then
P(B − A) = P(B) − P(A).
Note that B − A = B ∩ A^c and is to be interpreted as ‘all elements of B that are not in A’.
(b) For every A ∈ F,
P(A^c) = 1 − P(A).
(c) P(∅) = 0.

Proof.
(a) We write B as the union of the set with elements in B that are not in A and those that are, so B = (B − A) ∪ A. By finite additivity
P(B) = P((B − A) ∪ A) = P(B − A) + P(A), which proves the claim.
(b) Using the above property,
P(A^c) = P(Ω − A) = P(Ω) − P(A) = 1 − P(A).
(c) P(∅) = P(Ω^c) = 1 − P(Ω) = 1 − 1 = 0.

17
Q

What is the Inclusion - Exclusion Formula?

A

Let (Ω, F, P) be a probability space. Then, for any finite collection A1, . . . , An of events in F, we have
P(Uk=1,n Ak) = Σk=1,n (−1)^(k−1) Σ(1<=i1<···<ik<=n) P(Ai1 ∩ · · · ∩ Aik).

The inner sum runs over all k-element index subsets 1 <= i1 < · · · < ik <= n.

Proof. We will prove the result only for n = 2 (the proof of the induction step in the general case is similar, but messier!). We write
A1 ∪ A2 = (A1 − B) ∪ B ∪ (A2 − B),
where B = A1 ∩ A2. The sets A1 − B, B, A2 − B are all disjoint, so we can write
P(A1 ∪ A2) = P ((A1 − B) ∪ B ∪ (A2 − B)) = P(A1 − B) + P(B) + P(A2 − B)
using finite additivity. By the properties of a probability space (P(B − A) = P(B) − P(A) for A ⊆ B), P(A1 − B) = P(A1) − P(B) and similarly P(A2 − B) = P(A2) − P(B). Substituting these into the formula above, we get
P(A1 ∪ A2) = (P(A1)−P(B))+P(B)+ (P(A2)−P(B)) = P(A1)+P(A2)−P(B), which proves the claim.
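The n = 2 case can be verified with explicit sets; a sketch on a hypothetical uniform space with Ω = {1, . . . , 12}, A1 the even numbers, A2 the multiples of 3:

```python
from fractions import Fraction

# Sketch of inclusion-exclusion for n = 2 on a hypothetical uniform space.
omega = set(range(1, 13))
A1 = {w for w in omega if w % 2 == 0}   # evens
A2 = {w for w in omega if w % 3 == 0}   # multiples of 3

def P(E):  # uniform probability: |E|/|Omega|
    return Fraction(len(E), len(omega))

lhs = P(A1 | A2)
rhs = P(A1) + P(A2) - P(A1 & A2)
print(lhs, rhs)  # -> 2/3 2/3
```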

18
Q

What is Boole’s Inequality?

A

Let (Ω, F, P) be a probability space. If A1, . . . , An ∈ F, then
P(Ui=1,n (Ai)) <= Σi=1,n P(Ai) (*)

Proof. We proceed by induction. For n = 2, notice that
P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2) <= P(A1) + P(A2),
so that (*) holds for n = 2. Assume now (*) holds ∀ j <= n; we need to prove it holds for n + 1 events. Let A1, . . . , An+1 ∈ F, then arguing as above we have
P(Uj=1,n+1 Aj) = P((Uj=1,n Aj) U An+1)
= P(Uj=1,n Aj) + P(An+1) − P((Uj=1,n Aj) ∩ An+1)
<= P(Uj=1,n Aj) + P(An+1)
<= Σj=1,n P(Aj) + P(An+1) = Σj=1,n+1 P(Aj).
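Boole's inequality can be seen concretely with overlapping events, where the union bound is strict; a sketch on a hypothetical uniform space:

```python
from fractions import Fraction

# Sketch: Boole's inequality on a hypothetical uniform space with
# overlapping events, so the union bound is strict.
omega = set(range(1, 11))
events = [{1, 2, 3}, {3, 4}, {4, 5, 6}]

def P(E):  # uniform probability: |E|/|Omega|
    return Fraction(len(E), len(omega))

union = set().union(*events)
print(P(union))                   # -> 3/5
print(sum(P(E) for E in events))  # -> 4/5
```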

19
Q

What is Conditional Probability?

A

Let (Ω, F, P) be a probability space and B ∈ F such that
P(B) > 0. For A ∈ F, the conditional probability of A given B is denoted by P(A|B) and is defined as
P(A|B) = P(A ∩ B)/P(B)

20
Q

What is the Multiplication Rule of conditional probability?

A

(Multiplication Rule). Let (Ω, F, P) be a probability space and
A1, . . . , An ∈ F s.t. P(A1 ∩ · · · ∩ An−1) > 0. Then,
P(A1 ∩ · · · ∩ An) = P(A1)P(A2|A1)P(A3|A1 ∩ A2). . . P(An|A1 ∩ · · · ∩ An−1).

Proof.
Notice that for k = 1, . . . , n − 1, A1 ∩ · · · ∩ Ak ⊇ A1 ∩ · · · ∩ An−1. Hence, by monotonicity of P and by assumption
P(A1 ∩ A2 ∩ · · · ∩ Ak) >= P(A1 ∩ · · · ∩ An−1) > 0 .
This ensures that all the conditional probabilities on the right-hand side are well-defined. The result follows by a direct application of the definition of conditional probability on the right-hand side:
P(A1)P(A2|A1)P(A3|A1 ∩ A2). . . P(An|A1 ∩ · · · ∩ An−1)
= P(A1) * P(A1 ∩ A2)/P(A1) * P(A3 ∩ A1 ∩ A2)/P(A1 ∩ A2) * . . .
* P(A1 ∩ · · · ∩ An)/P(A1 ∩ · · · ∩ An−1)
= P(A1 ∩ · · · ∩ An).

21
Q

What is the Law of Total Probability?

A

Let (Ω, F, P) be a probability space and let {Bn : n = 1, . . . , N}, where N is finite or infinite, be a partition of Ω
such that P(Bn) > 0, ∀ n = 1, . . . , N. Then, for all A ∈ F
P(A) = Σn=1,N [P(A|Bn)P(Bn)].

Proof. Notice that since {Bn : n = 1, . . . , N} forms a partition of Ω, we have A = A ∩ Ω = A ∩ Un=1,N Bn
= Un=1,N (A ∩ Bn).
Further, since Bn’s are disjoint so are {A ∩ Bn : n = 1 . . . N}, therefore by finite/countable additivity, we have
P(A) = P(Un=1,N (A ∩ Bn))
= Σn=1,N P(A ∩ Bn)
= Σn=1,N P(A|Bn)P(Bn).
In the last equality we use the definition of conditional probability with the assumption that P(Bn) > 0 ∀ n = 1, . . . , N.
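The law of total probability is easy to exercise numerically; a sketch with a hypothetical two-urn setup (B1, B2 = "urn 1/2 is picked", A = "a red ball is drawn", all numbers made up for illustration):

```python
from fractions import Fraction as F

# Hypothetical example: B1, B2 partition Omega, A = "a red ball is drawn".
P_B = {1: F(1, 2), 2: F(1, 2)}          # P(B1), P(B2)
P_A_given_B = {1: F(3, 4), 2: F(1, 4)}  # P(A|B1), P(A|B2)

# P(A) = sum over n of P(A|Bn) P(Bn)
P_A = sum(P_A_given_B[n] * P_B[n] for n in P_B)
print(P_A)  # -> 1/2
```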

22
Q

What is Bayes’ Theorem?

A

Let (Ω, F, P) be a probability space and {Bn : n = 1, . . . , N}, where N is finite or infinite, be a partition of Ω such that P(Bn) > 0 ∀ n = 1, . . . , N. Then for A ∈ F such that P(A) > 0
P(Bn|A) = P(A|Bn)P(Bn)/Σj=1,N P(A|Bj )P(Bj ) ∀ n = 1, . . . , N .

Proof. Since P(A) > 0, P(Bn|A) is defined; by the definition of conditional probability and the law of total probability:
P(Bn|A) = P(Bn ∩ A)/P(A)
= P(A|Bn)P(Bn)/P(A)
= P(A|Bn)P(Bn)/Σj=1,N P(A|Bj )P(Bj )
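Bayes' theorem can be worked through on a small example; a sketch with a hypothetical diagnostic test (B1 = "has condition", B2 = "does not", A = "test is positive", numbers made up for illustration):

```python
from fractions import Fraction as F

# Hypothetical diagnostic-test example for Bayes' theorem.
P_B = {1: F(1, 100), 2: F(99, 100)}       # prior: P(B1), P(B2)
P_A_given_B = {1: F(9, 10), 2: F(1, 10)}  # P(A|B1), P(A|B2)

denominator = sum(P_A_given_B[n] * P_B[n] for n in P_B)  # = P(A)
posterior = P_A_given_B[1] * P_B[1] / denominator        # P(B1|A)
print(posterior)  # -> 1/12
```

Even with a fairly accurate test, the posterior P(B1|A) = 1/12 stays small because the prior P(B1) is small.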

23
Q

When are two events Independent?

A

Let (Ω, F, P) be a probability space. We say that events A and
B are independent if P(A ∩ B) = P(A) · P(B).

24
Q

What does it mean for events to be pairwise and mutually independent?

A

Let (Ω, F, P) be a probability space and A1, A2, . . . , An be
events. We say that the events A1, . . . , An are pairwise independent if Aj and Ak are independent for every choice of distinct j and k. We say that the events A1, . . . , An are mutually independent if, for every k = 2, . . . , n and every choice of distinct indices j1, . . . , jk,
P(Aj1 ∩ Aj2 ∩ · · · ∩ Ajk) = P(Aj1)P(Aj2)· · · P(Ajk)

25
What is a Random Variable?
Let (Ω, F, P) be a probability space. A random variable is a function X : Ω → R such that {ω ∈ Ω : X(ω) <= a} ∈ F for every a ∈ R.
26
What is the Distribution of a random variable?
The distribution of a random variable X is the probability measure on R denoted PX and given by PX(B) = P({ω ∈ Ω : X(ω) ∈ B}) for subsets B in some event space on the set of real numbers. We also have that PX is a probability measure on R.
27
What is a Probability Mass Function?
Let (Ω, F, P) be a probability space and X be a discrete random variable. We define the probability mass function of X as the function pX : R → [0, 1] given by pX(x) = PX({x}).
28
What is the Discrete Support?
Let (Ω, F, P) be a probability space and X be a discrete random variable. We define the discrete support of X, or, more precisely, the discrete support of its distribution PX, as the set {x ∈ R : pX(x) > 0}.
29
What is a Bernoulli Distribution?
We say that a discrete random variable X has Bernoulli distribution with parameter p ∈ [0, 1], denoted X ∼ Bernoulli(p), if its probability mass function is pX(x) = p if x = 1; 1 − p if x = 0; 0 otherwise.
30
What is a Geometric Distribution?
We say that a discrete random variable X has geometric distribution with parameter p ∈ (0, 1], denoted X ∼ Geom(p), if its probability mass function is pX(x) = p · (1 − p)^(x−1) if x ∈ N; 0 otherwise.
31
What is a Binomial Distribution?
We say that a discrete random variable X has binomial distribution with parameters n ∈ N0 and p ∈ [0, 1], denoted X ∼ Binom(n, p), if its probability mass function is pX(x) = (nCx)(p^x)(1 − p)^(n−x) if x ∈ {0, 1, . . . , n}; 0 otherwise.
32
What is a Poisson Distribution?
We say that a discrete random variable X has Poisson distribution with parameter λ > 0, denoted by X ∼ Poisson(λ), if its probability mass function is pX(x) = (e^−λ)(λ^x)/x! if x ∈ N0; 0 otherwise.
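A quick sanity check that the Poisson pmf sums to 1 over x = 0, 1, 2, . . . (series truncated at 100; λ = 2.5 is a hypothetical choice):

```python
from math import exp, factorial

# Sketch: the Poisson pmf e^(-lam) * lam**x / x! sums to 1 (truncated).
lam = 2.5  # hypothetical rate parameter

def poisson_pmf(x):
    return exp(-lam) * lam ** x / factorial(x)

total = sum(poisson_pmf(x) for x in range(100))
print(abs(total - 1.0) < 1e-12)  # -> True
```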
33
What is Expectation?
Let X be a discrete random variable. We define the expectation of X, denoted E[X], as the real number given by E[X] = Σx:P(X=x)>0 [x · P(X = x)], as long as this sum converges absolutely, otherwise E[X] is not defined.
34
When is a random variable Integrable?
We say that a discrete random variable X is integrable if its expectation is defined.
35
How does expectation sum?
If X and Y are integrable discrete random variables defined on the same probability space, then E[X + Y ] = E[X] + E[Y ].
36
What are the properties of expectation?
Let X and Y be integrable discrete random variables defined on the same probability space (Ω, F, P). Then: (1) E[1A] = P(A) for every A ∈ F, (2) If 0 <= Z <= X for all ω ∈ Ω then 0 <= E[Z] <= E[X], (3) E[aX + bY ] = aE[X] + bE[Y ].
37
What is a Function of a random variable?
Let X be a discrete random variable with discrete support DX, and let g : R → R be any function. Then E[g(X)] = Σx∈DX [g(x) · P(X = x)], provided the sum converges absolutely.
38
When is a random variable Square Integrable?
We say that a discrete random variable X is square-integrable if X^2 is integrable, which means that the sum Σx:P(X=x)>0 [x^2 · P(X = x)] is convergent. Note that square-integrable random variables are automatically integrable, because |x| <= 1 + x^2 .
39
What is the Variance of a random variable?
Let X be a square-integrable discrete random variable and denote µ = E[X]. We define the variance of X as Var(X) = E[(X − µ)^2]. also, Var(X) = E[X^2] − (E[X])^2
40
What is the Standard Deviation of a random variable?
Let X be a square-integrable discrete random variable. We define the standard deviation of X as σ(X) = sqrt[Var(X)]. Unlike the variance, the standard deviation satisfies σ(aX) = |a| · σ(X).
41
What is the Joint Probability Mass Function?
Given two discrete random variables X and Y , we define the joint probability mass function of X and Y, denoted pX,Y : R^2 → R, and given by pX,Y (x, y) = P(X = x, Y = y), or, more formally, P({ω ∈ Ω : X(ω) = x, Y (ω) = y}).
42
What is the Marginal Probability Mass Function?
pX(x) = Σy∈DY [pX,Y (x, y)]
43
What is Expectation in the Bivariate case?
Let X and Y be discrete random variables, and let g : R^2 → R be any function. Then E[g(X, Y )] = Σx∈DX Σy∈DY [g(x, y) · P(X = x, Y = y)]
44
When are two random variables Independent?
Two discrete random variables X and Y are independent if pX,Y (x, y) = pX(x) · pY(y) for every x, y ∈ R.
45
When are two random variables Pairwise Independent?
We say that a collection of discrete random variables X1, X2, X3, . . . is pairwise independent if Xj and Xk are independent for every j /= k.
46
When are two random variables Mutually Independent?
We say that a collection of discrete random variables X1, X2, X3, . . . is mutually independent if, for every k and every x1, x2, . . . , xk, we have P(X1 = x1, X2 = x2, . . . , Xk = xk) = P(X1 = x1) · P(X2 = x2)· · · P(Xk = xk).
47
How does the expectation of two independent random variables multiply?
If X and Y are independent integrable discrete random variables, then XY is integrable and E[XY ] = E[X] · E[Y ]. Proof. E[XY ] = Σx∈DX Σy∈DY [xy · P(X = x, Y = y)] = Σx∈DX x · ( Σy∈DY [y · P(X = x)P(Y = y)]) = (Σx∈DX [x · P(X = x)]) * (Σy∈DY [y · P(Y = y)]) = E[X] · E[Y ]. In order to use the bivariate expectation formula we should have known that XY was integrable in the first place. This can be checked using exactly the same development with |x| instead of x and |y| instead of y. This proves the theorem.
48
What is the Law of Averages?
Let X1, X2, X3, . . . be pairwise independent square-integrable discrete random variables with the same mean µ and same variance σ^2. Then, for every a > 0 and n ∈ N, P(µ − a <= (X1 + · · · + Xn)/n <= µ + a) >= 1 − σ^2/(a^2 n). Equivalently, P(|1/n * ΣXi − µ| > a) <= σ^2/(n a^2).
49
What is Covariance?
Suppose X and Y are square-integrable discrete random variables. We define the covariance of X and Y as Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])]. also, Cov(X, Y ) = E[XY ] − E[X] · E[Y ].
50
What are the properties of covariance?
Cov(X, Y ) = Cov(Y, X) Cov(X, X) = Var(X) Cov(aX + bY, Z) = a Cov(X, Z) + b Cov(Y, Z) Proof of last. Just expand and group: Cov(aX + bY, Z) = E[((aX + bY ) − E[aX + bY ])(Z − E[Z])] = E[(aX − E[aX] + bY − E[bY ])(Z − E[Z])] = E[(aX − E[aX])(Z − E[Z]) + (bY − E[bY ])(Z − E[Z])] = a E[(X − E[X])(Z − E[Z])] + b E[(Y − E[Y ])(Z − E[Z])] = a Cov(X, Z) + b Cov(Y, Z),
51
When are random variables Uncorrelated?
We say that a collection of square-integrable discrete random variables X1, X2, X3, . . . is uncorrelated if Cov(Xj , Xk) = 0 for every j /= k.
52
What is Markov's Inequality?
Let X be an integrable non-negative discrete random variable. Then, P(X >= x) <= E[X]/x for every x > 0. Proof. Fix x > 0. Define the random variable Y := x if X >= x; 0 otherwise. We have that X >= Y, because: * if X >= x, then Y = x, so X >= Y; * if X ∈ [0, x), then Y = 0, so X >= Y. This also gives E[X] >= E[Y]. Next, note that Y is a discrete random variable (it only attains the values 0 and x) with pY(x) = P(X >= x), pY(0) = P(X < x). Hence, E[X] >= E[Y] = 0 · pY(0) + x · pY(x) = x · pY(x) = x · P(X >= x). Rearranging this, we obtain the desired inequality.
53
What is Chebyshev's Inequality?
Let X be a square-integrable discrete random variable. Then, P(|X − E[X]| >= a) <= Var(X)/a^2 for every a > 0. Proof. Let a > 0. Define Y := (X − E[X])^2. Then, Y is non-negative and E[Y ] = E[(X − E[X])^2] = Var(X). In particular, Y is integrable. Next, note that the following events are the same: {|X − E[X]| >= a} = {(X − E[X])^2 >= a^2} = {Y >= a^2}. Hence, by Markov’s inequality we have P(|X − E[X]| >= a) = P(Y >= a^2) <= E[Y ]/a^2 = Var(X)/a^2. This concludes the proof of the theorem.
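Chebyshev's inequality can be checked exactly on a small pmf; a sketch with a hypothetical distribution on {0, 1, 4}:

```python
from fractions import Fraction as F

# Sketch: Chebyshev's inequality checked exactly on a hypothetical pmf.
pmf = {0: F(1, 2), 1: F(1, 4), 4: F(1, 4)}

mean = sum(x * p for x, p in pmf.items())               # E[X] = 5/4
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # Var(X) = 43/16

a = F(2)
lhs = sum((p for x, p in pmf.items() if abs(x - mean) >= a), F(0))
print(lhs, var / a ** 2, lhs <= var / a ** 2)  # -> 1/4 43/64 True
```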
54
What is the Correlation Coefficient and how is it linear?
Let X and Y be square-integrable discrete random variables with positive variance. The correlation coefficient between X and Y is defined as ρ(X, Y ) = Cov(X, Y )/[σ(X) · σ(Y )]. The coefficient is invariant under increasing linear transformations: for a, c > 0, ρ(aX + b, cY + d) = ρ(X, Y ). Proof. We first note that the covariance between a constant and any other random variable is equal to zero. Indeed, Cov(b, Y ) = E[bY ] − E[b] · E[Y ] = b · E[Y ] − b · E[Y ] = 0. Therefore, Cov(aX + b, cY + d) = ac · Cov(X, Y ). Since σ(aX + b) = a · σ(X) and σ(cY + d) = c · σ(Y ) when a, c > 0, substituting into the formula for the correlation coefficient gives ρ(aX + b, cY + d) = ac · Cov(X, Y )/[a · σ(X) · c · σ(Y )] = Cov(X, Y )/[σ(X) · σ(Y )] = ρ(X, Y ).
55
What is the Central Limit Theorem?
Let X1, X2, X3, . . . be mutually independent square-integrable discrete random variables with the same distribution. Denote their mean by µ and variance by σ^2 > 0. Then, for every a < b, P(a <= (X1 + · · · + Xn − n·µ)/(σ√n) <= b) ≈ ∫a,b [1/√(2π) * e^(−x^2/2)] dx.
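The approximation can be tested deterministically for Bernoulli(1/2) summands (µ = 1/2, σ = 1/2), since the sum has an exactly computable binomial distribution; a sketch under these assumptions:

```python
from math import comb, erf, sqrt

# Sketch: compare the exact probability that the standardized sum of
# n Bernoulli(1/2) variables lies in [a, b] with the normal integral.
n, mu, sigma = 400, 0.5, 0.5
a, b = -1.0, 1.0

def phi(x):  # standard normal CDF
    return 0.5 * (1 + erf(x / sqrt(2)))

lo = int(n * mu + a * sigma * sqrt(n))  # standardized value a -> S_n = 190
hi = int(n * mu + b * sigma * sqrt(n))  # standardized value b -> S_n = 210
exact = sum(comb(n, k) for k in range(lo, hi + 1)) / 2 ** n

print(round(exact, 3))            # exact probability, roughly 0.7
print(round(phi(b) - phi(a), 3))  # normal approximation, roughly 0.68
```

The small gap between the two numbers is the discreteness error at n = 400; it shrinks as n grows.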
56
What is a Continuous Random Variable?
We say that a random variable X is continuous with probability density function fX : R → R if P(a <= X <= b) = ∫a,b fX(x) dx for every a < b ∈ R.
57
What is the Uniform Continuous Distribution?
Let a, b ∈ R with a < b. A random variable X has the (continuous) uniform distribution on (a, b) if it has fX(x) = 1/(b − a) if a < x < b; 0 otherwise. We write X ∼ U(a, b).
58
What is the Normal Distribution?
Let µ ∈ R and σ > 0. A random variable X has normal (or Gaussian) distribution with parameters µ and σ^2 if it has probability density function given by fX(x) = 1/√(2πσ^2) · e^−[(x−µ)^2/2σ^2] for every x ∈ R. We write X ∼ N (µ, σ^2).
59
What is the Exponential Distribution?
Let λ > 0. A random variable X has the exponential distribution with parameter λ if fX(x) = λe^−λx if x > 0, 0 if x <= 0. We write X ∼ Exp(λ).
60
What is the Expectation of a Continuous Random Variable?
Let X be a continuous random variable with density fX. We define the expectation of X, denoted E[X], as the real number given by E[X] = ∫-∞,+∞ x fX(x) dx, provided the integral converges absolutely. The variance is defined analogously, Var(X) = E[(X − E[X])^2].
61
What is the Cumulative Distribution Function for a continuous random variable?
Let X be a random variable. The cumulative distribution function of X is the function FX : R → [0, 1] defined by FX(x) = P(X <= x) for every x ∈ R.
62
What is Joint Density?
Two random variables X and Y defined in the same probability space are called jointly continuous with joint probability density function fX,Y : R^2 → R+ if P(a1 <= X <= a2, b1 <= Y <= b2) = ∫a1,a2 ∫b1,b2 [fX,Y (x, y)] dydx for every a1 < a2 and b1 < b2.
63
What is the Joint Cumulative Distribution Function?
Let X, Y be random variables. The joint cumulative distribution function of (X, Y ) is the function FX,Y : R^2 → [0, 1] given by FX,Y (x, y) = P(X <= x, Y <= y) for every x, y ∈ R.
64
What is the Moment Generating Function?
Given a random variable X, we define the moment generating function of X as the function MX given by MX(t) = E[e^tX] for the values of t for which e^tX is integrable.
65
What is a Moment?
We define the k-th moment of a random variable X as E[X^k] if X^k is integrable.
66
What is the Expectation for the Distributions?
Poisson - λ; Binomial - np; Bernoulli - p; Geometric - 1/p
67
What is the Variance for the distributions?
Poisson - λ; Binomial - np(1 − p); Bernoulli - p(1 − p); Geometric - (1 − p)/p^2
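The geometric entries can be checked numerically by truncating the series for E[X] and E[X^2]; a sketch with a hypothetical p = 0.3:

```python
# Sketch: check E[X] = 1/p and Var(X) = (1-p)/p**2 for the geometric
# pmf p*(1-p)**(x-1), x = 1, 2, ... (series truncated; p hypothetical).
p = 0.3
xs = range(1, 2000)
q = [p * (1 - p) ** (x - 1) for x in xs]

mean = sum(x * qx for x, qx in zip(xs, q))
second_moment = sum(x * x * qx for x, qx in zip(xs, q))
var = second_moment - mean ** 2

print(round(mean, 6))  # -> 3.333333, i.e. 1/p
print(round(var, 6))   # -> 7.777778, i.e. (1-p)/p**2
```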
68
What is Theorem 17.1?
(Moment generating function determines the distribution). Given two random variables X and Y , if there exists a > 0 such that MX(t) and MY (t) are finite and coincide for every t ∈ [−a, a], then X and Y have the same distribution.
69
What is the Fundamental Counting Principle?
Suppose that the elements of a finite set E can be determined in k successive steps, with n1 possible choices for step 1, n2 possible choices for step 2, . . . , nk possible choices for step k. Suppose also that different choices lead to different elements. Then |E| = n1 · n2 · · · · · nk .
70
What is the Addition Rule?
If A1, . . . , An are pairwise disjoint subsets of some set then |Ui=1,n Ai| = Σi=1,n |Ai| .
71
When is the probability of an event less than the probability of another?
Let (Ω, F, P) be a probability space. If A, B ∈ F and A ⊆ B, then P(A) <= P(B). Proof. Since A ⊆ B, it follows that P(B − A) = P(B) − P(A) or, equivalently, P(A) = P(B) − P(B − A) <= P(B), where the inequality follows from the fact that probabilities are always non-negative.
72
How do we find Expectation and Variance using moments?
We have that E[X^k] = MX^(k)(0), that is, the kth moment is equal to the kth derivative of the moment generating function evaluated at 0. We can then find E[X] and E[X^2] to get the expectation and variance.
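The derivative relation can be illustrated by differentiating the mgf numerically; a sketch for a small hypothetical pmf (finite-difference step h is an assumption of the sketch):

```python
from math import exp

# Sketch: recover moments from numerical derivatives of the mgf
# M_X(t) = E[e^(tX)] for a small hypothetical pmf.
pmf = {0: 0.5, 1: 0.3, 2: 0.2}

def M(t):
    return sum(p * exp(t * x) for x, p in pmf.items())

h = 1e-5
m1 = (M(h) - M(-h)) / (2 * h)          # central difference ~ M'(0)  = E[X]
m2 = (M(h) - 2 * M(0) + M(-h)) / h**2  # second difference  ~ M''(0) = E[X^2]
var = m2 - m1 ** 2                     # ~ Var(X)
print(round(m1, 4), round(m2, 4))      # -> 0.7 1.1
```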