Discrete Random Variables Flashcards

1
Q

random variables

A

This topic is largely about introducing some useful terminology, building on the notions of sample space and probability function. The key words are
1. Random variable
2. Probability mass function (pmf)
3. Cumulative distribution function (cdf)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

discrete sample space Ξ©

A

a finite or listable set of outcomes {πœ”1, πœ”2, …}. The probability of an outcome πœ” is denoted 𝑃(πœ”).

3
Q

event 𝐸

A

a subset of Ξ©. The probability of an event 𝐸 is 𝑃(𝐸) = βˆ‘ 𝑃(πœ”) where the sum is over πœ”βˆˆπΈ
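As a quick check (my own Python sketch, not part of the original notes), we can compute an event probability by summing outcome probabilities over the sample space. Here Ξ© is the two-dice sample space used later in this deck, and 𝐸 is the event "the dice sum to 7":

```python
from fractions import Fraction

# Sample space for rolling a die twice; each of the 36 outcomes is equally likely.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega}

# The event E = "the dice sum to 7" is a subset of omega.
E = {w for w in omega if w[0] + w[1] == 7}

# P(E) is the sum of P(w) over the outcomes w in E.
prob_E = sum(P[w] for w in E)
print(prob_E)  # 1/6
```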

4
Q

game with two dice

A

Roll a die twice and record the outcomes as (𝑖, 𝑗), where 𝑖 is the result of the first roll and
𝑗 the result of the second.

We can take the sample space to be

Ξ© = {(1,1), (1,2), (1,3), …, (6,6)} = {(𝑖,𝑗) | 𝑖,𝑗 = 1, …, 6}.

The probability function is 𝑃 (𝑖, 𝑗) = 1/36.

In this game, you win $500 if the sum is 7 and lose $100 otherwise. We give this payoff
function the name 𝑋 and describe it formally by

𝑋(𝑖, 𝑗) = 500 if 𝑖 + 𝑗 = 7, and 𝑋(𝑖, 𝑗) = βˆ’100 if 𝑖 + 𝑗 β‰  7.

5
Q

We can change the game by using a different payoff function. For example, π‘Œ(𝑖, 𝑗) = 𝑖 βˆ’ 2𝑗.

A

In this example if you roll (6, 2) then you win $2. If you roll (2, 3) then you win -$4 (i.e., lose $4).

Which game is the better bet?

6
Q

discrete random variable

A

Let Ξ© be a sample space. A discrete random variable is a function

π‘‹βˆΆΞ©β†’R

that takes a discrete set of values. (Recall that R stands for the real numbers.)

7
Q

Why is 𝑋 called a random variable?

A

It’s β€˜random’ because its value depends on a random outcome of an experiment. And we treat 𝑋 like we would a usual variable: we can add it to other random variables, square it, and so on.

8
Q

Events and random variables

A

For any value π‘Ž we write 𝑋 = π‘Ž to mean the event consisting of all outcomes πœ” with 𝑋(πœ”) = π‘Ž.

9
Q

Example 3. In Example 1 we rolled two dice and 𝑋 was the payoff random variable: 500 if the sum is 7, βˆ’100 otherwise. What is the event 𝑋 = 500?

A

The event 𝑋 = 500 is the set {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}, i.e. the set of all outcomes that sum to 7.

So 𝑃 (𝑋 = 500) = 1/6.

We allow π‘Ž to be any value, even values that 𝑋 never takes.

In Example 1, we could look at the event 𝑋 = 1000. Since 𝑋 never equals 1000, this is just the empty event (or empty set):

'𝑋 = 1000' = {} = βˆ…

𝑃(𝑋 = 1000) = 0.

10
Q

probability mass function (pmf)

A

The probability mass function (pmf) of a discrete random variable is the function 𝑝(π‘Ž) = 𝑃 (𝑋 = π‘Ž).

Note:
1. We always have 0≀𝑝(π‘Ž)≀1.
2. We allow π‘Ž to be any number. If π‘Ž is a value that 𝑋 never takes, then 𝑝(π‘Ž) = 0.
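A pmf can be computed mechanically by grouping outcomes by the value of 𝑋. This is my own sketch (not part of the notes), using the dice-game payoff from Example 1: win 500 if the sum is 7, lose 100 otherwise.

```python
from fractions import Fraction
from collections import defaultdict

# Payoff random variable from the dice game: win 500 if the sum is 7, else lose 100.
def X(i, j):
    return 500 if i + j == 7 else -100

# pmf: p(a) = P(X = a), found by summing outcome probabilities for each value a.
pmf = defaultdict(Fraction)
for i in range(1, 7):
    for j in range(1, 7):
        pmf[X(i, j)] += Fraction(1, 36)

print(dict(pmf))  # p(500) = 1/6 and p(-100) = 5/6
```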

11
Q

Example 4: Let Ξ© be our earlier sample space for rolling 2 dice. Define the random variable 𝑀 to be the maximum value of the two dice, i.e.
𝑀 (𝑖, 𝑗) = max(𝑖, 𝑗).
For example, the roll (3,5) has maximum 5, i.e. 𝑀 (3, 5) = 5.

A

We can describe a random variable by listing its possible values and the probabilities associated to these values. For the above example we have:

value π‘Ž:     1     2     3     4     5     6
pmf 𝑝(π‘Ž): 1/36  3/36  5/36  7/36  9/36  11/36

12
Q

Events and inequalities

A

Inequalities with random variables describe events. For example, '𝑋 ≀ π‘Ž' is the set of all outcomes πœ” such that 𝑋(πœ”) ≀ π‘Ž.

13
Q

Example 5. If our sample space is the set of all pairs (𝑖, 𝑗) coming from rolling two dice and 𝑍(𝑖, 𝑗) = 𝑖 + 𝑗 is the sum of the dice, then what is the event 𝑍 ≀ 4?

A

'𝑍 ≀ 4' = {(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)}, so 𝑃(𝑍 ≀ 4) = 6/36 = 1/6.
14
Q

The cumulative distribution function (cdf)

A

The cumulative distribution function (cdf) of a random variable 𝑋 is the function 𝐹 given by 𝐹 (π‘Ž) = 𝑃 (𝑋 ≀ π‘Ž). We will often shorten this to distribution function.

Note well that the definition of 𝐹(π‘Ž) uses 'less than or equal to' (≀). This will be important for getting your calculations exactly right.
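Since the cdf is just a running sum of the pmf, it can be computed with an accumulate. This is my own sketch (not from the notes), using 𝑀 = max of two dice, whose pmf is 𝑝(π‘š) = (2π‘š βˆ’ 1)/36:

```python
from fractions import Fraction
from itertools import accumulate

# pmf of M = max of two dice: p(m) = (2m - 1)/36 for m = 1..6.
values = list(range(1, 7))
pmf = [Fraction(2 * m - 1, 36) for m in values]

# The cdf accumulates the pmf: F(a) = P(M <= a) -- note the "less than or equal".
cdf = dict(zip(values, accumulate(pmf)))
print(cdf[4])  # 16/36, printed in lowest terms as 4/9
```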

15
Q

Example 6: Continuing with the example 𝑀, we have

A

value π‘Ž:     1     2     3     4     5     6
pmf 𝑝(π‘Ž): 1/36  3/36  5/36  7/36  9/36 11/36
cdf 𝐹(π‘Ž): 1/36  4/36  9/36 16/36 25/36 36/36

𝐹(π‘Ž) is called the cumulative distribution function because 𝐹(π‘Ž) gives the total probability that accumulates by adding up the probabilities 𝑝(𝑏) as 𝑏 runs from βˆ’βˆž to π‘Ž. For example, in the table above, the entry 16/36 in column 4 for the cdf is the sum of the values of the pmf from column 1 to column 4.

16
Q

True or false: 𝐹(π‘Ž) is defined for all values of π‘Ž

A

True:

Just like the probability mass function, 𝐹(π‘Ž) is defined for all values π‘Ž. In the above example, 𝐹(8) = 1, 𝐹(βˆ’2) = 0, 𝐹(2.5) = 4/36, and 𝐹(πœ‹) = 9/36.
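The step-function nature of 𝐹 is easy to see in code. A sketch of my own (not from the notes), again using 𝑀 = max of two dice, that evaluates 𝐹 at arbitrary real arguments:

```python
from fractions import Fraction
from math import pi

# pmf of M = max of two dice.
pmf = {m: Fraction(2 * m - 1, 36) for m in range(1, 7)}

def F(a):
    """F(a) = P(M <= a), defined for every real a, not just values M takes."""
    return sum(p for m, p in pmf.items() if m <= a)

print(F(8), F(-2), F(2.5), F(pi))  # 1  0  4/36 = 1/9  9/36 = 1/4
```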

17
Q

how to represent cdf in notation?

A

As events: '𝑀 ≀ 4' is the event that 𝑀 takes a value in {1, 2, 3, 4}, so 𝐹(4) = 𝑃(𝑀 ≀ 4) = 1/36 + 3/36 + 5/36 + 7/36 = 16/36.

18
Q

Let X be the number of heads in 3 tosses of a fair coin

A
19
Q

Probability Mass function for number of heads in 3 tosses of a fair coin

A

value π‘Ž:   0    1    2    3
pmf 𝑝(π‘Ž): 1/8  3/8  3/8  1/8
20
Q

cumulative distribution function for number of heads in 3 tosses of a fair coin

A

value π‘Ž:   0    1    2    3
cdf 𝐹(π‘Ž): 1/8  4/8  7/8   1
21
Q

pmf and cdf for the maximum of two dice

A

value π‘Ž:     1     2     3     4     5     6
pmf 𝑝(π‘Ž): 1/36  3/36  5/36  7/36  9/36 11/36
cdf 𝐹(π‘Ž): 1/36  4/36  9/36 16/36 25/36 36/36
22
Q

pmf and cdf for the sum of two dice

A

value π‘Ž:     2     3     4     5     6     7     8     9    10    11    12
pmf 𝑝(π‘Ž): 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
cdf 𝐹(π‘Ž): 1/36  3/36  6/36 10/36 15/36 21/36 26/36 30/36 33/36 35/36 36/36
23
Q

properties of the cdf F

A
  1. 𝐹 is non-decreasing. That is, its graph never goes down, or symbolically if π‘Ž ≀ 𝑏 then 𝐹(π‘Ž) ≀ 𝐹(𝑏).
  2. 0≀𝐹(π‘Ž)≀1
  3. lim 𝐹(π‘Ž)=1 as π‘Žβ†’βˆž, lim 𝐹(π‘Ž)=0 as π‘Žβ†’βˆ’βˆž
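All three properties can be checked numerically for a concrete cdf. A sketch of my own (not from the notes), using the cdf of 𝑀 = max of two dice on a grid of points:

```python
from fractions import Fraction

# cdf of M = max of two dice, evaluated on a grid of (possibly non-integer) points.
pmf = {m: Fraction(2 * m - 1, 36) for m in range(1, 7)}

def F(a):
    return sum(p for m, p in pmf.items() if m <= a)

grid = [k / 2 for k in range(-10, 21)]  # -5.0, -4.5, ..., 10.0
vals = [F(a) for a in grid]

# Property 1: non-decreasing.  Property 2: always in [0, 1].
assert all(x <= y for x, y in zip(vals, vals[1:]))
assert all(0 <= v <= 1 for v in vals)
# Property 3 (in spirit): F is 0 far to the left and 1 far to the right.
assert F(-1000) == 0 and F(1000) == 1
```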
24
Q

𝐹 is non-decreasing. That is, its graph never goes down, or symbolically if π‘Ž ≀ 𝑏 then 𝐹(π‘Ž) ≀ 𝐹(𝑏).

A

In words, this says the cumulative probability 𝐹(π‘Ž) increases or remains constant as π‘Ž increases, but never decreases;

25
Q

0≀𝐹(π‘Ž)≀1

A

In words, this says the accumulated probability is always between 0 and 1

26
Q

lim 𝐹(π‘Ž)=1 as π‘Žβ†’βˆž

and

lim 𝐹(π‘Ž)=0 as π‘Žβ†’βˆ’βˆž

A

In words, this says that as π‘Ž gets very large, it becomes more and more certain that 𝑋 ≀ π‘Ž and as π‘Ž gets very negative it becomes more and more certain that 𝑋 > π‘Ž.

27
Q

bernoulli distributions

A

The Bernoulli distribution models one trial in an experiment that can result in either success or failure. This is the most important distribution and is also the simplest.

28
Q

a random variable X has a Bernoulli distribution with parameter 𝑝 if:

A
  1. 𝑋 takes the values 0 and 1.
  2. 𝑃(𝑋 = 1) = 𝑝 and 𝑃(𝑋 = 0) = 1 βˆ’ 𝑝.
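A minimal simulation sketch of my own (not part of the notes): drawing from Bernoulli(𝑝) and checking that the empirical frequency of 1s is close to 𝑝.

```python
import random

def bernoulli(p, rng):
    """One Bernoulli(p) draw: 1 ("success") with probability p, else 0."""
    return 1 if rng.random() < p else 0

rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws = [bernoulli(0.3, rng) for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to 0.3
```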
29
Q

𝑋 ∼ Bernoulli(𝑝) or Ber(𝑝)

A

β€œπ‘‹ follows a Bernoulli distribution with parameter 𝑝”
or
β€œπ‘‹ is drawn from a Bernoulli distribution with parameter 𝑝”.

30
Q

simple model for the Bernoulli distribution?

A

flip a coin with probability 𝑝 of heads, with 𝑋 = 1 on heads and 𝑋 = 0 on tails.

The general terminology is to say 𝑋 is 1 on success and 0 on failure, with success and failure defined by the context.

31
Q

How to model votes for or against a proposal?

A

If 𝑝 is the proportion of the voting population that favors the proposal, then the vote of a random individual is modeled by a Bernoulli(𝑝) random variable.

32
Q

table for Bernoulli(1/2) distribution

A

value π‘Ž:   0    1
pmf 𝑝(π‘Ž): 1/2  1/2
33
Q

pmf for Bernoulli(1/2) distribution

A
34
Q

cdf for Bernoulli(1/2) distribution

A
35
Q

table for Bernoulli(𝑝) distribution

A

value π‘Ž:     0    1
pmf 𝑝(π‘Ž): 1 βˆ’ 𝑝    𝑝
36
Q

pmf for Bernoulli(𝑝) distribution

A
37
Q

cdf for Bernoulli(𝑝) distribution

A
38
Q

binomial distributions

A

Binomial(𝑛,𝑝), or Bin(𝑛,𝑝), models the number of successes in 𝑛 independent Bernoulli(𝑝) trials.
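The standard binomial pmf is 𝑝(π‘˜) = C(𝑛, π‘˜) 𝑝^π‘˜ (1 βˆ’ 𝑝)^(𝑛 βˆ’ π‘˜). A short sketch of my own (not from the notes) implementing it with the standard library:

```python
from math import comb, isclose

def binomial_pmf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p): choose which k of the n trials succeed."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Number of heads in 5 fair-coin tosses.
print(binomial_pmf(5, 0.5, 2))  # 0.3125, i.e. 10/32

# Sanity check: the pmf sums to 1 over k = 0, ..., n.
assert isclose(sum(binomial_pmf(5, 0.5, k) for k in range(6)), 1.0)
```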

39
Q

hierarchy between bernoulli and binomial

A

A single Bernoulli trial is, say, one toss of a coin.

A single binomial trial consists of 𝑛 Bernoulli trials.

For coin flips the sample space for a Bernoulli trial is {𝐻, 𝑇}.
The sample space for a binomial trial is all sequences of heads and tails of length 𝑛.

Likewise, a Bernoulli random variable takes values 0 and 1, and a binomial random variable takes values 0, 1, 2, …, 𝑛.

40
Q

true or false: Binomial(1,𝑝) is the same as Bernoulli(𝑝).

A

true

41
Q

true or false: The number of heads in 𝑛 flips of a coin with probability 𝑝 of heads follows
a Binomial(𝑛, 𝑝) distribution.

A

true

We describe 𝑋 ∼ Binomial(𝑛, 𝑝) by giving its values and probabilities. For notation we will
use π‘˜ to mean an arbitrary number between 0 and 𝑛.

42
Q

binomial coefficient

A

The binomial coefficient C(𝑛, π‘˜) = 𝑛! / (π‘˜! (𝑛 βˆ’ π‘˜)!), read "𝑛 choose π‘˜", counts the number of ways to choose π‘˜ things out of 𝑛.
43
Q

table for the pmf of a Binomial(𝑛, 𝑝) random variable

A

𝑋 takes values π‘˜ = 0, 1, 2, …, 𝑛, with 𝑝(π‘˜) = 𝑃(𝑋 = π‘˜) = C(𝑛, π‘˜) 𝑝^π‘˜ (1 βˆ’ 𝑝)^(𝑛 βˆ’ π‘˜).
44
Q

What is the probability of 3 or more heads in 5 tosses of a fair coin?

A

𝑃(𝑋 β‰₯ 3) = 𝑝(3) + 𝑝(4) + 𝑝(5) = (10 + 5 + 1)/32 = 16/32 = 1/2.
45
Q

For concreteness, let 𝑛 = 5 and π‘˜ = 2 (the argument for arbitrary 𝑛 and π‘˜ is identical.) So 𝑋 ∼ binomial(5, 𝑝) and we want to compute 𝑝(2). What is the long way to compute 𝑝(2)?

A

List all the ways to get exactly 2 heads in 5 coin flips and add up their probabilities. The list has 10 entries:

HHTTT, HTHTT, HTTHT, HTTTH, THHTT, THTHT, THTTH, TTHHT, TTHTH, TTTHH

Each entry has the same probability of occurring, namely

𝑝^2 (1 βˆ’ 𝑝)^3.

This is because each of the two heads has probability 𝑝 and each of the 3 tails has probability 1 βˆ’ 𝑝. Because the individual tosses are independent, we can multiply probabilities. Therefore, the total probability of exactly 2 heads is the sum of 10 identical probabilities, i.e. 𝑝(2) = 10𝑝^2(1 βˆ’ 𝑝)^3.
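The listing argument can be verified by brute force. This is my own sketch (not from the notes), with 𝑝 = 1/3 picked arbitrarily for illustration; it enumerates all 32 sequences of 5 flips:

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 3)  # an arbitrary illustrative head probability

def seq_prob(seq):
    # Independent tosses: multiply p for each H and (1 - p) for each T.
    prob = Fraction(1)
    for flip in seq:
        prob *= p if flip == "H" else 1 - p
    return prob

# Keep the sequences with exactly 2 heads.
two_heads = [s for s in product("HT", repeat=5) if s.count("H") == 2]
print(len(two_heads))  # 10 sequences, matching the list above

# Their total probability equals 10 p^2 (1 - p)^3.
total = sum(seq_prob(s) for s in two_heads)
assert total == 10 * p**2 * (1 - p)**3
```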

46
Q

For concreteness, let 𝑛 = 5 and π‘˜ = 2 (the argument for arbitrary 𝑛 and π‘˜ is identical.) So 𝑋 ∼ binomial(5, 𝑝) and we want to compute 𝑝(2). What is the short way to compute 𝑝(2)?

A

Count instead of listing: there are C(5, 2) = 10 sequences with exactly 2 heads, each with probability 𝑝^2 (1 βˆ’ 𝑝)^3, so 𝑝(2) = C(5, 2) 𝑝^2 (1 βˆ’ 𝑝)^3 = 10𝑝^2(1 βˆ’ 𝑝)^3.
47
Q

pmf for Binomial(10, 0.5)

A
48
Q

pmf for Binomial(10, 0.1)

A
49
Q

pmf for Binomial(20, 0.1)

A
50
Q

geometric distributions

A

A geometric distribution models the number of tails before the first head in a sequence of coin flips (Bernoulli trials).
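A simulation sketch of my own (not part of the notes), using the deck's convention that a geometric random variable counts the tails before the first head:

```python
import random

def num_tails_before_first_head(p, rng):
    """Simulate geometric(p): flip until heads, counting the tails."""
    tails = 0
    while rng.random() >= p:  # this flip came up tails
        tails += 1
    return tails

rng = random.Random(42)
draws = [num_tails_before_first_head(1/3, rng) for _ in range(100_000)]

# P(X = 0) should be p = 1/3 and P(X = 1) should be (2/3)(1/3) = 2/9.
print(draws.count(0) / len(draws), draws.count(1) / len(draws))
```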

51
Q

pmf for geometric (1/3) distribution

A

𝑝(π‘˜) = 𝑃(𝑋 = π‘˜) = (2/3)^π‘˜ (1/3) for π‘˜ = 0, 1, 2, …
52
Q

Example 9.

(a) Flip a coin repeatedly. Let 𝑋 be the number of tails before the first heads.

(b) Give a flip of tails the value 0, and heads the value 1.

(c) Give a flip of tails the value 1, and heads the value 0.

A

(a) 𝑋 can equal 0 (the first flip is heads), 1, 2, …. In principle it can take any nonnegative integer value.

(b) In this case, 𝑋 is the number of 0s before the first 1.

(c) In this case, 𝑋 is the number of 1s before the first 0.

53
Q

Example 9:

(d) Call a flip of tails a success and heads a failure.

(e) Call a flip of tails a failure and heads a success.

A

(d) So, 𝑋 is the number of successes before the first failure.

(e) So, 𝑋 is the number of failures before the first success.

54
Q

formal definition of what it means for random variable 𝑋 to follow a geometric distribution with parameter 𝑝

A

𝑋 takes values 0, 1, 2, 3, … and its pmf is 𝑝(π‘˜) = 𝑃(𝑋 = π‘˜) = (1 βˆ’ 𝑝)^π‘˜ 𝑝. We write 𝑋 ∼ geometric(𝑝) or geo(𝑝).
55
Q

true or false: The geometric distribution is an example of a discrete distribution that takes an infinite number of possible values.

A

true

Things can get confusing when we work with successes and failures, since we might want to model the number of successes before the first failure, or the number of failures before the first success. To keep things straight, you can translate to the neutral language of the number of tails before the first heads.

56
Q

Example 10. Computing geometric probabilities.

Suppose that the inhabitants of an island plan their families by having babies until the first girl is born.

Assume the probability of having a girl with each pregnancy is 0.5 independent of other pregnancies, that all babies survive and there are no multiple births. What is the probability that a family has π‘˜ boys?

A

A family has π‘˜ boys exactly when the first π‘˜ births are boys and the (π‘˜ + 1)st is a girl, so 𝑃(π‘˜ boys) = (1/2)^π‘˜ Β· (1/2) = 1/2^(π‘˜ + 1).
56
Q

cdf for the geometric (1/3) distribution

A

𝐹(π‘˜) = 𝑃(𝑋 ≀ π‘˜) = 1 βˆ’ (2/3)^(π‘˜ + 1) for π‘˜ = 0, 1, 2, …
57
Q

uniform distribution

A

The uniform distribution models any situation where all the outcomes are equally likely.

𝑋 ∼ uniform(𝑁).

𝑋 takes values 1, 2, 3, …, 𝑁, each with probability 1/𝑁.
We have already seen this distribution many times when modeling fair coins (𝑁 = 2), dice (𝑁 = 6), birthdays (𝑁 = 365), and poker hands (𝑁 = C(52, 5)).
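A tiny sketch of my own (not from the notes) building the uniform(𝑁) pmf for a die:

```python
from fractions import Fraction

def uniform_pmf(N):
    """pmf of uniform(N): values 1..N, each with probability 1/N."""
    return {k: Fraction(1, N) for k in range(1, N + 1)}

die = uniform_pmf(6)
print(die[3])  # 1/6
assert sum(die.values()) == 1  # a pmf always sums to 1
```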

58
Q

true or false: there are a million distributions and you should memorize them all

A

false

but you should be comfortable using a resource like Wikipedia to look up a pmf.

For example, take a look at the info box at the top right of https://en.wikipedia.org/wiki/Hypergeometric_distribution.

The info box lists many (surely unfamiliar) properties in addition to the pmf.

59
Q

Arithmetic and random variables

A

We can do arithmetic with random variables. For example, we can add, subtract, multiply, or square them.

There is a simple but extremely important idea for counting: if we have a sequence of numbers that are either 0 or 1, then the sum of the sequence is the number of 1s.

60
Q

Consider the sequence with five 1s

1,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0.

A

It is easy to see that the sum of this sequence is 5, the number of 1s.

We illustrate this idea by counting the number of heads in 𝑛 tosses of a coin.

61
Q

Toss a fair coin 𝑛 times. Let 𝑋𝑗 be 1 if the 𝑗th toss is heads and 0 if it’s tails.

A

The important thing to see in the example above is that we’ve written the more complicated binomial random variable 𝑋 as the sum of extremely simple random variables 𝑋𝑗. This will allow us to manipulate 𝑋 algebraically.
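The indicator idea in code, as a sketch of my own (not from the notes): each 𝑋𝑗 is a 0/1 variable, and their sum counts the heads.

```python
import random

random.seed(1)
n, p = 20, 0.5

# X_j = 1 if the j-th toss is heads, 0 if tails.
X_j = [1 if random.random() < p else 0 for _ in range(n)]

# X = X_1 + ... + X_n: the sum of a 0/1 sequence is its number of 1s,
# so X counts the heads and X ~ Binomial(n, p).
X = sum(X_j)
assert X == X_j.count(1)
print(X)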

62
Q

Think: Suppose 𝑋 and π‘Œ are independent and 𝑋 ∼ binomial(𝑛, 1/2) and π‘Œ ∼ binomial(π‘š, 1/2). What kind of distribution does 𝑋 + π‘Œ follow?

A

binomial(𝑛 + π‘š, 1/2)

63
Q
A

The first thing to do is make a two-dimensional table for the product sample space consisting of pairs (π‘₯, 𝑦), where π‘₯ is a possible value of 𝑋 and 𝑦 a possible value of π‘Œ. To help do the computation, the probabilities for the 𝑋 values are put in the far right column and those for π‘Œ in the bottom row. Because 𝑋 and π‘Œ are independent, the probability for the pair (π‘₯, 𝑦) is just the product of the individual probabilities.

64
Q

what do the diagonal stripes here show?

A

The diagonal stripes show sets of squares where 𝑋 + π‘Œ is the same. All we have to do to
compute the probability table for 𝑋 + π‘Œ is sum the probabilities for each stripe.