discrete random variables Flashcards
random variables
This topic is largely about introducing some useful terminology, building on the notions of sample space and probability function. The key words are
1. Random variable
2. Probability mass function (pmf)
3. Cumulative distribution function (cdf)
discrete sample space Ω
a finite or listable set of outcomes {ω1, ω2, …}. The probability of an outcome ω is denoted P(ω).
event E
a subset of Ω. The probability of an event E is P(E) = Σ P(ω), where the sum is over ω ∈ E.
game with two dice
Roll a die twice and record the outcomes as (i, j), where i is the result of the first roll and
j the result of the second.
We can take the sample space to be
Ω = {(1,1), (1,2), (1,3), …, (6,6)} = {(i, j) | i, j = 1, …, 6}.
The probability function is P(i, j) = 1/36.
In this game, you win $500 if the sum is 7 and lose $100 otherwise. We give this payoff
function the name X and describe it formally by
X(i, j) = 500 if i + j = 7, and X(i, j) = −100 otherwise.
We can change the game by using a different payoff function. For example
Y(i, j) = ij − 10.
In this example, if you roll (6, 2) then you win 6 · 2 − 10 = $2. If you roll (2, 3) then you win 2 · 3 − 10 = −$4 (i.e., lose $4).
Which game is the better bet?
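One way to compare the games is to enumerate all 36 equally likely rolls and compute each game's average payoff. This is an illustrative Python sketch (not part of the original notes); the payoff functions follow the descriptions above.

```python
from fractions import Fraction

# Payoffs as described above: game 1 pays $500 when the sum is 7 and
# -$100 otherwise; game 2 pays i*j - 10 dollars on the roll (i, j).
def payoff1(i, j):
    return 500 if i + j == 7 else -100

def payoff2(i, j):
    return i * j - 10

outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
p = Fraction(1, 36)  # each of the 36 rolls is equally likely

avg1 = sum(p * payoff1(i, j) for i, j in outcomes)  # average payoff, game 1
avg2 = sum(p * payoff2(i, j) for i, j in outcomes)  # average payoff, game 2
print(avg1, avg2)
```

The first game breaks even on average ($0 per play) while the second averages $2.25 per play, so the second game is the better bet.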
discrete random variable
Let Ω be a sample space. A discrete random variable is a function
X: Ω → R
that takes a discrete set of values. (Recall that R stands for the real numbers.)
Why is X called a random variable?
It's "random" because its value depends on a random outcome of an experiment. And we treat X like we would a usual variable: we can add it to other random variables, square it, and so on.
Events and random variables
For any value a we write X = a to mean the event consisting of all outcomes ω with X(ω) = a.
Example 3. In Example 1 we rolled two dice and X was the random variable that pays $500 if the sum is 7 and −$100 otherwise.
The event X = 500 is the set {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}, i.e. the set of all outcomes that sum to 7.
So P(X = 500) = 6/36 = 1/6.
We allow a to be any value, even values that X never takes.
In Example 1, we could look at the event X = 1000. Since X never equals 1000, this is just the empty event (or empty set):
'X = 1000' = {} = ∅
P(X = 1000) = 0.
probability mass function (pmf)
The probability mass function (pmf) of a discrete random variable X is the function p(a) = P(X = a).
Note:
1. We always have 0 ≤ p(a) ≤ 1.
2. We allow a to be any number. If a is a value that X never takes, then p(a) = 0.
Example 4: Let Ω be our earlier sample space for rolling 2 dice. Define the random variable M to be the maximum value of the two dice, i.e.
M(i, j) = max(i, j).
For example, the roll (3,5) has maximum 5, i.e. M(3, 5) = 5.
We can describe a random variable by listing its possible values and the probabilities associated to these values. For the above example we have:
value a:   1     2     3     4     5     6
pmf p(a):  1/36  3/36  5/36  7/36  9/36  11/36
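This pmf can be recovered by brute-force enumeration of the 36 equally likely rolls. A minimal Python sketch (illustrative, not part of the original notes):

```python
from collections import Counter
from fractions import Fraction

# pmf of M = maximum of two dice, by enumerating the 36 equally likely rolls
counts = Counter(max(i, j) for i in range(1, 7) for j in range(1, 7))
pmf = {a: Fraction(c, 36) for a, c in sorted(counts.items())}
# p(a) = (2a - 1)/36, i.e. 1/36, 3/36, 5/36, 7/36, 9/36, 11/36
assert sum(pmf.values()) == 1  # total probability is 1
```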
Events and inequalities
Inequalities with random variables describe events. For example, X ≤ a is the set of all outcomes ω such that X(ω) ≤ a.
Example 5. If our sample space is the set of all pairs (i, j) coming from rolling two dice, and X(i, j) = i + j is the sum of the dice, then what is the event X ≤ 4?
Answer: 'X ≤ 4' = {(1,1), (1,2), (2,1), (1,3), (2,2), (3,1)}, so P(X ≤ 4) = 6/36 = 1/6.
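A short enumeration sketch (illustrative code, not from the notes) lists the outcomes in this event and its probability:

```python
from fractions import Fraction

# The event 'X <= 4', where X(i, j) = i + j is the sum of two dice
event = [(i, j) for i in range(1, 7) for j in range(1, 7) if i + j <= 4]
prob = Fraction(len(event), 36)
print(event)  # the pairs (1,1), (1,2), (1,3), (2,1), (2,2), (3,1)
print(prob)   # P(X <= 4) = 6/36 = 1/6
```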
The cumulative distribution function (cdf)
The cumulative distribution function (cdf) of a random variable X is the function F given by F(a) = P(X ≤ a). We will often shorten this to distribution function.
Note well that the definition of F(a) uses "less than or equal to". This will be important for getting your calculations exactly right.
Example 6: Continuing with Example 4, where M is the maximum of two dice, we have:
value a:   1     2     3     4     5      6
cdf F(a):  1/36  4/36  9/36  16/36 25/36  36/36
F(a) is called the cumulative distribution function because F(a) gives the total probability that accumulates by adding up the probabilities p(b) as b runs from −∞ to a. For example, in the table above, the entry 16/36 in column 4 for the cdf is the sum of the values of the pmf from column 1 to column 4.
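The "accumulate the pmf" description can be sketched in a few lines of Python, assuming the pmf p(a) = (2a − 1)/36 for the maximum of two dice:

```python
from fractions import Fraction
from itertools import accumulate

# pmf of the maximum of two dice: p(a) = (2a - 1)/36 for a = 1, ..., 6
pmf = [Fraction(2 * a - 1, 36) for a in range(1, 7)]

# the cdf accumulates the pmf: F(a) = p(1) + ... + p(a)
cdf = list(accumulate(pmf))
# cdf values: 1/36, 4/36, 9/36, 16/36, 25/36, 36/36
assert cdf[3] == Fraction(16, 36)  # the column-4 entry from the text
```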
True or false: F(a) is defined for all values of a
True:
Just like the probability mass function, F(a) is defined for all values a. In the above example, F(8) = 1, F(−2) = 0, F(2.5) = 4/36, and F(π) = 9/36.
how to represent cdf values in terms of events?
As events: for M, the maximum of two dice, 'M ≤ 4' is the event that M is 1, 2, 3, or 4, so F(4) = P(M ≤ 4) = 1/36 + 3/36 + 5/36 + 7/36 = 16/36.
Let X be the number of heads in 3 tosses of a fair coin
Probability Mass function for number of heads in 3 tosses of a fair coin
cumulative distribution function for number of heads in 3 tosses of a fair coin
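The pmf and cdf for this X can be tabulated with a short enumeration sketch (illustrative code, not part of the notes):

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# X = number of heads in 3 tosses; the 8 sequences are equally likely
sequences = list(product('HT', repeat=3))
counts = Counter(seq.count('H') for seq in sequences)
pmf = {k: Fraction(c, 8) for k, c in sorted(counts.items())}
cdf = dict(zip(pmf, accumulate(pmf.values())))
# pmf: p(0) = 1/8, p(1) = 3/8, p(2) = 3/8, p(3) = 1/8
# cdf: F(0) = 1/8, F(1) = 4/8, F(2) = 7/8, F(3) = 1
```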
pmf and cdf for the maximum of two dice
pmf and cdf for the sum of two dice
properties of the cdf F
- F is non-decreasing. That is, its graph never goes down; symbolically, if a ≤ b then F(a) ≤ F(b).
- 0 ≤ F(a) ≤ 1
- lim F(a) = 1 as a → ∞, and lim F(a) = 0 as a → −∞
F is non-decreasing. That is, its graph never goes down; symbolically, if a ≤ b then F(a) ≤ F(b).
In words, this says the cumulative probability F(a) increases or remains constant as a increases, but never decreases.
0 ≤ F(a) ≤ 1
In words, this says the accumulated probability is always between 0 and 1.
lim F(a) = 1 as a → ∞
and
lim F(a) = 0 as a → −∞
In words, this says that as a gets very large, it becomes more and more certain that X ≤ a, and as a gets very negative it becomes more and more certain that X > a.
bernoulli distributions
The Bernoulli distribution models one trial of an experiment that can result in either success or failure. This is the most important distribution and is also the simplest.
a random variable X has a Bernoulli distribution with parameter p if:
- X takes the values 0 and 1.
- P(X = 1) = p and P(X = 0) = 1 − p.
X ∼ Bernoulli(p) or Ber(p)
"X follows a Bernoulli distribution with parameter p"
or
"X is drawn from a Bernoulli distribution with parameter p".
simple model for the Bernoulli distribution?
flip a coin with probability p of heads, with X = 1 on heads and X = 0 on tails.
The general terminology is to say X is 1 on success and 0 on failure, with success and failure defined by the context.
How to model votes for or against a proposal?
If p is the proportion of the voting population that favors the proposal, then the vote of a random individual is modeled by a Bernoulli(p) random variable.
table for Bernoulli(1/2) distribution
pmf for Bernoulli(1/2) distribution
cdf for Bernoulli(1/2) distribution
table for Bernoulli(p) distribution
pmf for Bernoulli(p) distribution
cdf for Bernoulli(p) distribution
binomial distributions
Binomial(n, p), or Bin(n, p), models the number of successes in n independent Bernoulli(p) trials.
hierarchy between bernoulli and binomial
A single Bernoulli trial is, say, one toss of a coin.
A single binomial trial consists of n Bernoulli trials.
For coin flips, the sample space for a Bernoulli trial is {H, T}.
The sample space for a binomial trial is all sequences of heads and tails of length n.
Likewise, a Bernoulli random variable takes values 0 and 1, and a binomial random variable takes values 0, 1, 2, …, n.
true or false: Binomial(1, p) is the same as Bernoulli(p).
true
true or false: The number of heads in n flips of a coin with probability p of heads follows
a Binomial(n, p) distribution.
true
We describe X ∼ Binomial(n, p) by giving its values and probabilities. For notation we will
use k to mean an arbitrary number between 0 and n.
binomial coefficient
The binomial coefficient is C(n, k) = n! / (k! (n − k)!), read "n choose k": the number of ways to choose k items from a set of n.
table for the pmf of a Binomial(n, p) random variable
X takes the values k = 0, 1, 2, …, n with probabilities p(k) = C(n, k) p^k (1 − p)^(n−k).
What is the probability of 3 or more heads in 5 tosses of a fair coin?
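One way to answer is to sum the Binomial(5, 1/2) pmf over k = 3, 4, 5. A sketch using Python's math.comb:

```python
from fractions import Fraction
from math import comb

# P(3 or more heads in 5 tosses) = sum over k = 3, 4, 5 of C(5, k) / 2^5
prob = sum(Fraction(comb(5, k), 2 ** 5) for k in range(3, 6))
# (10 + 5 + 1) / 32 = 16/32 = 1/2
print(prob)
```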
For concreteness, let n = 5 and k = 2 (the argument for arbitrary n and k is identical). So X ∼ Binomial(5, p) and we want to compute p(2). What is the long way to compute p(2)?
List all the ways to get exactly 2 heads in 5 coin flips and add up their probabilities. The list has 10 entries:
HHTTT, HTHTT, HTTHT, HTTTH, THHTT, THTHT, THTTH, TTHHT, TTHTH, TTTHH
Each entry has the same probability of occurring, namely
p^2 (1 − p)^3.
This is because each of the two heads has probability p and each of the 3 tails has probability 1 − p.
Because the individual tosses are independent, we can multiply probabilities. Therefore, the total probability of exactly 2 heads is the sum of 10 identical probabilities, i.e. p(2) = 10 p^2 (1 − p)^3.
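The counting step above can be checked with a tiny sketch: enumerate the 2^5 sequences and confirm that exactly C(5, 2) = 10 of them have two heads.

```python
from itertools import product
from math import comb

# all length-5 sequences of H/T with exactly 2 heads
seqs = [s for s in product('HT', repeat=5) if s.count('H') == 2]
print(len(seqs))  # 10, which equals comb(5, 2)
assert len(seqs) == comb(5, 2)
```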
For concreteness, let n = 5 and k = 2 (the argument for arbitrary n and k is identical). So X ∼ Binomial(5, p) and we want to compute p(2). What is the short way to compute p(2)?
Count the sequences with exactly 2 heads using the binomial coefficient: there are C(5, 2) = 10 of them, each with probability p^2 (1 − p)^3, so p(2) = C(5, 2) p^2 (1 − p)^3 = 10 p^2 (1 − p)^3.
pmf for Binomial(10, 0.5)
pmf for Binomial(10, 0.1)
pmf for Binomial(20, 0.1)
geometric distributions
A geometric distribution models the number of tails before the first head in a sequence of coin flips (Bernoulli trials).
pmf for geometric(1/3) distribution
p(k) = (2/3)^k (1/3) for k = 0, 1, 2, …
Example 9.
(a) Flip a coin repeatedly. Let X be the number of tails before the first heads.
(b) Give a flip of tails the value 0, and heads the value 1.
(c) Give a flip of tails the value 1, and heads the value 0.
(a) So, X can equal 0 (i.e. the first flip is heads), 1, 2, …. In principle it takes any nonnegative integer value.
(b) In this case, X is the number of 0's before the first 1.
(c) In this case, X is the number of 1's before the first 0.
Example 9:
(d) Call a flip of tails a success and heads a failure.
(e) Call a flip of tails a failure and heads a success.
(d) So, X is the number of successes before the first failure.
(e) So, X is the number of failures before the first success.
formal definition of what it means for random variable X to follow a geometric distribution with parameter p
A random variable X follows a geometric(p) distribution if X takes the values 0, 1, 2, 3, … and P(X = k) = (1 − p)^k p for k = 0, 1, 2, ….
true or false: The geometric distribution is an example of a discrete distribution that takes an infinite number of possible values.
true
Things can get confusing when we work with successes and failures, since we might want to model the number of successes before the first failure, or the number of failures before the first success. To keep things straight, you can translate to the neutral language of the number of tails before the first heads.
Example 10. Computing geometric probabilities.
Suppose that the inhabitants of an island plan their families by having babies until the first girl is born.
Assume the probability of having a girl with each pregnancy is 0.5, independent of other pregnancies, that all babies survive, and that there are no multiple births. What is the probability that a family has k boys?
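A family has k boys exactly when the first k births are boys and birth k + 1 is a girl, so the number of boys is geometric(1/2). A minimal sketch of the computation:

```python
from fractions import Fraction

# P(k boys) = P(k boys in a row, then a girl) = (1/2)^k * (1/2) = (1/2)^(k+1)
def p_boys(k):
    return Fraction(1, 2) ** (k + 1)

# p_boys(0) = 1/2, p_boys(1) = 1/4, p_boys(2) = 1/8, ...
print([p_boys(k) for k in range(4)])
```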
cdf for the geometric(1/3) distribution
F(k) = 1 − (2/3)^(k+1) for k = 0, 1, 2, …
uniform distribution
The uniform distribution models any situation where all the outcomes are equally likely.
X ∼ uniform(N).
X takes values 1, 2, 3, …, N, each with probability 1/N.
We have already seen this distribution many times when modeling fair coins (N = 2), dice (N = 6), birthdays (N = 365), and poker hands (N = C(52, 5)).
true or false: there are a million distributions and you should memorize them all
false
but you should be comfortable using a resource like Wikipedia to look up a pmf.
For example, take a look at the info box at the top right of https://en.wikipedia.org/wiki/Hypergeometric_distribution.
The info box lists many (surely unfamiliar) properties in addition to the pmf.
Arithmetic and random variables
We can do arithmetic with random variables. For example, we can add, subtract, multiply, or square them.
There is a simple, but extremely important idea for counting. It says that if we have a sequence of numbers that are either 0 or 1 then the sum of the sequence is the number of 1s.
Consider the sequence with five 1s
1,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0.
It is easy to see that the sum of this sequence is 5, the number of 1s.
We illustrate this idea by counting the number of heads in π tosses of a coin.
Toss a fair coin n times. Let X_j be 1 if the j-th toss is heads and 0 if it's tails. Then X = X_1 + X_2 + ⋯ + X_n is the number of heads in the n tosses.
The important thing to see in the example above is that we've written the more complicated binomial random variable X as the sum of extremely simple random variables X_j. This will allow us to manipulate X algebraically.
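This decomposition can be sketched directly (an illustrative simulation, not from the notes): simulate n Bernoulli indicators and check that their sum counts the heads.

```python
import random

# X_j = 1 if toss j is heads, 0 if tails; X = X_1 + ... + X_n counts the heads
random.seed(0)  # fixed seed so the sketch is reproducible
n = 10
tosses = ['H' if random.random() < 0.5 else 'T' for _ in range(n)]
indicators = [1 if t == 'H' else 0 for t in tosses]  # the Bernoulli X_j
x = sum(indicators)  # the binomial random variable X
assert x == tosses.count('H')  # the sum of 0/1 indicators counts the 1s
```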
Think: Suppose X and Y are independent, with X ∼ Binomial(n, 1/2) and Y ∼ Binomial(m, 1/2). What kind of distribution does X + Y follow?
Binomial(n + m, 1/2)
The first thing to do is make a two-dimensional table for the product sample space consisting of pairs (x, y), where x is a possible value of X and y one of Y. To help do the computation, the probabilities for the X values are put in the far right column and those for Y are in the bottom row. Because X and Y are independent, the probability for the (x, y) pair is just the product of the individual probabilities.
what do the diagonal stripes here show?
The diagonal stripes show sets of squares where X + Y is the same. All we have to do to
compute the probability table for X + Y is sum the probabilities for each stripe.
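Summing along the stripes is exactly a convolution of the two pmfs. A sketch (with illustrative small values n = 2, m = 3) confirms that the result matches Binomial(n + m, 1/2):

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, k):
    # Binomial(n, 1/2) pmf: C(n, k) / 2^n
    return Fraction(comb(n, k), 2 ** n)

n, m = 2, 3
# sum along each stripe: P(X + Y = t) = sum over x of P(X = x) P(Y = t - x)
stripes = [sum(binom_pmf(n, x) * binom_pmf(m, t - x)
               for x in range(n + 1) if 0 <= t - x <= m)
           for t in range(n + m + 1)]
# the stripe sums reproduce the Binomial(n + m, 1/2) pmf
assert stripes == [binom_pmf(n + m, t) for t in range(n + m + 1)]
```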