Chapter 3: Discrete Random Variables and Probability Distributions Flashcards
Random Variables
- For a given sample space S of some experiment, a random variable (rv) is any rule that associates a number with each outcome in S
- In mathematical language, a random variable is a function whose domain is the sample space and whose range is the set of real numbers.
- variable because different numerical values are possible
- random because the observed value depends on which of the possible experimental outcomes results

How are Random Variables denoted?
- customarily denoted by uppercase letters, such as X and Y, near the end of the alphabet
- The notation X (s) = x means that x is the value associated with the outcome s by the random variable X
What are the 2 types of Random Variables?
- Discrete
- Continuous
Discrete Random Variables
- A random variable that can only assume distinct values is said to be discrete. Usually these represent a count.
What is a Bernoulli Experiment?
- A discrete random variable
- A Bernoulli experiment provides a 0/1 response
What is a Binomial Random Variable?
- A discrete random variable
- A binomial rv gives the number of successes in n independent, identical trials. Possible values are 0, 1, …, n
What is a Geometric variable?
- A discrete random variable
- Number of objects tested until a success. Possible values are 1, 2, 3, …
Continuous Random Variables
- A random variable that can (theoretically) assume any value in a finite or infinite interval is said to be continuous.
Discrete RV for Coin Toss Experiment
- When we toss a coin four times, we can record the outcome as a string of heads and tails, such as HTTH.
- However we are most often interested in numerical outcomes such as the count of heads in the four tosses.
- It is convenient to use the following shorthand notation
- Let X be the number of heads.
- We call X a random variable because its values vary when the coin tossing is repeated
- If our outcome is HTTH, then X = 2, if the next outcome is TTTH, the value of X changes to 1.
- The possible values of X are 0, 1, 2, 3, 4.
- Tossing a coin four times will give X one of these possible values.
- The probability that the random variable X will equal x is: P(X = x) or simply p(x).
- The probability of observing exactly 2 heads in a single toss of 3 coins is denoted by P(X = 2). (Note: P(X = 2) = 3/8; for the four tosses above, P(X = 2) = 6/16, which is also 3/8.)
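A minimal Python sketch of the idea that a random variable is just a function on the sample space. The names count_heads and sample_space are illustrative and not part of the notes.

```python
from itertools import product

def count_heads(outcome: str) -> int:
    """The rv X: maps an outcome such as 'HTTH' to its number of heads."""
    return outcome.count("H")

# Sample space for four tosses: all 16 strings made of H and T.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=4)]

print(count_heads("HTTH"))  # X(HTTH) = 2
print(count_heads("TTTH"))  # X(TTTH) = 1

# The set of possible values of X over the whole sample space.
possible_values = sorted({count_heads(s) for s in sample_space})
print(possible_values)      # [0, 1, 2, 3, 4]
```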
Example 1 - Bernoulli rv (a specific kind of discrete rv)
• When a student calls a university help desk for technical support, he/she will either immediately be able to speak to someone (S, for success) or will be placed on hold (F, for failure).
• With sample space S = {S, F}, define a rv X by
X(S) = 1, X(F) = 0
• The rv X indicates whether (1) or not (0) the student can immediately speak to someone.
Example 6
- As another example, suppose we select married couples at random and do a blood test on each person until we find a husband and wife who both have the same Rh factor.
- With X = the number of blood tests to be performed, possible values of X are D = {2, 4, 6, 8, …}.
- Since the possible values have been listed in sequence, X is a discrete rv.
Probability Distributions for Discrete Random Variables
- Probabilities assigned to various outcomes in S in turn determine probabilities associated with the values of any particular rv X.
- The probability distribution of X says how the total probability of 1 is distributed among the various possible X values.
- Suppose, for example, that a business has just purchased four laser printers, and let X be the number among these that require service during the warranty period.
- Possible X values are then 0, 1, 2, 3, and 4. The probability distribution will tell us how the probability of 1 is subdivided among these five possible values— how much probability is associated with the X value 0, how much is apportioned to the X value 1, and so on.
- We will use the following notation for the probabilities in the distribution:
p (0) = the probability of the X value 0 = P(X = 0)
p (1) = the probability of the X value 1 = P(X = 1)
* In general, p (x) will denote the probability assigned to the value x.
Example: Four Coin Tosses
Toss a balanced coin four times; the discrete random variable X counts the number of heads. How do we find the probability distribution function of X?
- The outcome of four tosses is a sequence of heads and tails such as HTTH and there are 16 possible outcomes.
P(X=0) = 1/16 = .0625
P(X=1) = 4/16 = 0.25
P(X=2) = 6/16 = .375
P(X=3) = 4/16 = .25
P(X=4) = 1/16 = .0625
- The sum of these probabilities =1, so this is a legitimate probability distribution function.
- In the table form, the probability distribution for the rv X is:
Number of heads x:      0       1      2      3      4
Probability P(X = x):   .0625   .25    .375   .25    .0625
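The table can be checked by brute-force enumeration of the 16 equally likely outcomes. This is a sketch only; exact fractions are used so the probabilities sum to exactly 1, and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 16 equally likely outcomes of four tosses of a balanced coin.
outcomes = list(product("HT", repeat=4))

# Count how many outcomes give each number of heads.
counts = Counter(outcome.count("H") for outcome in outcomes)

# pmf: p(x) = (number of outcomes with x heads) / 16
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in pmf.items():
    print(x, p, float(p))

print(sum(pmf.values()))  # 1, so this is a legitimate pmf
```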

Probability Mass Function (pmf)
- a discrete distribution is described by giving its probability mass function, or pmf, either as a table or as a function.
- The pmf of a discrete rv is defined for every number x by p(x) = P(X = x) = P(all s in S: X(s) = x).

Example 7

- The Cal Poly Department of Statistics has a lab with six computers reserved for statistics majors.
- Let X denote the number of these computers that are in use at a particular time of day.
- Suppose that the probability distribution of X is as given in the following table; the first row of the table lists the possible X values and the second row gives the probability of each such value.
x:      0     1     2     3     4     5     6
p(x):   .05   .10   .15   .25   .20   .15   .10
•We can now use elementary probability properties to calculate other probabilities of interest. For example, the probability that at most 2 computers are in use is
P(X <= 2) = P(X = 0 or 1 or 2)
= p(0) + p(1) + p(2)
= .05 + .10 + .15
= .30
• Since the event "at least 3 computers are in use" is complementary to "at most 2 computers are in use,"
P(X >= 3) = 1 – P(X <= 2)
= 1 – .30
= .70
which can, of course, also be obtained by adding together the probabilities for the values 3, 4, 5, and 6.
•The probability that between 2 and 5 computers inclusive are in use is
P(2 <= X <= 5) = P(X = 2, 3, 4, or 5)
= .15 + .25 + .20 + .15
= .75
whereas the probability that the number of computers in use is strictly between 2 and 5 is
P(2 < X < 5) = P(X = 3 or 4)
= .25 + .20
= .45
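The four calculations in Example 7 can be reproduced with a small helper that sums p(x) over the values in an event. This is a sketch: the helper name prob is illustrative, and p(6) = .10 is assumed here so that the probabilities sum to 1.

```python
from fractions import Fraction

# pmf of X = number of computers in use.
# p(0)..p(5) appear in the worked sums; p(6) = .10 is an assumption
# chosen so that the total probability is 1.
pmf = {x: Fraction(p) for x, p in {
    0: "0.05", 1: "0.10", 2: "0.15", 3: "0.25",
    4: "0.20", 5: "0.15", 6: "0.10",
}.items()}

def prob(event):
    """P(X in event): add p(x) over every possible value x satisfying the event."""
    return sum(p for x, p in pmf.items() if event(x))

at_most_2 = prob(lambda x: x <= 2)
at_least_3 = 1 - at_most_2              # complement rule
between_incl = prob(lambda x: 2 <= x <= 5)
between_strict = prob(lambda x: 2 < x < 5)

print(float(at_most_2), float(at_least_3))         # 0.3 0.7
print(float(between_incl), float(between_strict))  # 0.75 0.45
```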
Special Discrete Random Variables: Bernoulli random variables
- Consider the experiment of recording the school year of the 1st student that walks into this class.
- Define a rv Y by
- Y = 1 if the 1st student to walk in is a 1st-year student
- Y = 0 if the 1st student is not a 1st-year student (i.e., any other type of student)
Such random variables arise frequently enough that they have been given a name: Bernoulli (after the person who first studied this rv).
•Any random variable whose only possible values are 0 and 1 is called a Bernoulli random variable
1st-year Student Example: Bernoulli random variable
- Let’s say we conduct this experiment for a month, where: Y=1 if the 1st student to walk in is a 1st-year student; Y=0 if the 1st student is not a 1st-year student
- Let’s say we find that 60% of the time, the 1st student walking in was a 1st-year student. Then we have:
- p(1)=P(Y=1)=0.60
- p(0)=P(Y=0)=0.40
- p(y) = P(Y = y) = 0 for all y other than 0 and 1
- A typical and equivalent representation for the pmf of a Bernoulli rv is:
p(y) = 0.4 if y = 0
     = 0.6 if y = 1
     = 0   otherwise
Example: A Parameter of a Probability Distribution
- The Bernoulli rv X = 1 when a purchaser at a store selected a desktop computer and X= 0 otherwise.
- At one store, p(0) = .8 and p(1) = .2 because 20% of all purchasers selected a desktop computer. At another store, it may be the case that p(0) = .9 and p(1) = .1.
- More generally, the pmf of any Bernoulli rv can be expressed in the form p(1) = a and p(0) = 1 – a, where 0 < a < 1.
- The pmf depends on the particular value of a (the parameter of the Bernoulli distribution), hence we often use the notation p(x; a) rather than just p(x) for the Bernoulli pmf:
p(x; a) = 1 – a if x = 0
        = a     if x = 1
        = 0     otherwise          (3.1)

A Parameter of a Probability Distribution
Definition of a parameter of a probability distribution:
- Suppose p(x) depends on a quantity that can be assigned any one of a number of possible values, with each different value determining a different probability distribution. Such a quantity is called a parameter of the distribution.
- The collection of all probability distributions for different values of the parameter is called a family of probability distributions.
- The quantity a (alpha) in Expression (3.1) is a parameter.
- Each different number a (alpha) between 0 and 1 determines a different member of the Bernoulli family of distributions.
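One way to picture a family of distributions is as a single function with the parameter as an extra argument. The sketch below assumes the Bernoulli form p(1) = a, p(0) = 1 – a described above; the function name bernoulli_pmf is illustrative.

```python
def bernoulli_pmf(x: int, a: float) -> float:
    """p(x; a) for a Bernoulli rv: a if x = 1, 1 - a if x = 0, 0 otherwise."""
    if not 0 < a < 1:
        raise ValueError("the parameter a must satisfy 0 < a < 1")
    if x == 1:
        return a
    if x == 0:
        return 1 - a
    return 0.0

# Two members of the Bernoulli family: the two stores from the example.
for a in (0.2, 0.1):
    print(a, bernoulli_pmf(0, a), bernoulli_pmf(1, a), bernoulli_pmf(2, a))
```

Each value of a passed in picks out one member of the family, which matches the two-store example (a = .2 and a = .1).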
Example 12 - Geometric Distribution
- Starting at a fixed time, we observe the gender of each newborn child at a certain hospital until a boy (B) is born.
- Let p = P(B), assume that successive births are independent, and define the rv X by X = the number of births observed until a boy is born.
Then,
p(1) = P(X = 1)
= P(B)
= p
p(2) = P(X = 2)
= P(GB)
= P(G) • P(B)
= (1 – p)p
p(3) = P(X = 3)
= P(GGB)
= P(G) • P(G) • P(B)
= (1 – p)^2 p
• Continuing in this way, a general formula emerges:
p(x) = (1 – p)^(x – 1) p   for x = 1, 2, 3, …
     = 0                   otherwise          (3.2)
•The parameter p can assume any value between 0 and 1. Expression (3.2) describes the family of geometric distributions.
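A short numerical check of the pattern p(1) = p, p(2) = (1 – p)p, p(3) = (1 – p)^2 p. This is a sketch: the function name geometric_pmf and the choice p = 0.5 are illustrative assumptions, not values from the notes.

```python
def geometric_pmf(x: int, p: float) -> float:
    """Probability that the first success (a boy) occurs on trial x.

    x - 1 failures, each with probability 1 - p, followed by one success.
    """
    if isinstance(x, int) and x >= 1:
        return (1 - p) ** (x - 1) * p
    return 0.0

p = 0.5  # illustrative value of the parameter
print([geometric_pmf(x, p) for x in (1, 2, 3)])  # [0.5, 0.25, 0.125]

# The probabilities over x = 1, 2, 3, ... add up to 1 (here truncated at 200 terms).
total = sum(geometric_pmf(x, p) for x in range(1, 201))
print(total)
```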


The Cumulative Distribution Function (cdf)
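- For some fixed value x, we often wish to compute the probability that the observed value of the rv X will be at most x.
- For example, suppose X has possible values 0, 1, and 2 with p(0) = .500, p(1) = .167, and p(2) = .333. The probability that X is at most 1 is then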
P(X <= 1) = p(0) + p(1) = .500 + .167 = .667
•The largest possible X value is 2, so
P(X <= 2) = 1 ; P(X <= 3.7) = 1; P(X <= 20.5) = 1 and so on.
•Notice that P(X < 1) < P(X <= 1) since the latter includes the probability of the X value 1, whereas the former does not. More generally, when X is discrete and x is a possible value of the variable, P(X < x) < P(X <= x).

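Cumulative Distribution Function
- The cumulative distribution function (cdf) F(x) of a discrete rv X with pmf p(x) is defined for every number x by F(x) = P(X <= x), the sum of p(y) over all possible values y <= x.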


Cdf Example
Four Coin Tosses
• Recall the previous example of 4 coin tosses. In table form, the distribution is
Number of heads x:   0       1      2      3      4
P(X = x):            .0625   .25    .375   .25    .0625
•The probability of tossing at most two heads is:
F(2)=P(X≤2)=P(X=0 or 1 or 2)=p(0)+p(1)+p(2)
=0.0625+0.25+0.375=0.6875
•Try: Plot the graph of F(x) for different values of x. How would you best describe this plot?
- So the cdf F(x) can be obtained from the probability distribution p(x)
- The reverse is also true, i.e., the pmf can also be obtained from the cdf
- In previous example: p(2)=P(X=2)
=[p(0)+p(1)+p(2)]-[p(0)+p(1)]
=P(X≤2)-P(X≤1)
=F(2)-F(1)
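Both directions (pmf to cdf by running sums, cdf to pmf by successive differences) can be sketched in a few lines of Python for the four-coin-toss distribution. Variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

values = [0, 1, 2, 3, 4]
probs = [Fraction(1, 16), Fraction(4, 16), Fraction(6, 16),
         Fraction(4, 16), Fraction(1, 16)]

# pmf -> cdf: F(x) is the running total of p(y) for y <= x.
cdf = dict(zip(values, accumulate(probs)))
print(float(cdf[2]))  # 0.6875

# cdf -> pmf: p(x) = F(x) - F(previous possible value), with F = 0 before the first.
pmf_back = {}
previous = Fraction(0)
for x in values:
    pmf_back[x] = cdf[x] - previous
    previous = cdf[x]

print(pmf_back[2] == cdf[2] - cdf[1])  # True: p(2) = F(2) - F(1)
```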
Cdf Example 13 - Constructing a cdf

- A store carries flash drives with either 1 GB, 2 GB, 4 GB, 8 GB, or 16 GB of memory.
- The accompanying table gives the distribution of Y = the amount of memory in a purchased drive:
y:      1     2     4     8     16
p(y):   .05   .10   .35   .40   .10
•Let’s first determine F (y) for each of the five possible values of Y:
F (1) = P (Y <= 1)
= P (Y = 1)
= p (1)
= .05
F (2) = P (Y <= 2)
= P (Y = 1 or 2)
= p (1) + p (2)
= .15
F(4) = P(Y <= 4)
= P(Y = 1 or 2 or 4)
= p(1) + p(2) + p(4)
= .50
F(8) = P(Y <= 8)
= p(1) + p(2) + p(4) + p(8)
= .90
F(16) = P(Y <= 16)
= 1
•Now for any other number y, F (y) will equal the value of F at the closest possible value of Y to the left of y. For example,
F(2.7) = P(Y <= 2.7)
= P(Y <= 2)
= F(2)
= .15
F(7.999) = P(Y <= 7.999)
= P(Y <= 4)
= F(4)
= .50
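• If y is less than 1, F(y) = 0 [e.g., F(.58) = 0], and if y is at least 16, F(y) = 1 [e.g., F(25) = 1]. The cdf is thus
F(y) = 0     if y < 1
     = .05   if 1 <= y < 2
     = .15   if 2 <= y < 4
     = .50   if 4 <= y < 8
     = .90   if 8 <= y < 16
     = 1     if 16 <= y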

Cdf Example 13 - Graph
• For X a discrete rv, the graph of F(x) will have a jump at every possible value of X and will be flat between possible values. Such a graph is called a step function.
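The step-function behaviour of the Example 13 cdf can be sketched as a lookup: find the closest possible value at or to the left of y and return the cumulative probability there. The use of bisect_right is an implementation choice, not something from the notes.

```python
from bisect import bisect_right

support = [1, 2, 4, 8, 16]                   # possible values of Y (GB)
cumulative = [0.05, 0.15, 0.50, 0.90, 1.00]  # F(1), F(2), F(4), F(8), F(16)

def F(y: float) -> float:
    """cdf of Y: flat between possible values, jumping at each one."""
    i = bisect_right(support, y)  # how many possible values are <= y
    return 0.0 if i == 0 else cumulative[i - 1]

for y in (0.58, 1, 2.7, 7.999, 8, 25):
    print(y, F(y))
```

Evaluating F on a fine grid of y values and plotting the result would show the flat segments and jumps described above.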

The Mean of a Random Variable
- The mean x-bar of a set of observations is their ordinary average.
- The mean μ of a random variable X is also an average of the possible values of X, but a weighted one: not all outcomes need to be equally likely.
Expected Value of a Discrete Random Variable
Value of X:              x1   x2   x3   …   xk
Probability P(X = xi):   p1   p2   p3   …   pk
• To find the expected value E(X) or mean value μ for the rv X, multiply each possible value by its probability, then add all the products:
E(X) = μ = x1 • p1 + x2 • p2 + … + xk • pk

EV Example: Hard-Drive Example
The following table gives the distribution of customer choices of hard-drive size for a laptop computer model. Find the mean of this probability distribution.
Hard drive x:           10    20    30    40
Probability P(X = x):   .50   .25   .15   .10
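Applying "multiply each value by its probability, then add" to this table gives 10(.50) + 20(.25) + 30(.15) + 40(.10) = 18.5. A minimal Python sketch of the same calculation, with an illustrative helper name:

```python
def expected_value(pmf: dict) -> float:
    """E(X): sum of x * p(x) over all possible values x."""
    return sum(x * p for x, p in pmf.items())

# Distribution of hard-drive size choices from the table.
hard_drive = {10: 0.50, 20: 0.25, 30: 0.15, 40: 0.10}

mean_size = expected_value(hard_drive)
print(round(mean_size, 4))  # 18.5
```

Note that 18.5 is not itself one of the possible values of X; the mean is a weighted average, not a value that must be observed.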

The Expected Value of a Function of a discrete rv X
Sometimes interest will focus on the expected value of some function h (X) rather than on just E (X).
Proposition
If the rv X has a set of possible values D and pmf p(x), then the expected value of any function h(X), denoted by E[h(X)] or μ_h(X), is computed by:
E[h(X)] = Σ h(x) • p(x), where the sum is over all x in D
That is, E [h (X)] is computed in the same way that E (X) itself is, except that h (x) is substituted in place of x.

Example 23: EV of function
- A computer store has purchased three computers of a certain type at $500 apiece. It will sell them for $1000 apiece.
- The manufacturer has agreed to repurchase any computers still unsold after a specified period at $200 apiece.
Let X denote the number of computers sold, and suppose that
p(0) = .1,
p(1) = .2,
p(2) = .3
p(3) = .4
•With h (X) denoting the profit associated with selling X units, the given information implies that
h (X) = revenue – cost
= 1000X + 200(3 – X) – 1500
= 800X – 900
The expected profit is then
E [h(X)] = h(0) • p(0) + h(1) • p(1) + h(2) • p(2) + h(3) • p(3)
= (–900)(.1) + (– 100)(.2) + (700)(.3) + (1500)(.4)
= $700
Rules of Expected Value
The h (X) function of interest is quite frequently a linear function aX + b. In this case, E[h(X)] is easily computed from E(X).
Proposition
E(aX + b) = a • E(X) + b
(Or, using alternative notation, μ_(aX+b) = a • μ_X + b)
To paraphrase, the expected value of a linear function equals the linear function evaluated at the expected value E(X). Since h(X) in Example 23 is linear and E(X) = 2, E[h(X)] = 800(2) – 900 = $700, as before.
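The two routes to the expected profit in Example 23 (summing h(x) • p(x) directly, and applying E(aX + b) = a • E(X) + b) can be compared in a short Python sketch. The helper names expect and profit are illustrative.

```python
pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}  # p(x) for X = number of computers sold

def profit(x: int) -> int:
    """h(x) = revenue - cost: sales at $1000, buy-back at $200, cost 3 * $500."""
    return 1000 * x + 200 * (3 - x) - 1500

def expect(pmf: dict, h=lambda x: x) -> float:
    """E[h(X)] = sum of h(x) * p(x); with the default h this is E(X)."""
    return sum(h(x) * p for x, p in pmf.items())

direct = expect(pmf, profit)        # sum of h(x) * p(x)
via_rule = 800 * expect(pmf) - 900  # a * E(X) + b with a = 800, b = -900

print(round(expect(pmf), 4), round(direct, 4), round(via_rule, 4))  # 2.0 700.0 700.0
```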
The Variance of X
Definition
Let X have pmf p(x) and expected value μ. Then the variance of X, denoted by V(X) or σ_X^2, or just σ^2, is
V(X) = Σ (x – μ)^2 • p(x) = E[(X – μ)^2], where the sum is over all x in D

Standard Deviation (SD) of X
- The standard deviation of X is σ = sqrt(σ^2), the positive square root of the variance.

Variance of X
- The quantity h(X) = (X – μ)^2 is the squared deviation of X from its mean, and σ^2 is the expected squared deviation, i.e., the weighted average of squared deviations, where the weights are probabilities from the distribution.
- If most of the probability distribution is close to μ, then σ^2 will be relatively small.
- However, if there are x values far from μ that have large p(x), then σ^2 will be quite large.
- Very roughly, σ can be interpreted as the size of a representative deviation from the mean value μ.
- So if σ = 10, then in a long sequence of observed X values, some will deviate from μ by more than 10 while others will be closer to the mean than that; a typical deviation from the mean will be something on the order of 10.
Example 24

A library has an upper limit of 6 on the number of videos that can be checked out to an individual at one time. Consider only those who check out videos, and let X denote the number of videos checked out to a randomly selected individual. The pmf of X is as follows:
x:      1     2     3     4     5     6
p(x):   .30   .25   .15   .05   .10   .15
The expected value of X is easily seen to be μ = 2.85.
The variance of X is then
V(X) = σ^2 = (1 – 2.85)^2(.30) + (2 – 2.85)^2(.25) + … + (6 – 2.85)^2(.15) = 3.2275
The standard deviation of X is σ = sqrt(3.2275) ≈ 1.797.

A Shortcut Formula for σ^2
The number of arithmetic operations necessary to compute σ^2 can be reduced by using an alternative formula.
Proposition
V(X) = σ^2 = [Σ x^2 • p(x)] – μ^2 = E(X^2) – [E(X)]^2
In using this formula, E(X^2) is computed first without any subtraction; then E(X) is computed, squared, and subtracted (once) from E(X^2).
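A sketch comparing the definition of variance with the shortcut formula on the library-video example. The values p(1) = .30, p(2) = .25, and p(6) = .15 appear in the worked sum; p(3), p(4), and p(5) are taken here as .15, .05, and .10, an assumption consistent with the stated mean 2.85 and variance 3.2275.

```python
from fractions import Fraction
from math import sqrt

# pmf of X = number of videos checked out.
# p(3), p(4), p(5) are assumed values that reproduce the stated mean and variance.
pmf = {x: Fraction(p) for x, p in {
    1: "0.30", 2: "0.25", 3: "0.15", 4: "0.05", 5: "0.10", 6: "0.15",
}.items()}

mu = sum(x * p for x, p in pmf.items())  # E(X)

# Definition: expected squared deviation from the mean.
var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Shortcut: E(X^2) - [E(X)]^2, with a single subtraction at the end.
var_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mu ** 2

sigma = sqrt(var_definition)
print(float(mu), float(var_definition), float(var_shortcut), round(sigma, 3))
```

Exact fractions are used so that the two formulas agree exactly rather than only up to rounding.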
Rules of Variance
Proposition
V(aX + b) = σ^2_(aX+b) = a^2 • σ^2_X and σ_(aX+b) = |a| • σ_X
In particular,
σ_(aX) = |a| • σ_X and σ_(X+b) = σ_X          (3.14)
According to the first relation in (3.14), the sd in the new unit is the original sd multiplied by the conversion factor.
The second relation says that adding or subtracting a constant does not impact variability; it just rigidly shifts the distribution to the right or left.
Example 26 - Rules of Variance
In the computer sales scenario of Example 23, E(X) = 2 and
E(X^2) = (0)^2(.1) + (1)^2(.2) + (2)^2(.3) + (3)^2(.4) = 5
so V(X) = 5 – (2)^2 = 1.
The profit function h(X) = 800X – 900 then has variance V(h(X)) = (800)^2 • V(X) = (640,000)(1) = 640,000.
The standard deviation of h(X) is sqrt[V(h(X))] = 800.
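The rule V(aX + b) = a^2 • V(X) can be checked against a direct calculation on the distribution of the profit itself. This is a sketch with illustrative helper names; the shift b = –900 drops out, as the rule predicts.

```python
from fractions import Fraction
from math import sqrt

# pmf of X = number of computers sold (Example 23).
pmf_x = {0: Fraction(1, 10), 1: Fraction(2, 10),
         2: Fraction(3, 10), 3: Fraction(4, 10)}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Shortcut formula: E(X^2) - [E(X)]^2."""
    return sum(x ** 2 * p for x, p in pmf.items()) - mean(pmf) ** 2

# Distribution of the profit h(X) = 800X - 900: same probabilities, new values.
pmf_profit = {800 * x - 900: p for x, p in pmf_x.items()}

var_x = variance(pmf_x)            # 1
var_direct = variance(pmf_profit)  # computed from the profit values directly
var_rule = 800 ** 2 * var_x        # a^2 * V(X)

print(var_x, var_direct, var_rule, sqrt(var_rule))  # 1 640000 640000 800.0
```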