Chapter 3: Discrete Random Variables and Probability Distributions Flashcards by Brian Nam

Random Variables

For a given sample space S of some experiment, a random variable (rv) is any rule that associates a number with each outcome in S.
- In mathematical language, a random variable is a function whose domain is the sample space and whose range is the set of real numbers.
- It is a variable because different numerical values are possible.
- It is random because the observed value depends on which of the possible experimental outcomes results.

How are Random Variables denoted?

Random variables are customarily denoted by uppercase letters, such as X and Y, near the end of the alphabet.
The notation X(s) = x means that x is the value associated with the outcome s by the random variable X.

What are the 2 types of Random Variables?

Discrete
Continuous

Discrete Random Variables

A random variable that can only assume distinct values is said to be discrete. Usually these represent a count.

What is a Bernoulli Experiment?

A discrete random variable
A Bernoulli experiment provides a 0/1 response

What is a Binomial Random Variable?

A discrete random variable
A binomial rv gives the number of successes in n independent, identical trials. Possible values are 0, 1, …, n

What is a Geometric variable?

A discrete random variable
Number of objects tested until a success. Possible values are 1, 2, 3, …

Continuous Random Variables

A random variable that can (theoretically) assume any value in a finite or infinite interval is said to be continuous.

Discrete RV for Coin Toss Experiment

When we toss a coin four times, we can record the outcome as a string of heads and tails, such as HTTH.
However we are most often interested in numerical outcomes such as the count of heads in the four tosses.
It is convenient to use the following shorthand notation
- Let X be the number of heads.
- We call X a random variable because its values vary when the coin tossing is repeated
- If our outcome is HTTH, then X = 2, if the next outcome is TTTH, the value of X changes to 1.
- The possible values of X are 0, 1, 2, 3, 4.
- Tossing a coin four times will give X one of these possible values.
The probability that the random variable X will equal x is: P(X = x) or simply p(x).
- The probability that we observe exactly 2 heads in the four tosses is denoted by P(X = 2) (Note: P(X = 2) = 6/16 = 3/8)

Example 1 - Bernoulli rv (a specific kind of discrete rv)

• When a student calls a university help desk for technical support, he/she will either immediately be able to speak to someone (S, for success) or will be placed on hold (F, for failure).

With sample space 𝒮 = {S, F}, define an rv X by

X(S) = 1, X(F) = 0

• The rv X indicates whether (1) or not (0) the student can immediately speak to someone.

Example 6

As another example, suppose we select married couples at random and do a blood test on each person until we find a husband and wife who both have the same Rh factor.
With X = the number of blood tests to be performed, possible values of X are D = {2, 4, 6, 8, …}.
Since the possible values have been listed in sequence, X is a discrete rv.

Probability Distributions for Discrete Random Variables

Probabilities assigned to various outcomes in S in turn determine probabilities associated with the values of any particular rv X.
The probability distribution of X says how the total probability of 1 is distributed among the various possible X values.
Suppose, for example, that a business has just purchased four laser printers, and let X be the number among these that require service during the warranty period.
- Possible X values are then 0, 1, 2, 3, and 4. The probability distribution will tell us how the probability of 1 is subdivided among these five possible values— how much probability is associated with the X value 0, how much is apportioned to the X value 1, and so on.
We will use the following notation for the probabilities in the distribution:

p (0) = the probability of the X value 0 = P(X = 0)

p (1) = the probability of the X value 1 = P(X = 1)
* In general, p (x) will denote the probability assigned to the value x.

Example: Four Coin Tosses

Toss a balanced coin four times; the discrete random variable X counts the number of heads. How do we find the probability distribution function of X?

The outcome of four tosses is a sequence of heads and tails such as HTTH and there are 16 possible outcomes.

P(X=0) = 1/16 = .0625

P(X=1) = 4/16 = 0.25

P(X=2) = 6/16 = .375

P(X=3) = 4/16 = .25

P(X=4) = 1/16 = .0625

The sum of these probabilities is 1, so this is a legitimate probability distribution function.
In table form, the probability distribution for the rv X is:

Number of heads x:     0      1      2      3      4

Probability P(X = x):  .0625  .25    .375   .25    .0625
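
The counts 1, 4, 6, 4, 1 behind this table can be checked by listing all 16 outcomes. The following is a minimal sketch, assuming a balanced coin so that every outcome is equally likely; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 2**4 = 16 equally likely outcomes of four tosses, e.g. "HTTH".
outcomes = ["".join(seq) for seq in product("HT", repeat=4)]

# The rv X maps each outcome to its number of heads; count outcomes per X value.
counts = Counter(outcome.count("H") for outcome in outcomes)

# pmf: p(x) = (number of outcomes with x heads) / (total number of outcomes)
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p, float(p))
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to 1 does not depend on floating-point rounding.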

Probability Mass Function (pmf)

A discrete distribution is described by giving its probability mass function, or pmf, either as a table or as a function.
For every number x, the pmf is defined by p(x) = P(X = x) = P(all s ∈ S: X(s) = x).

Example 7

The Cal Poly Department of Statistics has a lab with six computers reserved for statistics majors.
Let X denote the number of these computers that are in use at a particular time of day.
Suppose that the probability distribution of X is as given in the following table; the first row of the table lists the possible X values and the second row gives the probability of each such value.

x:     0    1    2    3    4    5    6

p(x):  .05  .10  .15  .25  .20  .15  .10

•We can now use elementary probability properties to calculate other probabilities of interest. For example, the probability that at most 2 computers are in use is

P(X <= 2) = P(X = 0 or 1 or 2)

= p(0) + p(1) + p(2)

= .05 + .10 + .15

= .30

• Since the event "at least 3 computers are in use" is complementary to "at most 2 computers are in use,"

P(X >= 3) = 1 – P(X <= 2)

= 1 – .30

= .70

which can, of course, also be obtained by adding together probabilities for the values 3, 4, 5, and 6.

•The probability that between 2 and 5 computers inclusive are in use is

P(2 <= X <= 5) = P(X = 2, 3, 4, or 5)

= .15 + .25 + .20 + .15

= .75

whereas the probability that the number of computers in use is strictly between 2 and 5 is

P(2 < X < 5) = P(X = 3 or 4)

= .25 + .20

= .45
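
The four calculations in this example all follow one pattern: add p(x) over the X values that belong to the event. A small sketch of that pattern is below. The helper name `prob` is hypothetical, and p(6) = .10 is an inferred value: it is what remains once the six probabilities used in the text are subtracted from 1.

```python
# pmf of X = number of lab computers in use (Example 7).
# p(0)..p(5) are the values used in the text; p(6) = .10 is inferred
# from the requirement that the probabilities sum to 1.
pmf = {0: 0.05, 1: 0.10, 2: 0.15, 3: 0.25, 4: 0.20, 5: 0.15, 6: 0.10}


def prob(event):
    """Return P(X is in the event) by adding p(x) over the x values that satisfy it."""
    return sum(p for x, p in pmf.items() if event(x))


at_most_2 = prob(lambda x: x <= 2)          # P(X <= 2)
at_least_3 = prob(lambda x: x >= 3)         # P(X >= 3)
between_2_and_5 = prob(lambda x: 2 <= x <= 5)  # P(2 <= X <= 5)
strictly_between = prob(lambda x: 2 < x < 5)   # P(2 < X < 5)

for name, value in [
    ("P(X <= 2)", at_most_2),
    ("P(X >= 3)", at_least_3),
    ("P(2 <= X <= 5)", between_2_and_5),
    ("P(2 < X < 5)", strictly_between),
]:
    print(name, round(value, 2))
```

Note how the inclusive and strict inequalities pick out different sets of values, which is why the last two answers differ.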

Special Discrete Random Variables: Bernoulli random variables

Consider the experiment of recording the school year of the 1st student that walks into this class.
Define a rv Y by
Y = 1 if the 1st student to walk in is a 1st-year student
Y = 0 if the 1st student is not a 1st-year student (i.e., all other types of students)

Such random variables arise frequently enough that they have been given a name, Bernoulli (after the person who first studied this rv).

•Any random variable whose only possible values are 0 and 1 is called a Bernoulli random variable

1st-year Student Example: Bernoulli random variable

Let’s say we conduct this experiment for a month, where: Y=1 if the 1st student to walk in is a 1st-year student; Y=0 if the 1st student is not a 1st-year student
Let’s say we find that 60% of the time, the 1st student walking in was a 1st-year student. Then we have:
- p(1)=P(Y=1)=0.60
- p(0)=P(Y=0)=0.40
- p(y) = P(Y = y) = 0 for every y other than 0 and 1
A typical and equivalent representation for the pmf of a Bernoulli rv is:
- p(y) = 0.4 if y = 0
-      = 0.6 if y = 1
-      = 0 otherwise

Example: A Parameter of a Probability Distribution

The Bernoulli rv X = 1 when a purchaser at a store selected a desktop computer and X= 0 otherwise.
At one store, p(0) = .8 and p(1) = .2 because 20% of all purchasers selected a desktop computer. At another store, it may be the case that p(0) = .9 and p(1) = .1.
More generally, the pmf of any Bernoulli rv can be expressed in the form p(1) = α and p(0) = 1 – α, where 0 < α < 1.
The pmf depends on the particular value of α (the parameter of the Bernoulli distribution), hence we often use the notation p(x; α) rather than just p(x) for the Bernoulli pmf:

p(x; α) = 1 – α if x = 0
        = α if x = 1
        = 0 otherwise (3.1)

A Parameter of a Probability Distribution

Definition of a parameter of a probability distribution:
- Suppose p(x) depends on a quantity that can be assigned any one of a number of possible values, with each different value determining a different probability distribution. Such a quantity is called a parameter of the distribution.
The collection of all probability distributions for different values of the parameter is called a family of probability distributions.
The quantity α in Expression (3.1) is a parameter.
Each different number α between 0 and 1 determines a different member of the Bernoulli family of distributions.
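
The idea of a family indexed by a parameter can be sketched as a single function that takes the parameter as an argument. The function name `bernoulli_pmf` is hypothetical; the two parameter values in the usage lines are the two stores from the desktop-computer example.

```python
def bernoulli_pmf(x, alpha):
    """Bernoulli pmf p(x; alpha): p(1) = alpha, p(0) = 1 - alpha, 0 elsewhere."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    if x == 1:
        return alpha
    if x == 0:
        return 1 - alpha
    return 0.0


# Each value of alpha picks out one member of the Bernoulli family.
for alpha in (0.2, 0.1):
    print(alpha, bernoulli_pmf(0, alpha), bernoulli_pmf(1, alpha))
```

Passing α explicitly mirrors the notation p(x; α): the same rule yields a different distribution for each parameter value.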

Example 12 - Geometric Distribution

Starting at a fixed time, we observe the gender of each newborn child at a certain hospital until a boy (B) is born.
Let p = P(B), assume that successive births are independent, and define the rv X by X = number of births observed until a boy is born.

Then,

p(1) = P(X = 1)

= P(B)

= p

p(2) = P(X = 2)

= P(GB)

= P(G) • P(B)

= (1 – p)p

p(3) = P(X = 3)

= P(GGB)

= P(G) • P(G) • P(B)

= (1 – p)²p

• Continuing in this way, a general formula emerges:

p(x) = (1 – p)^(x – 1) • p for x = 1, 2, 3, …
     = 0 otherwise (3.2)

• The parameter p can assume any value between 0 and 1. Expression (3.2) describes the family of geometric distributions.
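
The pattern p, (1 – p)p, (1 – p)²p, … can be written as one function of x and the parameter p. This is a minimal sketch; the name `geometric_pmf` and the value p = 0.5 are assumptions chosen only for illustration.

```python
def geometric_pmf(x, p):
    """P(X = x) when X counts independent births up to and including the first boy."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if isinstance(x, int) and x >= 1:
        # x - 1 girls (each with probability 1 - p) followed by one boy (probability p)
        return (1 - p) ** (x - 1) * p
    return 0.0


p = 0.5  # hypothetical value of P(B)
first_three = [geometric_pmf(x, p) for x in (1, 2, 3)]
print(first_three)

# The probabilities over x = 1, 2, 3, ... add up to 1; a long partial sum gets close.
partial_sum = sum(geometric_pmf(x, p) for x in range(1, 51))
print(partial_sum)
```

Because X has infinitely many possible values, the code can only check a partial sum; the first n terms add up to 1 – (1 – p)ⁿ, which approaches 1 as n grows.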

Parameters of a distribution

The Cumulative Distribution Function (cdf)

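For some fixed value x, we often wish to compute the probability that the observed value of the rv X will be at most x.
For example, suppose X has possible values 0, 1, and 2 with p(0) = .500, p(1) = .167, and p(2) = .333. The probability that X is at most 1 is then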

P(X <= 1) = p(0) + p(1) = .500 + .167 = .667

•The largest possible X value is 2, so

P(X <= 2) = 1 ; P(X <= 3.7) = 1; P(X <= 20.5) = 1 and so on.

•Notice that P(X < 1) < P(X <= 1) since the latter includes the probability of the X value 1, whereas the former does not. More generally, when X is discrete and x is a possible value of the variable, P(X < x) < P(X <= x).

Cumulative Distribution Function

Cdf (continued)

Plot of cdf

**_Cdf Example_** Four Coin Tosses

• Recall the previous example of 4 coin tosses. In table form, the distribution is

**Number of heads X:** 0 1 2 3 4
**P(X=x):** 0.0625 0.25 0.375 0.25 0.0625

• The probability of tossing at most two heads is:

F(2) = P(X ≤ 2) = P(X = 0 or 1 or 2) = p(0) + p(1) + p(2) = 0.0625 + 0.25 + 0.375 = 0.6875

• Try: Plot the graph of F(x) for different values of x. How would you best describe this plot?

* So the cdf F(x) can be obtained from the probability distribution p(x).
* The reverse is also true, i.e., the pmf can also be obtained from the cdf.
* In the previous example: p(2) = P(X = 2) = [p(0) + p(1) + p(2)] – [p(0) + p(1)] = P(X ≤ 2) – P(X ≤ 1) = F(2) – F(1)
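
Both directions, pmf to cdf and cdf back to pmf, can be sketched for the four-coin-toss distribution with running sums and successive differences. The variable names are illustrative only.

```python
from itertools import accumulate

values = [0, 1, 2, 3, 4]
probs = [0.0625, 0.25, 0.375, 0.25, 0.0625]

# pmf -> cdf: F(x) is the running total of p(y) over possible values y <= x.
cdf = dict(zip(values, accumulate(probs)))
print(cdf[2])  # F(2) = P(X <= 2)

# cdf -> pmf: each p(x) is the jump F(x) - F(previous possible value).
recovered = {values[0]: cdf[values[0]]}
for previous, current in zip(values, values[1:]):
    recovered[current] = cdf[current] - cdf[previous]
print(recovered[2])  # p(2) = F(2) - F(1)
```

The recovered probabilities match the original table, which is the sense in which the pmf and the cdf carry the same information.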

**_Cdf Example 13 - Constructing a cdf_**

* A store carries flash drives with either 1 GB, 2 GB, 4 GB, 8 GB, or 16 GB of memory.
* The accompanying table gives the distribution of Y = the amount of memory in a purchased drive:

**y:** 1 2 4 8 16
**p(y):** .05 .10 .35 .40 .10

• Let’s first determine F(y) for each of the five possible values of Y:

**F(1) = P(Y ≤ 1)** = P(Y = 1) = p(1) = .05
**F(2) = P(Y ≤ 2)** = P(Y = 1 or 2) = p(1) + p(2) = .15
**F(4) = P(Y ≤ 4)** = P(Y = 1 or 2 or 4) = p(1) + p(2) + p(4) = .50
**F(8) = P(Y ≤ 8)** = p(1) + p(2) + p(4) + p(8) = .90
**F(16) = P(Y ≤ 16)** = 1

• Now for any other number y, F(y) will equal the value of F at the closest possible value of Y to the left of y. For example,

**F(2.7) = P(Y ≤ 2.7)** = P(Y ≤ 2) = F(2) = .15
**F(7.999) = P(Y ≤ 7.999)** = P(Y ≤ 4) = F(4) = .50

• If y is less than 1, F(y) = 0 [e.g. F(.58) = 0], and if y is at least 16, F(y) = 1 [e.g. F(25) = 1]. The cdf is thus

F(y) = 0 if y < 1
     = .05 if 1 ≤ y < 2
     = .15 if 2 ≤ y < 4
     = .50 if 4 ≤ y < 8
     = .90 if 8 ≤ y < 16
     = 1 if 16 ≤ y
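
The rule "use the closest possible value to the left of y" is a lookup in a sorted list, so one way to sketch the cdf is with a binary search. The names `jump_points`, `cdf_values` and `F` are illustrative; the numbers are the F values worked out in this example.

```python
from bisect import bisect_right

jump_points = [1, 2, 4, 8, 16]              # possible values of Y, in increasing order
cdf_values = [0.05, 0.15, 0.50, 0.90, 1.00]  # F at each possible value


def F(y):
    """Evaluate the step-function cdf at any real number y."""
    # k = how many possible values of Y are <= y
    k = bisect_right(jump_points, y)
    # No possible value at or below y means no probability has accumulated yet.
    return 0.0 if k == 0 else cdf_values[k - 1]


for y in (0.58, 1, 2.7, 7.999, 16, 25):
    print(y, F(y))
```

`bisect_right` counts a possible value that equals y as "at or below y", which matches the ≤ in F(y) = P(Y ≤ y).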

Cdf Example 13 - Graph

•For X a discrete rv, the graph of F (x) will have a jump at every possible value of X and will be flat between possible values. Such a graph is called a step function

The Mean of a Random Variable

* The mean x̄ of a set of observations is their ordinary average.
* The mean μ_X of a random variable X is also an average of the possible values of X, but a weighted one: _not all outcomes need to be equally likely._

Expected Value of a Discrete Random Variable

**Value of X:** x₁ x₂ x₃ … x_k
**Probability P(X = x_i):** p₁ p₂ p₃ … p_k

• To find the expected value **_E(X)_** or **_mean value_** for the rv X, multiply each possible value by its probability, then add all the products:

E(X) = μ_X = x₁p₁ + x₂p₂ + … + x_k p_k = Σ x_i • p_i

**_EV Example: Hard-Drive Example_**

The following table gives the distribution of customer choices of hard-drive size for a laptop computer model. Find the mean of this probability distribution.

**Hard drive X:** 10 20 30 40
**Probability P(X = x):** .50 .25 .15 .10
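
The card leaves the calculation to the reader. One way to carry it out, multiplying each value by its probability and adding the products, is sketched below; the variable names are illustrative.

```python
# Distribution of hard-drive size X from the table: value -> probability.
pmf = {10: 0.50, 20: 0.25, 30: 0.15, 40: 0.10}

# E(X) = sum of x * p(x) over the possible values.
mean = sum(x * p for x, p in pmf.items())
print(round(mean, 2))
```

By hand the same sum is 10(.50) + 20(.25) + 30(.15) + 40(.10) = 5 + 5 + 4.5 + 4 = 18.5. The mean is not itself one of the possible values of X.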

The Expected Value of a Function of a discrete rv X

Sometimes interest will focus on the expected value of some function h(X) rather than on just E(X).

**_Proposition_** If the rv X has a set of possible values D and pmf p(x), then the expected value of any function h(X), denoted by E[h(X)] or μ_h(X), is computed by:

E[h(X)] = Σ h(x) • p(x), with the sum taken over all x in D

That is, E[h(X)] is computed in the same way that E(X) itself is, except that h(x) is substituted in place of x.

**_Example 23: EV of function_**

* A computer store has purchased three computers of a certain type at $500 apiece. It will sell them for $1000 apiece.
* The manufacturer has agreed to repurchase any computers still unsold after a specified period at $200 apiece.

Let X denote the number of computers sold, and suppose that p(0) = .1, p(1) = .2, p(2) = .3, and p(3) = .4.

• With h(X) denoting the profit associated with selling X units, the given information implies that

h(X) = revenue – cost = 1000X + 200(3 – X) – 1500 = 800X – 900

The expected profit is then

E[h(X)] = h(0) • p(0) + h(1) • p(1) + h(2) • p(2) + h(3) • p(3)
= (–900)(.1) + (–100)(.2) + (700)(.3) + (1500)(.4)
= $700

Rules of Expected Value

The h(X) function of interest is quite frequently a linear function aX + b. In this case, E[h(X)] is easily computed from E(X).

**_Proposition_** E(aX + b) = a • E(X) + b

(Or, using alternative notation, μ_(aX+b) = a • μ_X + b.)

To paraphrase, the expected value of a linear function equals the linear function evaluated at the expected value E(X). Since h(X) in Example 23 is linear and E(X) = 2, E[h(X)] = 800(2) – 900 = $700, as before.
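
The two routes to the expected profit in Example 23, summing h(x) • p(x) directly and applying the linear rule to E(X), can be compared in a few lines. This is a sketch with illustrative names; `profit` encodes h(x) = 800x – 900 from the example.

```python
# pmf of X = number of computers sold (Example 23).
pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}


def profit(x):
    """h(x) = revenue - cost = 1000x + 200(3 - x) - 1500."""
    return 1000 * x + 200 * (3 - x) - 1500


# Route 1: E[h(X)] = sum of h(x) * p(x).
expected_profit = sum(profit(x) * p for x, p in pmf.items())

# Route 2: h is linear (800x - 900), so E[h(X)] = 800 * E(X) - 900.
mean_x = sum(x * p for x, p in pmf.items())
expected_profit_linear = 800 * mean_x - 900

print(round(expected_profit, 2), round(mean_x, 2), round(expected_profit_linear, 2))
```

The shortcut in route 2 is valid only because h is linear; for a general h, E[h(X)] need not equal h(E(X)).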

The Variance of X

**_Definition_** Let X have pmf p(x) and expected value μ. Then the **variance** of X, denoted by V(X) or σ²_X, or just σ², is

V(X) = Σ (x – μ)² • p(x) = E[(X – μ)²], with the sum taken over all possible values x

**Standard Deviation** (SD) of X

The standard deviation of X is the positive square root of the variance: σ_X = √(σ²_X).

Variance of X

* The quantity h(X) = (X – μ)² is the squared deviation of X from its mean, and σ² is the expected squared deviation, i.e., the weighted average of squared deviations, where the weights are probabilities from the distribution.
* If most of the probability distribution is close to μ, then σ² will be relatively small.
* However, if there are x values far from μ that have large p(x), then σ² will be quite large.
* Very roughly, σ can be interpreted as the size of a representative deviation from the mean value μ.
* So if σ = 10, then in a long sequence of observed X values, some will deviate from μ by more than 10 while others will be closer to the mean than that; a typical deviation from the mean will be something on the order of 10.

**_Example 24_**

A library has an upper limit of 6 on the number of videos that can be checked out to an individual at one time. Consider only those who check out videos, and let X denote the number of videos checked out to a randomly selected individual. The pmf of X is as follows:

The expected value of X is easily seen to be μ = 2.85.

The variance of X is then

V(X) = (1 – 2.85)²(.30) + (2 – 2.85)²(.25) + ... + (6 – 2.85)²(.15) = 3.2275

The standard deviation of X is σ = √3.2275 ≈ 1.80.

A Shortcut Formula for σ²

The number of arithmetic operations necessary to compute σ² can be reduced by using an alternative formula.

**_Proposition_** V(X) = σ² = [Σ x² • p(x)] – μ² = E(X²) – [E(X)]²

In using this formula, E(X²) is computed first without any subtraction; then E(X) is computed, squared, and subtracted (once) from E(X²).
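
One way to see why the shortcut holds is to expand the square in the definition of the variance and use the facts that Σ p(x) = 1 and Σ x • p(x) = μ. A short derivation, with every sum taken over the set D of possible values:

```latex
\begin{aligned}
\sigma^2 &= \sum_{x \in D} (x - \mu)^2 \, p(x) \\
         &= \sum_{x \in D} x^2 \, p(x) \;-\; 2\mu \sum_{x \in D} x \, p(x) \;+\; \mu^2 \sum_{x \in D} p(x) \\
         &= E(X^2) - 2\mu \cdot \mu + \mu^2 \cdot 1 \\
         &= E(X^2) - \mu^2 \;=\; E(X^2) - [E(X)]^2
\end{aligned}
```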

Rules of Variance

Rules of Variance (cont.)

Rules of Variance (cont. 2)

According to the first relation in (3.14), the sd in the new unit is the original sd multiplied by the conversion factor. The second relation says that adding or subtracting a constant does not impact variability; it just rigidly shifts the distribution to the right or left.
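
The relations labeled (3.14) are not reproduced on this card. Assuming they are the standard rules for a linear function aX + b, which is consistent with the description above and with the calculation in Example 26, they can be written as:

```latex
\begin{aligned}
V(aX + b) &= a^2 \, \sigma_X^2, &\qquad \sigma_{aX+b} &= |a| \, \sigma_X \\
\text{in particular:}\quad \sigma_{aX} &= |a| \, \sigma_X, &\qquad \sigma_{X+b} &= \sigma_X
\end{aligned}
```

The absolute value is needed because a standard deviation is never negative, even when the conversion factor a is.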

Example 26 - Rules of Variance

In the computer sales scenario of Example 23, E(X) = 2 and

E(X²) = (0)²(.1) + (1)²(.2) + (2)²(.3) + (3)²(.4) = 5

so V(X) = 5 – (2)² = 1. The profit function h(X) = 800X – 900 then has variance

V(h(X)) = (800)² • V(X) = (640,000)(1) = 640,000

The standard deviation of h(X) is √V(h(X)) = 800.
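
These numbers can be checked by computing the variance of the profit directly from its own distribution, rather than through the rule. The sketch below uses illustrative names and the pmf from Example 23.

```python
from math import sqrt

# pmf of X = number of computers sold (Example 23).
pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}


def expectation(g):
    """E[g(X)] = sum of g(x) * p(x) over the possible values of X."""
    return sum(g(x) * p for x, p in pmf.items())


def profit(x):
    return 800 * x - 900


# Variance of X by the shortcut formula E(X^2) - [E(X)]^2.
mean_x = expectation(lambda x: x)
var_x = expectation(lambda x: x ** 2) - mean_x ** 2

# Variance of the profit computed directly from the definition.
mean_profit = expectation(profit)
var_profit = expectation(lambda x: (profit(x) - mean_profit) ** 2)
sd_profit = sqrt(var_profit)

print(round(var_x, 6), round(var_profit, 2), round(sd_profit, 2))
print(round(800 ** 2 * var_x, 2))  # the rule a^2 * V(X), for comparison
```

The direct calculation and the rule (800)² • V(X) agree, and the constant –900 has no effect on the spread.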
