Chapter 3: Discrete Random Variables and Probability Distributions Flashcards

1
Q

Random Variables

A
  • For a given sample space S of some experiment, a random variable (rv) is any rule that associates a number with each outcome in S
    • In mathematical language, a random variable is a function whose domain is the sample space and whose range is the set of real numbers.
  • variable because different numerical values are possible
  • random because the observed value depends on which of the possible experimental outcomes results
2
Q

How are Random Variables denoted?

A
  • customarily denoted by uppercase letters, such as X and Y, near the end of the alphabet
  • The notation X (s) = x means that x is the value associated with the outcome s by the random variable X
3
Q

What are the 2 types of Random Variables?

A
  1. Discrete
  2. Continuous
4
Q

Discrete Random Variables

A
  • A random variable that can only assume distinct values is said to be discrete. Usually these represent a count.
5
Q

What is a Bernoulli Experiment?

A
  • A discrete random variable
  • A Bernoulli experiment provides a 0/1 response
6
Q

What is a Binomial Random Variable?

A
  • A discrete random variable
  • A binomial rv gives the number of successes in n independent, identical trials. Possible values are 0, 1, …, n
7
Q

What is a Geometric variable?

A
  • A discrete random variable
  • Number of trials (e.g., objects tested) up to and including the first success. Possible values are 1, 2, 3, …
8
Q

Continuous Random Variables

A
  • A random variable that can (theoretically) assume any value in a finite or infinite interval is said to be continuous.
9
Q

Discrete RV for Coin Toss Experiment

A
  • When we toss a coin four times, we can record the outcome as a string of heads and tails, such as HTTH.
  • However we are most often interested in numerical outcomes such as the count of heads in the four tosses.
  • It is convenient to use the following shorthand notation
    • Let X be the number of heads.
    • We call X a random variable because its values vary when the coin tossing is repeated
    • If our outcome is HTTH, then X = 2, if the next outcome is TTTH, the value of X changes to 1.
    • The possible values of X are 0, 1, 2, 3, 4.
    • Tossing a coin four times will give X one of these possible values.
  • The probability that the random variable X will equal x is: P(X = x) or simply p(x).
    • For example, the probability of observing exactly 2 heads in a single toss of 3 coins (with X now counting heads among the 3 coins) is denoted by P(X = 2). (Note: P(X = 2) = 3/8.)

10
Q

Example 1 - Bernoulli rv (a specific kind of discrete rv)

A

•When a student calls a university help desk for technical support, he/she will either immediately be able to speak to someone (S, for success) or will be placed on hold (F, for failure).

With sample space S = {S, F}, define a rv X by

X(S) = 1, X(F) = 0

•The rv X indicates whether (1) or not (0) the student can immediately speak to someone.

11
Q

Example 6

A
  • As another example, suppose we select married couples at random and do a blood test on each person until we find a husband and wife who both have the same Rh factor.
  • With X = the number of blood tests to be performed, possible values of X are D = {2, 4, 6, 8, …}.
  • Since the possible values have been listed in sequence, X is a discrete rv.

12
Q

Probability Distributions for Discrete Random Variables

A
  • Probabilities assigned to various outcomes in S in turn determine probabilities associated with the values of any particular rv X.
  • The probability distribution of X says how the total probability of 1 is distributed among the various possible X values.
  • Suppose, for example, that a business has just purchased four laser printers, and let X be the number among these that require service during the warranty period.​
    • Possible X values are then 0, 1, 2, 3, and 4. The probability distribution will tell us how the probability of 1 is subdivided among these five possible values— how much probability is associated with the X value 0, how much is apportioned to the X value 1, and so on.
  • We will use the following notation for the probabilities in the distribution:

p (0) = the probability of the X value 0 = P(X = 0)

p (1) = the probability of the X value 1 = P(X = 1)
* In general, p (x) will denote the probability assigned to the value x.

13
Q

Example: Four Coin Tosses

Toss a balanced coin four times; the discrete random variable X counts the number of heads. How do we find the probability distribution function of X?​

A
  • The outcome of four tosses is a sequence of heads and tails such as HTTH and there are 16 possible outcomes.

P(X=0) = 1/16 = 0.0625

P(X=1) = 4/16 = 0.25

P(X=2) = 6/16 = 0.375

P(X=3) = 4/16 = 0.25

P(X=4) = 1/16 = 0.0625

  • These probabilities sum to 1, so this is a legitimate probability distribution function.
  • In table form, the probability distribution for the rv X is:

Number of heads x:    0        1      2       3      4
Probability P(X=x):   0.0625   0.25   0.375   0.25   0.0625
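
The counts 1, 4, 6, 4, 1 above can be checked by brute force. The following is a minimal sketch, assuming a balanced coin so that all 16 sequences are equally likely; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 2**4 = 16 equally likely outcomes, e.g. ('H', 'T', 'T', 'H').
outcomes = list(product("HT", repeat=4))

# X(s) = number of heads in outcome s; count how many outcomes give each value.
counts = Counter(outcome.count("H") for outcome in outcomes)

# p(x) = (number of outcomes with x heads) / 16, kept exact with Fraction.
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in pmf.items():
    print(f"P(X={x}) = {p} = {float(p)}")
```

Using exact fractions avoids rounding questions when checking that the probabilities add up to 1.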

14
Q

Probability Mass Function (pmf)

A
  • A discrete distribution is described by giving its probability mass function, or pmf, either as a table or as a function.
  • The pmf of a discrete rv X is defined for every number x by p(x) = P(X = x) = P(all s ∈ S : X(s) = x).
15
Q

Example 7

  • The Cal Poly Department of Statistics has a lab with six computers reserved for statistics majors.
  • Let X denote the number of these computers that are in use at a particular time of day.
  • Suppose that the probability distribution of X is as given in the following table; the first row of the table lists the possible X values and the second row gives the probability of each such value.​
A

•We can now use elementary probability properties to calculate other probabilities of interest. For example, the probability that at most 2 computers are in use is

P(X <= 2) = P(X = 0 or 1 or 2)

= p(0) + p(1) + p(2)

= .05 + .10 + .15

= .30

•Since the event at least 3 computers are in use is complementary to at most 2 computers are in use,

P(X >= 3) = 1 – P(X <= 2)

= 1 – .30

= .70

which can, of course, also be obtained by adding together probabilities for the values, 3, 4, 5, and 6.

•The probability that between 2 and 5 computers inclusive are in use is

P(2 <= X <= 5) = P(X = 2, 3, 4, or 5)

= .15 + .25 + .20 + .15

= .75

whereas the probability that the number of computers in use is strictly between 2 and 5 is

P(2 < X < 5) = P(X = 3 or 4)

= .25 + .20

= .45
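
The four calculations above all follow the same pattern: add p(x) over the x values in the event. Here is a small sketch of that pattern. The values p(0) through p(5) are the ones used in the calculations above; p(6) = .10 is an inference (it is the value that makes the probabilities sum to 1), and the helper name `prob` is hypothetical.

```python
# pmf of X = number of lab computers in use.
# p(0)..p(5) come from the worked calculations; p(6) = 0.10 is inferred
# from the requirement that the probabilities sum to 1.
pmf = {0: 0.05, 1: 0.10, 2: 0.15, 3: 0.25, 4: 0.20, 5: 0.15, 6: 0.10}


def prob(event):
    """P(event): add p(x) over the possible values x for which event(x) is true."""
    return sum(p for x, p in pmf.items() if event(x))


at_most_2 = prob(lambda x: x <= 2)
at_least_3 = prob(lambda x: x >= 3)
between_2_and_5_inclusive = prob(lambda x: 2 <= x <= 5)
strictly_between_2_and_5 = prob(lambda x: 2 < x < 5)

print(round(at_most_2, 2), round(at_least_3, 2))
print(round(between_2_and_5_inclusive, 2), round(strictly_between_2_and_5, 2))
```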

​ ​

16
Q

Special Discrete Random Variables: Bernoulli random variables

A
  • Consider the experiment of recording the school year of the 1st student that walks into this class.
  • Define a rv Y by
  • Y = 1 if the 1st student to walk in is a 1st-year student
  • Y = 0 if the 1st student is not a 1st-year student (i.e., any other type of student)

Such random variables arise frequently enough that they have been given a name: Bernoulli (after the person who first studied this rv).

•Any random variable whose only possible values are 0 and 1 is called a Bernoulli random variable.

17
Q

1st-year Student Example: Bernoulli random variable

A
  • Let’s say we conduct this experiment for a month, where: Y=1 if the 1st student to walk in is a 1st-year student; Y=0 if the 1st student is not a 1st-year student
  • Let’s say we find that 60% of the time, the 1st student walking in was a 1st-year student. Then we have:
    • p(1)=P(Y=1)=0.60
    • p(0)=P(Y=0)=0.40
    • p(y) = 0 for every y other than 0 and 1
  • A typical and equivalent representation for the pmf of a Bernoulli rv is:
    • p(y) = 0.4 if y = 0
    • p(y) = 0.6 if y = 1
    • p(y) = 0 otherwise
18
Q

Example: A Parameter of a Probability Distribution

A
  • The Bernoulli rv X = 1 when a purchaser at a store selected a desktop computer and X= 0 otherwise.
  • At one store, p(0) = .8 and p(1) = .2 because 20% of all purchasers selected a desktop computer. At another store, it may be the case that p(0) = .9 and p(1) = .1.
  • More generally, the pmf of any Bernoulli rv can be expressed in the form p(1) = α and p(0) = 1 – α, where 0 < α < 1.
  • The pmf depends on the particular value of α (the parameter of the Bernoulli distribution), hence we often use the notation p(x; α) rather than just p(x) for the Bernoulli pmf:

p(x; α) = 1 – α   if x = 0
          α       if x = 1
          0       otherwise        (3.1)

19
Q

A Parameter of a Probability Distribution

A
  • Definition of a parameter of a probability distribution:
    • Suppose p(x) depends on a quantity that can be assigned any one of a number of possible values, with each different value determining a different probability distribution. Such a quantity is called a parameter of the distribution.
  • The collection of all probability distributions for different values of the parameter is called a family of probability distributions.
  • The quantity α in Expression (3.1) is a parameter.
  • Each different number α between 0 and 1 determines a different member of the Bernoulli family of distributions.
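
One way to picture a family of distributions is as a single function with the parameter as an extra argument. The sketch below assumes the Bernoulli form p(1) = α, p(0) = 1 – α described in the desktop-computer example; the function name `bernoulli_pmf` is made up for illustration.

```python
def bernoulli_pmf(x, alpha):
    """p(x; alpha) for a Bernoulli rv: alpha at x = 1, 1 - alpha at x = 0, else 0."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if x == 1:
        return alpha
    if x == 0:
        return 1 - alpha
    return 0.0


# Two members of the family: the two stores from the desktop-computer example.
for alpha in (0.2, 0.1):
    print(alpha, bernoulli_pmf(0, alpha), bernoulli_pmf(1, alpha))
```

Each value of `alpha` picks out one member of the family; the function itself represents the whole family.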

20
Q

Example 12 - Geometric Distribution

A
  • Starting at a fixed time, we observe the gender of each newborn child at a certain hospital until a boy (B) is born.
  • Let p = P(B), assume that successive births are independent, and define the rv X by X = number of births observed until a boy is born (counting the birth of the boy).

Then,

p(1) = P(X = 1)

= P(B)

= p

p(2) = P(X = 2)

= P(GB)

= P(G) • P(B)

= (1 – p)p

p(3) = P(X = 3)

= P(GGB)

= P(G) • P(G) • P(B)

= (1 – p)²p

•Continuing in this way, a general formula emerges:

p(x) = (1 – p)^(x – 1) • p   for x = 1, 2, 3, …
p(x) = 0                     otherwise        (3.2)

•The parameter p can assume any value between 0 and 1. Expression (3.2) describes the family of geometric distributions.
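
The pattern p, (1 – p)p, (1 – p)²p, … derived above can be sketched as a short function. The value p = 0.5 is a hypothetical choice made only for illustration, and the function name is not from the source.

```python
def geometric_pmf(x, p):
    """p(x) = (1 - p)**(x - 1) * p for x = 1, 2, 3, ...; 0 for any other x."""
    if isinstance(x, int) and x >= 1:
        return (1 - p) ** (x - 1) * p
    return 0.0


p = 0.5  # hypothetical value of P(B), chosen only for illustration

# p(1), p(2), p(3) correspond to the outcomes B, GB, GGB.
first_three = [geometric_pmf(x, p) for x in (1, 2, 3)]

# The probabilities over x = 1, 2, 3, ... should add up to 1;
# here only the first 50 terms are summed.
partial_sum = sum(geometric_pmf(x, p) for x in range(1, 51))

print(first_three, partial_sum)
```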

21
Q

Parameters of a distribution

A
22
Q

The Cumulative Distribution Function (cdf)

A
  • For some fixed value x, we often wish to compute the probability that the observed value of the rv X will be at most x.
  • For example, if X has p(0) = .500 and p(1) = .167, the probability that X is at most 1 is

P(X <= 1) = p(0) + p(1) = .500 + .167 = .667

•The largest possible X value is 2, so

P(X <= 2) = 1 ; P(X <= 3.7) = 1; P(X <= 20.5) = 1 and so on.

•Notice that P(X < 1) < P(X <= 1) since the latter includes the probability of the X value 1, whereas the former does not. More generally, when X is discrete and x is a possible value of the variable, P(X < x) < P(X <= x).

23
Q

Cumulative Distribution Function

A
24
Q

Cdf (continued)

A
25
Q

Plot of cdf

A
26
Q

Cdf Example

Four Coin Tosses

•Recall previous example of 4 coin tosses:

In the table form, the distribution is

Number of heads x:   0        1      2       3      4
P(X=x):              0.0625   0.25   0.375   0.25   0.0625

•The probability of tossing at most two heads is:

F(2)=P(X≤2)=P(X=0 or 1 or 2)=p(0)+p(1)+p(2)

=0.0625+0.25+0.375=0.6875

•Try: Plot the graph of F(x) for different values of x. How would you best describe this plot?

A
  • So the cdf F(x) can be obtained from the probability distribution p(x)
  • The reverse is also true, i.e., the pmf can be obtained from the cdf
  • In previous example: p(2)=P(X=2)

=[p(0)+p(1)+p(2)]-[p(0)+p(1)]

=P(X≤2)-P(X≤1)

=F(2)-F(1)
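
Both directions (pmf to cdf, and cdf back to pmf) can be sketched in a few lines for the four-coin-toss distribution. This is an illustrative sketch, not part of the original example; exact fractions are used so the round trip can be compared without rounding error.

```python
from fractions import Fraction
from itertools import accumulate

values = [0, 1, 2, 3, 4]
pmf = [Fraction(1, 16), Fraction(4, 16), Fraction(6, 16),
       Fraction(4, 16), Fraction(1, 16)]

# pmf -> cdf: F(x) at each possible value is the running total of the pmf.
cdf = list(accumulate(pmf))

# cdf -> pmf: p(x) = F(x) - F(previous possible value),
# with F equal to 0 before the smallest possible value.
recovered = [cdf[0]] + [cdf[i] - cdf[i - 1] for i in range(1, len(cdf))]

for x, F, p in zip(values, cdf, recovered):
    print(f"F({x}) = {float(F)}, p({x}) = {float(p)}")
```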

27
Q

Cdf Example 13 - Constructing a cdf

  • A store carries flash drives with either 1 GB, 2 GB, 4 GB, 8 GB, or 16 GB of memory.
  • The accompanying table gives the distribution of Y = the amount of memory in a purchased drive:

A

•Let’s first determine F (y) for each of the five possible values of Y:

F (1) = P (Y <= 1)

= P (Y = 1)

= p (1)

= .05

F (2) = P (Y <= 2)

= P (Y = 1 or 2)

= p (1) + p (2)

= .15

F(4) = P(Y <= 4)

= P(Y = 1 or 2 or 4)

= p(1) + p(2) + p(4)

= .50

F(8) = P(Y <= 8)

= p(1) + p(2) + p(4) + p(8)

= .90

F(16) = P(Y <= 16)

= 1

========================================================

•Now for any other number y, F (y) will equal the value of F at the closest possible value of Y to the left of y. For example,

F(2.7) = P(Y <= 2.7)

= P(Y <= 2)

= F(2)

= .15

F(7.999) = P(Y <= 7.999)

= P(Y <= 4)

= F(4)

= .50

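•If y is less than 1, F(y) = 0 [e.g., F(.58) = 0], and if y is at least 16, F(y) = 1 [e.g., F(25) = 1]. The cdf is thus

F(y) = 0     for y < 1
       .05   for 1 <= y < 2
       .15   for 2 <= y < 4
       .50   for 4 <= y < 8
       .90   for 8 <= y < 16
       1     for 16 <= y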
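
The rule "use the value of F at the closest possible value to the left of y" can be sketched with a binary search. The pmf values below are not copied from the table (which is not reproduced here); they are inferred as the successive jumps in the F values computed above, e.g. p(4) = .50 – .15 = .35.

```python
from bisect import bisect_right

# Possible values of Y and their probabilities, inferred from the jumps in F:
# .05, .15 - .05, .50 - .15, .90 - .50, 1 - .90.
values = [1, 2, 4, 8, 16]
probs = [0.05, 0.10, 0.35, 0.40, 0.10]


def cdf(y):
    """F(y) = P(Y <= y): add p over all possible values that are <= y."""
    k = bisect_right(values, y)  # number of possible values <= y
    return sum(probs[:k])


for y in (0.58, 1, 2.7, 7.999, 8, 25):
    print(y, round(cdf(y), 2))
```

`bisect_right` counts the possible values that are less than or equal to y, so the function is flat between possible values and jumps exactly at them.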

28
Q

Cdf Example 13 - Graph

A

•For X a discrete rv, the graph of F(x) will have a jump at every possible value of X and will be flat between possible values. Such a graph is called a step function.

29
Q

The Mean of a Random Variable

A
  • The mean x̄ of a set of observations is their ordinary average.
  • The mean μ of a random variable X is also an average of the possible values of X, but a weighted one: each value is weighted by its probability, since not all outcomes need to be equally likely.
30
Q

Expected Value of a Discrete Random Variable

A

Value of X:            x1   x2   x3   …   xk
Probability P(X=xi):   p1   p2   p3   …   pk

•To find the expected value E(X), or mean value μ, for the rv X, multiply each possible value by its probability, then add all the products:

E(X) = μ = x1•p1 + x2•p2 + … + xk•pk = Σ xi•pi

31
Q

EV Example: Hard-Drive Example

The following table gives the distribution of customer choices of hard-drive size for a laptop computer model. Find the mean of this probability distribution.

Hard drive size x:     10    20    30    40
Probability P(X=x):   .50   .25   .15   .10

A
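
One way to work this out is to multiply each size by its probability and add the products: 10(.50) + 20(.25) + 30(.15) + 40(.10) = 5 + 5 + 4.5 + 4 = 18.5. The sketch below does the same arithmetic; the variable names are illustrative.

```python
# Hard-drive sizes and their probabilities, taken from the table in the question.
sizes = [10, 20, 30, 40]
probs = [0.50, 0.25, 0.15, 0.10]

# E(X) = sum of (value * probability) over all possible values.
mean = sum(x * p for x, p in zip(sizes, probs))

print(round(mean, 2))
```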
32
Q

The Expected Value of a Function of a discrete rv X

A

Sometimes interest will focus on the expected value of some function h (X) rather than on just E (X).

Proposition
If the rv X has a set of possible values D and pmf p(x), then the expected value of any function h(X), denoted by E[h(X)] or μ_h(X), is computed by:

E[h(X)] = Σ h(x) • p(x), summed over all x in D

That is, E [h (X)] is computed in the same way that E (X) itself is, except that h (x) is substituted in place of x.

33
Q

Example 23: EV of function

  • A computer store has purchased three computers of a certain type at $500 apiece. It will sell them for $1000 apiece.
  • The manufacturer has agreed to repurchase any computers still unsold after a specified period at $200 apiece.

Let X denote the number of computers sold, and suppose that

p(0) = .1, p(1) = .2, p(2) = .3, p(3) = .4

A

•With h (X) denoting the profit associated with selling X units, the given information implies that

h (X) = revenue – cost

= 1000X + 200(3 – X) – 1500

= 800X – 900

The expected profit is then

E [h(X)] = h(0) • p(0) + h(1) • p(1) + h(2) • p(2) + h(3) • p(3)

= (–900)(.1) + (– 100)(.2) + (700)(.3) + (1500)(.4)

= $700
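
The expected-profit calculation can be sketched with a small helper that computes E[h(X)] for any function h. The helper name `expected_value` is hypothetical; the pmf and the profit rule are the ones given in the example.

```python
# pmf of X = number of computers sold (from the example).
pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}


def profit(x):
    """h(x): sales revenue plus buy-back revenue minus the $1500 purchase cost."""
    return 1000 * x + 200 * (3 - x) - 1500


def expected_value(pmf, h=lambda x: x):
    """E[h(X)] = sum of h(x) * p(x) over all possible x; h defaults to the identity."""
    return sum(h(x) * p for x, p in pmf.items())


expected_sales = expected_value(pmf)           # E(X)
expected_profit = expected_value(pmf, profit)  # E[h(X)]

print(round(expected_sales, 2), round(expected_profit, 2))
```

Passing h as an argument mirrors the proposition: the only change from computing E(X) is that h(x) replaces x inside the sum.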

34
Q

Rules of Expected Value

A

The h (X) function of interest is quite frequently a linear function aX + b. In this case, E[h(X)] is easily computed from E(X).

Proposition
E(aX + b) = a • E(X) + b
(Or, using alternative notation, μ_aX+b = a • μ_X + b)

To paraphrase, the expected value of a linear function equals the linear function evaluated at the expected value E(X). Since h(X) in Example 23 is linear and E(X) = 2, E[h(X)] = 800(2) – 900 = $700, as before.

35
Q

The Variance of X

A

Definition
Let X have pmf p(x) and expected value μ. Then the variance of X, denoted by V(X) or σ²_X or just σ², is

V(X) = Σ (x – μ)² • p(x) = E[(X – μ)²], where the sum is over all possible values x

36
Q

Standard Deviation (SD) of X

A
37
Q

Variance of X

A
  • The quantity h(X) = (X – μ)² is the squared deviation of X from its mean, and σ² is the expected squared deviation, i.e., the weighted average of squared deviations, where the weights are probabilities from the distribution.
  • If most of the probability distribution is close to μ, then σ² will be relatively small.
  • However, if there are x values far from μ that have large p(x), then σ² will be quite large.
  • Very roughly, σ can be interpreted as the size of a representative deviation from the mean value μ.
  • So if σ = 10, then in a long sequence of observed X values, some will deviate from μ by more than 10 while others will be closer to the mean than that; a typical deviation from the mean will be something on the order of 10.

38
Q

Example 24

A library has an upper limit of 6 on the number of videos that can be checked out to an individual at one time. Consider only those who check out videos, and let X denote the number of videos checked out to a randomly selected individual. The pmf of X is as follows:

The expected value of X is easily seen to be μ = 2.85.

A

The variance of X is then

V(X) = σ² = (1 – 2.85)²(.30) + (2 – 2.85)²(.25) + … + (6 – 2.85)²(.15) = 3.2275

The standard deviation of X is σ = sqrt(3.2275) ≈ 1.797.

39
Q

A Shortcut Formula for σ²

A

The number of arithmetic operations necessary to compute σ² can be reduced by using an alternative formula.

Proposition
V(X) = σ² = [Σ x² • p(x)] – μ² = E(X²) – [E(X)]²

In using this formula, E(X²) is computed first without any subtraction; then E(X) is computed, squared, and subtracted (once) from E(X²).
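
For reference, the shortcut follows from expanding the square in the definition of variance. The derivation below is a standard sketch; D denotes the set of possible values of X, and it uses only the facts that Σ x • p(x) = μ and Σ p(x) = 1.

```latex
\begin{aligned}
V(X) &= \sum_{x \in D} (x - \mu)^2 \, p(x) \\
     &= \sum_{x \in D} x^2 \, p(x) \;-\; 2\mu \sum_{x \in D} x \, p(x) \;+\; \mu^2 \sum_{x \in D} p(x) \\
     &= E(X^2) - 2\mu \cdot \mu + \mu^2 \cdot 1 \\
     &= E(X^2) - \mu^2
\end{aligned}
```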

40
Q

Rules of Variance

A
41
Q

Rules of Variance (cont.)

A
42
Q

Rules of Variance (cont. 2)

A

According to the first relation in (3.14), V(aX) = a² • V(X), so σ_aX = |a| • σ_X: the sd in the new unit is the original sd multiplied by the (absolute value of the) conversion factor.

The second relation, V(X + b) = V(X), says that adding or subtracting a constant does not impact variability; it just rigidly shifts the distribution to the right or left.

43
Q

Example 26 - Rules of Variance

A

In the computer sales scenario of Example 23, E(X) = 2 and

E(X²) = (0)²(.1) + (1)²(.2) + (2)²(.3) + (3)²(.4) = 5

so V(X) = 5 – (2)² = 1.

The profit function h(X) = 800X – 900 then has variance

V(h(X)) = (800)² • V(X) = (640,000)(1) = 640,000.

The standard deviation of h(X) is sqrt[V(h(X))] = 800.
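
These numbers can be cross-checked by computing the variance of the profit directly from its definition, without using the rule V(aX + b) = a² • V(X). The sketch below does this; the helper name `expected_value` is hypothetical, and the pmf is the one from Example 23.

```python
from math import sqrt

# pmf of X = number of computers sold (Example 23).
pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}


def expected_value(pmf, h=lambda x: x):
    """E[h(X)] = sum of h(x) * p(x) over all possible x."""
    return sum(h(x) * p for x, p in pmf.items())


def profit(x):
    """Profit from selling x units: h(x) = 800x - 900."""
    return 800 * x - 900


mean = expected_value(pmf)

# Variance of X two ways: the definition and the shortcut E(X^2) - [E(X)]^2.
var_definition = expected_value(pmf, lambda x: (x - mean) ** 2)
var_shortcut = expected_value(pmf, lambda x: x ** 2) - mean ** 2

# Variance of the profit computed from scratch, to compare with 800^2 * V(X).
mean_profit = expected_value(pmf, profit)
var_profit = expected_value(pmf, lambda x: (profit(x) - mean_profit) ** 2)
sd_profit = sqrt(var_profit)

print(round(var_definition, 6), round(var_shortcut, 6))
print(round(var_profit, 2), round(sd_profit, 2))
```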

44
Q
A