Probability Flashcards

1
Q

Provide a high-level definition of conditional probability.

A

Suppose event E is in a total sample space S and P(E)>0.

The probability that event A occurs GIVEN that E has occurred, called the CONDITIONAL PROBABILITY of A GIVEN E and written P(A|E), is

P(A|E) = P(A^E) / P(E)

and, when S is a finite space of equally likely outcomes,

P(A|E) = n(A^E) / n(E)

Imagine a Venn diagram with event E, event A, and their INTERSECTION A^E in space S. Then

P(A|E)
= (number of elements in A^E) / (number of elements in E)

Thus, the sample space is REDUCED to E, the region in which E has occurred.

2
Q

A PAIR of fair dice is tossed. What is the probability that at least one of the dice shows a 2, given that the sum is 6?

A

Define the events:

E = {the sum of the two dice is 6}
E = {(1,5),(2,4),(3,3),(4,2),(5,1)}
n(E) = 5
A = {2 appears on at least one die}
A = {(2,1),(2,2),(2,3),(2,4),(2,5),(2,6),(6,2),(5,2),(4,2),(3,2),(1,2)}

P(A|E) = P(A^E) / P(E) = n(A^E) / n(E)

A^E = {(2,4),(4,2)}
n(A^E) = 2

Thus, P(A|E) = n(A^E) / n(E) = 2/5
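
As a quick check, the counting form of P(A|E) can be sketched in Python by enumerating all 36 ordered outcomes, assuming both dice are fair so every outcome is equally likely. The variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes (first die, second die); fair dice assumed.
S = list(product(range(1, 7), repeat=2))

E = [s for s in S if sum(s) == 6]      # the sum is 6
A_and_E = [s for s in E if 2 in s]     # at least one die shows 2, inside E

# P(A|E) = n(A^E) / n(E) for equally likely outcomes
p_A_given_E = Fraction(len(A_and_E), len(E))
print(len(E), A_and_E, p_A_given_E)    # 5 [(2, 4), (4, 2)] 2/5
```

Restricting the list to E before counting is exactly the "reduced sample space" idea from card 1.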

3
Q

A couple has two children. Find the probability p that BOTH children are boys if at least one child is a boy.

A

S = {BB, BG, GB, GG} (four equally likely outcomes)

A = {both children are boys}
A = {BB}
n(A) = 1
B = {at least one child is a boy}
B = {BB, BG, GB}
n(B) = 3

P(both children are boys | at least one is a boy)
P(A|B) = P(A^B) / P(B)
= n(A^B) / n(B)

(A^B) = {BB}
n(A^B) = 1

P(A|B) = 1/3

The sample space was reduced from 4 in S to 3 in B.

4
Q

A couple has two children. Find the probability p that BOTH children are boys if the OLDER child is a boy.

A

S = {BB, BG, GB, GG} (equally likely; the first letter is the older child)

A = {both are boys}
A = {BB}
n(A) = 1
B = {OLDER child is BOY}
B = {BB, BG}
n(B) = 2

The sample space was reduced from 4 in S to 2 in B, so

P(A|B) = P(A^B) / P(B)
= n(A^B) / n(B)

(A^B) = {BB}
n(A^B) = 1

P(A|B) = n(A^B) / n(B) = 1/2
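
To compare this card with the previous one, here is a small Python sketch that conditions by counting over the reduced sample space. It assumes the four outcomes are equally likely and that the first letter is the older child; the helper cond_prob is a hypothetical name introduced for illustration.

```python
from fractions import Fraction

# Equally likely outcomes; first letter = older child (assumed convention).
S = ["BB", "BG", "GB", "GG"]

def cond_prob(event, given):
    """Hypothetical helper: P(event | given) by counting equally likely outcomes."""
    reduced = [s for s in S if given(s)]       # the reduced sample space
    return Fraction(sum(event(s) for s in reduced), len(reduced))

# Card 3: given at least one boy
p_card3 = cond_prob(lambda s: s == "BB", lambda s: "B" in s)
# Card 4: given the OLDER child is a boy
p_card4 = cond_prob(lambda s: s == "BB", lambda s: s[0] == "B")

print(p_card3, p_card4)  # 1/3 1/2
```

The only difference between the two cards is the conditioning event, which shrinks S to 3 outcomes in one case and 2 in the other.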

5
Q

What is the Multiplication Theorem for Conditional Probability and what does it tell us?

A

Suppose events A and B are in a sample space S with P(A) > 0 and P(B) > 0.

By the definition of conditional probability and the fact that A^B = B^A:

P(A|B) = P(A^B) / P(B) = P(B^A) / P(B)
P(B|A) = P(B^A) / P(A) = P(A^B) / P(A)

Multiplying both sides of each equation by its denominator gives:

P(B^A) = P(B) P(A|B)
P(A^B) = P(A) P(B|A)

The MULTIPLICATION THEOREM gives a formula for the PROBABILITY that events A and B BOTH OCCUR.

The multiplication theorem can be generalized, e.g.:

P(A^B^C) = P(A)P(B|A)P(C|A^B)

The probability that A, B, and C all occur is the product of:

(i) probability that A occurs
(ii) probability that B occurs, given A occurred
(iii) probability that C occurs, given A^B occurred

e.g.

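A lot contains 12 items, of which 4 are defective.

Three items are drawn at random, one after another without replacement. What is the probability that all three are NONdefective?

(i) probability that the first is nondefective is 8/12
(ii) probability that the second is nondefective, given the first was, is 7/11
(iii) probability that the third is nondefective, given the first two were, is 6/10

These are P(A), P(B|A), and P(C|A^B), so

P(all three nondefective) = (8/12)(7/11)(6/10) = 14/55 = .255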
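
The chain of conditional probabilities above can be checked with exact fractions in Python. This sketch assumes draws without replacement and cross-checks the multiplication theorem against a direct count of unordered samples, C(8,3)/C(12,3), which is a different technique than the card uses.

```python
from fractions import Fraction
from math import comb

good, total, draws = 8, 12, 3

# Multiplication theorem: P(A) P(B|A) P(C|A^B), drawing without replacement.
p_chain = Fraction(1)
for k in range(draws):
    p_chain *= Fraction(good - k, total - k)   # 8/12, then 7/11, then 6/10

# Cross-check by counting unordered samples: C(8,3) / C(12,3).
p_count = Fraction(comb(good, draws), comb(total, draws))

print(p_chain, p_count)  # 14/55 14/55
```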

6
Q

How does the multiplication theorem relate to tree diagrams?

A

A stochastic process is a finite sequence of experiments where each experiment has a finite number of outcomes with given probabilities. A convenient way of describing such a process is by means of a LABELED TREE DIAGRAM.

The multiplication theorem can then be used to compute the probability of an event which is represented by a GIVEN PATH of the TREE.

e.g.
Box X has 10 light bulbs, 4 are defective
Box Y has 6 light bulbs, 1 is defective
Box Z has 8 light bulbs, 3 are defective

A box is chosen at random, and then a bulb is drawn at random from that box.

(a) Find the probability that the bulb is NONdefective.

                       / defective     2/5
         / 1/3 -- X --
        /              \ nondefective  3/5
       /
      /                / defective     1/6
root ---- 1/3 -- Y --
      \                \ nondefective  5/6
       \
        \              / defective     3/8
         \ 1/3 -- Z --
                       \ nondefective  5/8

P(nondefective)
= (1/3)(3/5)+(1/3)(5/6)+(1/3)(5/8)
= 247/360
= .686

(b) If the bulb is nondefective, find the probability that it originated from box Z.

Let N = {the bulb is nondefective}. We want P(Z|N):

P(Z|N) = P(Z^N)/P(N)

P(Z^N) = (1/3)(5/8) = 5/24
from part (a), P(N) = 247/360

Then P(Z^N) / P(N)
= (5/24) / (247/360)
= 75/247
= .304

Notice that, measured in units of 1/360, the sample space is reduced from 360 to the 247 units of N, of which 75 belong to Z^N.
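
The tree computation can be sketched in Python with exact fractions: each path probability is the product of its branch labels, P(N) is the sum over paths ending in "nondefective", and P(Z|N) divides one path by that sum. The dictionary layout is a hypothetical encoding of the tree, not part of the card.

```python
from fractions import Fraction as F

# Hypothetical encoding of the tree: box -> (P(box), P(nondefective | box)).
boxes = {
    "X": (F(1, 3), F(6, 10)),   # 4 of 10 defective
    "Y": (F(1, 3), F(5, 6)),    # 1 of 6 defective
    "Z": (F(1, 3), F(5, 8)),    # 3 of 8 defective
}

# (a) Sum the probabilities of all paths ending in "nondefective".
p_N = sum(p_box * p_good for p_box, p_good in boxes.values())

# (b) P(Z|N) = P(Z^N) / P(N); P(Z^N) is the single path through Z.
p_Z_and_N = boxes["Z"][0] * boxes["Z"][1]
p_Z_given_N = p_Z_and_N / p_N

print(p_N, p_Z_and_N, p_Z_given_N)  # 247/360 5/24 75/247
```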

7
Q

Given an unfair coin with P(H) = 2/3,

if H appears, then a number is selected at random from 1,…,9;

if T appears, then a number is selected at random from 1,…,5.

What is the probability that an even number appears?

A

We want P(Even), the probability that the selected number is even. Condition on the outcome of the coin:

Even numbers in 1,…,9: {2,4,6,8}, so P(Even|H) = 4/9
Even numbers in 1,…,5: {2,4}, so P(Even|T) = 2/5

                       / 5/9 -- Odd
         / 2/3 -- H --
        /              \ 4/9 -- Even
root --
        \              / 3/5 -- Odd
         \ 1/3 -- T --
                       \ 2/5 -- Even

P(Even) = (2/3)(4/9) + (1/3)(2/5) = 58/135 = .43
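
The same total-probability sum, sketched in Python with exact fractions. The helper p_even is a hypothetical name; it assumes the number is chosen uniformly from 1..n, as in the card.

```python
from fractions import Fraction

def p_even(n):
    """Hypothetical helper: P(Even) when a number is chosen uniformly from 1..n."""
    return Fraction(n // 2, n)

p_heads = Fraction(2, 3)
# Law of total probability over the two branches of the tree.
p_even_total = p_heads * p_even(9) + (1 - p_heads) * p_even(5)
print(p_even_total, round(float(p_even_total), 2))  # 58/135 0.43
```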

8
Q

What is the Law of Total Probability?

A

Given a sample space S, suppose E is any subset of S.

Let A1, …, An be mutually DISJOINT events that PARTITION S (A1 U A2 U … U An = S). Then E^A1, …, E^An are DISJOINT subsets of E whose union is E.

Since the E^Ai are DISJOINT and their union is E, we obtain:
P(E) = P(E^A1)+P(E^A2)+…+P(E^An)

Recall the MULTIPLICATION THEOREM of CONDITIONAL PROBABILITY:

P(E^Ai) = P(Ai^E) = P(Ai) P(E|Ai)

Then the LAW of TOTAL PROBABILITY states:

P(E) = P(A1) P(E|A1) + P(A2) P(E|A2) +…+ P(An) P(E|An)
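
A small numerical check of the law, using an assumed example that is not part of the card: two fair dice, E = {the sum is 6}, and the partition Ai = {the first die shows i}, i = 1,…,6. Both sides are computed exactly, so they should agree.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes (assumed fair dice)

def P(event):
    """Probability of an event (a predicate on outcomes) by counting."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def in_E(s):
    return sum(s) == 6

# Left side: P(E) computed directly.
lhs = P(in_E)

# Right side: sum over the partition A_i = {first die is i} of P(A_i) P(E|A_i).
rhs = Fraction(0)
for i in range(1, 7):
    p_Ai = P(lambda s: s[0] == i)
    p_E_and_Ai = P(lambda s: s[0] == i and in_E(s))
    rhs += p_Ai * (p_E_and_Ai / p_Ai)       # P(A_i) * P(E|A_i)

print(lhs, rhs)  # 5/36 5/36
```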

9
Q

A factory uses three machines X,Y,Z to produce items.

Suppose:

X produces 50% of all items with 3% defects

Y produces 30% of all items with 4% defects

Z produces 20% of all items with 5% defects

What is the probability p that a randomly selected item is defective?

A

D = {item is defective}

Using LAW of TOTAL PROBABILITY

P(D) = P(X)P(D|X) + P(Y)P(D|Y) + P(Z)P(D|Z)

From the given data:
P(X) = .5, P(D|X) = .03
P(Y) = .3, P(D|Y) = .04
P(Z) = .2, P(D|Z) = .05

P(D) = (.5)(.03) + (.3)(.04) + (.2)(.05) = .037
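
The same weighted sum in Python, using exact fractions so the result is exactly 37/1000 rather than a floating-point approximation. The dictionary layout is an illustrative choice.

```python
from fractions import Fraction as F

# machine: (P(machine), P(D | machine)), taken from the card
machines = {"X": (F(50, 100), F(3, 100)),
            "Y": (F(30, 100), F(4, 100)),
            "Z": (F(20, 100), F(5, 100))}

# Law of total probability: P(D) = sum of P(machine) * P(D | machine)
p_D = sum(prior * p_def for prior, p_def in machines.values())
print(p_D, float(p_D))  # 37/1000 0.037
```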

10
Q

Explain the use of Bayes’ Formula.

A

Given sample space S, let events A1,…,An form a PARTITION of S and let E be some event.

Then for k = 1,2,…,n the MULTIPLICATION THEOREM for CONDITIONAL PROBABILITY tells us:

P(Ak^E) = P(Ak)*P(E|Ak)

From conditional probability:

P(Ak|E) = P(Ak^E) /P(E)

Substituting multiplication theorem for P(Ak^E) in numerator:

P(Ak|E) = P(Ak)*P(E|Ak) /P(E)

Using the LAW of TOTAL PROBABILITY for the denominator P(E) we arrive at BAYES’ THEOREM:

P(Ak|E)
= P(Ak)P(E|Ak) / [P(A1)P(E|A1) + … + P(An)P(E|An)]

Intuitively, we can think of DISJOINT events A1,…,An as possible CAUSES of event E. Then Bayes’ formula enables us to determine the probability that a particular one of the A’s occurred, GIVEN that E occurred.

11
Q

A factory uses three machines X,Y,Z to produce items.

Suppose:

X produces 50% of all items with 3% defects

Y produces 30% of all items with 4% defects

Z produces 20% of all items with 5% defects

Suppose a defective item is found among the output.

Find the probability that it came from each of the machines, i.e. find P(X|D), P(Y|D), and P(Z|D).

A

Recall the law of total probability:

P(D) = P(X)P(D|X) + P(Y)P(D|Y) + P(Z)*P(D|Z)
= (.5)(.03)+(.3)(.04)+(.2)(.05) = .037

P(X|D)
= P(X^D) /P(D) by conditional probability
= P(X)*P(D|X) /P(D) by multiplication theorem
= (.5)(.03) /.037
= 15/37
= .405

P(Y|D)
= P(Y^D) /P(D)
= P(Y)*P(D|Y) /P(D)
= (.3)(.04) /.037
= 12/37 
= .324

P(Z|D)
= P(Z^D) /P(D)
= P(Z)*P(D|Z) /P(D)
= (.2)(.05) /.037
= 10/37
= .27
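
Bayes' formula over a partition can be sketched as a small Python function: compute each joint P(Ak^E) = P(Ak)P(E|Ak), add them for P(E), then divide. The function name posteriors is hypothetical; the data are the machine figures from this card.

```python
from fractions import Fraction as F

def posteriors(priors, likelihoods):
    """Hypothetical helper: Bayes' formula over a partition, P(A_k|E) for every k.

    priors[k] = P(A_k), likelihoods[k] = P(E|A_k).
    """
    joint = {k: priors[k] * likelihoods[k] for k in priors}   # P(A_k ^ E)
    p_E = sum(joint.values())                                  # law of total probability
    return {k: j / p_E for k, j in joint.items()}

priors = {"X": F(1, 2), "Y": F(3, 10), "Z": F(1, 5)}
p_defect = {"X": F(3, 100), "Y": F(4, 100), "Z": F(5, 100)}

post = posteriors(priors, p_defect)
for machine, p in post.items():
    print(machine, p, round(float(p), 3))
# X 15/37 0.405
# Y 12/37 0.324
# Z 10/37 0.27
```

Because the A's form a partition, the posteriors always sum to 1.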
12
Q

How do you express Bayes’ Theorem as a formula?

A

Bayes’ Theorem is expressed mathematically:

P(A|B) = P(B|A)*P(A) / P(B), provided P(B) > 0

where:
P(A|B) is a cond. probability: the probability of event A occurring given that B is true (the POSTERIOR probability of A).

P(B|A) is a cond. probability: the probability of event B occurring given that A is true; viewed as a function of A for fixed B, it is the LIKELIHOOD of A given B.

P(A) and P(B) are the probabilities of observing A and B on their own, without conditioning on each other (known as MARGINAL probabilities).

13
Q

Provide an Interpretation of Bayesian vs. Frequentist views of probability.

A

In the Bayesian interpretation, probability measures a “DEGREE of BELIEF”.

Bayes’ Theorem then links the degree of belief in a proposition before and after accounting for EVIDENCE.

e.g., suppose it is believed with 50% certainty that a coin is twice as likely to land heads as tails. If the coin is flipped a number of times and the outcomes observed, that degree of belief may RISE, FALL, or REMAIN the SAME, depending on the results.

Bayes’ Theorem:

P(A|B)
= P(B|A)*P(A) /P(B)

where:
P(A), the PRIOR, is the INITIAL degree of BELIEF in A.

P(A|B), the POSTERIOR, is the degree of BELIEF having ACCOUNTED for B.

The quotient P(B|A)/P(B) is the SUPPORT B provides for A.

In the FREQUENTIST interpretation, probability measures a PROPORTION of outcomes.

e.g., when an experiment is performed many times,
P(A) and P(B) are simply the proportions of outcomes with properties A and B, respectively.

P(A|B) is the proportion of outcomes with A out of those with B, and P(B|A) is the proportion of outcomes with B out of those with A.

14
Q

Provide the form of Bayes’ Theorem when looking at two COMPETING statements or hypotheses.

A

P(A|B)
= P(B|A)P(A) / P(B)
= P(B|A)P(A) / [P(B|A)P(A) + P(B|A’)P(A’)]

(the denominator is the law of total probability applied to the partition A, A’)

where:
P(A) is the PRIOR proba, the INITIAL degree of belief in A.

P(A’) = 1 - P(A) is the corresponding proba of the INITIAL belief AGAINST A.

P(B|A) is the LIKELIHOOD or cond. proba, or degree of belief in B GIVEN that A is TRUE.

P(B|A’) is the LIKELIHOOD or cond. proba, or degree of belief in B GIVEN that A is FALSE.

P(A|B) is the POSTERIOR proba, the proba of A AFTER taking into ACCOUNT the evidence B for and against A.
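
A sketch of the two-hypothesis form applied to the coin example from card 13. The card does not say what the competing hypothesis is, so this assumes A = "P(heads) = 2/3" and A’ = "the coin is fair", with flips independent. The helper names posterior and seq_likelihood are hypothetical.

```python
from fractions import Fraction as F

def posterior(prior_A, like_A, like_not_A):
    """P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|A')P(A')]."""
    num = like_A * prior_A
    return num / (num + like_not_A * (1 - prior_A))

def seq_likelihood(p_heads, flips):
    """P(observed flip sequence | P(heads) = p_heads), assuming independent flips."""
    result = F(1)
    for f in flips:
        result *= p_heads if f == "H" else 1 - p_heads
    return result

prior = F(1, 2)                                    # 50% belief in A
for flips in ["HHT", "TTH", ""]:
    post = posterior(prior,
                     seq_likelihood(F(2, 3), flips),   # A: heads twice as likely as tails
                     seq_likelihood(F(1, 2), flips))   # A' (assumed): fair coin
    print(repr(flips), post)
# 'HHT' 32/59   (belief rises)
# 'TTH' 16/43   (belief falls)
# '' 1/2        (no evidence: belief stays the same)
```

The three runs illustrate the RISE, FALL, and REMAIN the SAME cases described in card 13.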

15
Q

Show on a tree diagram how Bayes’ formula can be interpreted as a TWO STEP stochastic process.

A

The first step of the process selects one of the events Ak that PARTITION the space S, with probability P(Ak).

The second step determines whether the arbitrary event E occurs, with probability P(E|Ak) on the branch leading from Ak.

e.g. suppose three DISJOINT events A1, A2, A3 partition S:

        / --- P(A1) --- A1 --- P(E|A1) --- E
root ------ P(A2) --- A2 --- P(E|A2) --- E
        \ --- P(A3) --- A3 --- P(E|A3) --- E

If we want P(E), the TOTAL PROBABILITY, we add the probabilities of all paths ending in E:

P(E) = P(E|A1)P(A1) + P(E|A2)P(A2) + P(E|A3)*P(A3)

Furthermore, if we want P(Ak|E) for k=1,2,3:

P(Ak|E) = P(Ak^E) /P(E)
= P(E|Ak)P(Ak) /P(E)
= P(E|Ak)P(Ak) / [P(E|A1)P(A1) + P(E|A2)P(A2) + P(E|A3)P(A3)]

Notice that the first formula is the LAW of TOTAL PROBABILITY (the sum over all paths ending in E), and the second is BAYES’ FORMULA: its numerator is the probability of the single path through Ak, and its denominator is that same total probability.

16
Q

Suppose a dormitory in a college consists of:

30% freshmen, of whom 10% own a car
40% sophs, of whom 20% own a car
20% juniors, of whom 40% own a car
10% seniors, of whom 60% own a car

Let A, B, C, D denote, respectively, freshmen, sophomores, juniors, and seniors, and let E denote the set of students owning a car.

(a) What is the probability that a randomly chosen student owns a car?
(b) What is the probability that a student who owns a car is a junior?

A

30% freshmen, of whom 10% own a car
40% sophs, of whom 20% own a car
20% juniors, of whom 40% own a car
10% seniors, of whom 60% own a car

            / --- .30 --- A --- .10 --- E
           / ---- .40 --- B --- .20 --- E
root -----
           \ ---- .20 --- C --- .40 --- E
            \ --- .10 --- D --- .60 --- E

(a) By law of total probability:
P(E) = P(E|A)P(A)+P(E|B)P(B)+P(E|C)P(C)+P(E|D)P(D)

= (.1)(.3)+(.2)(.4)+(.4)(.2)+(.6)(.1)
= .03+.08+.08+.06
= .25

(b) P(junior | owns a car)
= P(C|E)
= P(E|C)*P(C) /P(E)  by Bayes' theorem
= (.4)(.2) / P(E)
= .08 / .25
= 8/25
= .32
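
Both parts of this card can be sketched in Python with exact fractions: part (a) is the law of total probability and part (b) is Bayes' formula for the junior branch. The dictionary layout is illustrative.

```python
from fractions import Fraction as F

# class: (P(class), P(owns car | class)), taken from the card
dorm = {"A": (F(30, 100), F(10, 100)),   # freshmen
        "B": (F(40, 100), F(20, 100)),   # sophomores
        "C": (F(20, 100), F(40, 100)),   # juniors
        "D": (F(10, 100), F(60, 100))}   # seniors

# (a) Law of total probability
p_E = sum(p * q for p, q in dorm.values())
# (b) Bayes: P(C|E) = P(E|C)P(C) / P(E)
p_C, p_E_given_C = dorm["C"]
p_C_given_E = p_E_given_C * p_C / p_E

print(p_E, p_C_given_E)  # 1/4 8/25
```
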
17
Q

Explain independence of events.

A

Events A and B in a probability space S are said to be INDEPENDENT if the occurrence of one of them does NOT influence the other.

More specifically, B is independent of A if P(B|A) is the SAME as P(B) (assuming P(A) > 0).

Now suppose we substitute P(B) for P(B|A) in the MULTIPLICATION THEOREM that

P(A^B) = P(B|A)*P(A)

Then this yields:

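P(A^B) = P(B|A)P(A) = P(B)P(A)

Hence A and B are independent exactly when P(A^B) = P(A)P(B). This product form is usually taken as the formal definition, since it is symmetric in A and B and does not require P(A) > 0.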
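
The product rule gives a mechanical test for independence. This Python sketch uses assumed events on two fair dice (not from the card): "first die is even" is independent of "sum is 7" but not of "sum is 6".

```python
from fractions import Fraction
from itertools import product

# Two fair dice; these events are an assumed illustration, not from the card.
S = list(product(range(1, 7), repeat=2))

def P(event):
    """Probability of an event (a predicate on outcomes) by counting."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def independent(a, b):
    """Check the product rule P(A^B) == P(A)P(B)."""
    return P(lambda s: a(s) and b(s)) == P(a) * P(b)

def first_even(s):
    return s[0] % 2 == 0

def sum_is_7(s):
    return sum(s) == 7

def sum_is_6(s):
    return sum(s) == 6

print(independent(first_even, sum_is_7))  # True:  P(A^B) = 1/12 = (1/2)(1/6)
print(independent(first_even, sum_is_6))  # False: P(A^C) = 1/18, but (1/2)(5/36) = 5/72
```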

18
Q

What is a probability density function?

A

Let X be a CONTINUOUS rv. Then a PROBABILITY DISTRIBUTION or probability DENSITY FUNCTION (pdf) of X is a function f(x) such that for any two numbers a and b with a <= b:

P(a <= X <= b) = integral (a,b) f(x) dx

i.e. the probability that X takes on a value in the interval [a,b] is the AREA above this interval and UNDER the graph of the DENSITY function (Probability DISTRIBUTION)

For f(x) to be a legitimate pdf, it must satisfy two conditions:

  1. f(x) >= 0 for all x
  2. integral (-inf,inf) f(x) dx = area under entire graph of f(x) = 1
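
The two conditions and the area interpretation can be checked numerically. This sketch assumes a hypothetical pdf f(x) = x/2 on [0, 2] (0 elsewhere) and uses a simple midpoint-rule integrator in place of exact integration.

```python
def f(x):
    """Hypothetical pdf: f(x) = x/2 on [0, 2], 0 elsewhere."""
    return x / 2 if 0 <= x <= 2 else 0.0

def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# Condition 1: f(x) >= 0 (spot-checked on a grid from -1 to 3)
assert all(f(x / 100) >= 0 for x in range(-100, 301))
# Condition 2: total area is 1 (f is 0 outside [0, 2], so integrate there)
total = integrate(f, 0, 2)
# P(0.5 <= X <= 1.5) = area under f between 0.5 and 1.5
p = integrate(f, 0.5, 1.5)
print(round(total, 6), round(p, 6))  # 1.0 0.5
```
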
19
Q

What is a cumulative distribution function (cdf)?

A

The CUMULATIVE DISTRIBUTION FUNCTION F(x) for a continuous rv X is defined for every number x by:

F(x) = P(X <= x) = integral(-inf,x) f(y) dy

For each x, F(x) is the AREA UNDER the probability density curve (pdf) to the LEFT of x.

20
Q

How do you use a CDF to compute probabilities?

A

The importance of a cdf, just as for discrete rvs, is that probabilities of various intervals can be computed from a formula for or table of F(x).

Let X be a continuous rv with pdf f(x) and cdf F(x). Then for any number a,

P(X<=a) = F(a) by definition of the cdf

P(X>a) = 1 - F(a)

and for any two numbers a and b with a<b,

P(a<=X<=b) = F(b) - F(a)

(For a continuous rv, P(X=a) = 0, so these hold whether the inequalities are strict or not.)
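
The three cdf rules can be sketched in Python with an assumed example: F(x) = x^2/4 on [0, 2], which is the cdf of the hypothetical pdf f(x) = x/2 on that interval. Exact fractions keep the results clean.

```python
from fractions import Fraction

def F(x):
    """Hypothetical cdf of f(x) = x/2 on [0, 2]: F(x) = x^2 / 4 there."""
    x = Fraction(x)
    if x < 0:
        return Fraction(0)
    if x > 2:
        return Fraction(1)
    return x * x / 4

a, b = Fraction(1, 2), Fraction(3, 2)
print(F(1))          # P(X <= 1)          = 1/4
print(1 - F(1))      # P(X > 1)           = 3/4
print(F(b) - F(a))   # P(1/2 <= X <= 3/2) = 1/2
```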

21
Q

How do you obtain a pdf f(x) from a cdf F(x)?

A

For X discrete, the pmf is obtained from the cdf by taking the difference between two F(x) values. The continuous analog of a difference is a derivative. The following result is a consequence of the Fundamental Theorem of Calculus:

If X is a continuous rv with pdf f(x) and cdf F(x), then at every x at which the derivative F’(x) exists, F’(x) = f(x)
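
A numerical illustration of F’(x) = f(x), assuming an exponential distribution with rate 1 (F(x) = 1 - e^(-x), f(x) = e^(-x) for x >= 0) and approximating the derivative with a central difference.

```python
import math

def F(x):
    """Assumed cdf: exponential with rate 1, F(x) = 1 - e^(-x) for x >= 0."""
    return 1 - math.exp(-x) if x >= 0 else 0.0

def f(x):
    """Its pdf: f(x) = e^(-x) for x >= 0."""
    return math.exp(-x) if x >= 0 else 0.0

def derivative(g, x, h=1e-6):
    """Central-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

for x in [0.5, 1.0, 2.0]:
    print(x, derivative(F, x), f(x))   # the last two columns agree closely
```

At x = 0 this F has no derivative (the slope is 0 from the left and 1 from the right), which is why the result is stated only where F’(x) exists.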

22
Q

What is the pdf of a normal distribution?

A

Even when the underlying distribution is discrete or not normally distributed, the normal curve often gives an excellent approximation.

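A continuous rv X is said to have a NORMAL DISTRIBUTION with parameters mu and sigma, where -inf < mu < inf and sigma > 0, if the pdf of X is

f(x; mu, sigma) = 1/(sqrt(2pi) * sigma) * exp(-(x - mu)^2 / (2 sigma^2)),  -inf < x < inf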
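
The normal pdf formula can be coded directly and compared against Python's statistics.NormalDist. The parameter values mu = 10, sigma = 2 are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x; mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

mu, sigma = 10.0, 2.0   # hypothetical parameters
for x in [6.0, 10.0, 13.5]:
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))  # columns match
```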

23
Q

What is the standard normal distribution and why do we use it?

A

To compute P(a<=X<=b) when X is a NORMAL rv with parameters mu and sigma, we must evaluate

integral(a,b) f(x; mu,sigma) dx

However, this integrand has no closed-form antiderivative, so none of the standard integration techniques can be used to evaluate the above expression.

Instead, for mu=0, sigma=1, the integral for the cdf has been computed numerically and TABULATED; these tables give STANDARD NORMAL CDF PROBABILITIES.

The normal dist with parameter values mu=0, sigma=1 is called the STANDARD NORMAL DISTRIBUTION.

A rv having a standard normal dist is called a STANDARD NORMAL RANDOM VARIABLE, denoted as Z.

The pdf of Z is:
f(z; mu=0, sigma=1) = 1/sqrt(2pi) * exp(-z^2 /2)

The graph of f(z;0,1) is called the STANDARD NORMAL (or z) CURVE.

The cdf of Z is:
phi(z) = P(Z<=z) = integral(-inf,z) f(y;0,1) dy

24
Q

What is phi(z)?

A

For a STANDARD NORMAL rv Z, phi(z) is the cdf:

F(z) = P(Z<=z)
= integral(-inf,z) f(y;0,1) dy
= phi(z)

Standard normal tables provide phi(z) = P(Z<=z), the AREA UNDER the standard normal density curve to the LEFT of z.

e.g. P(Z<=1.25) = phi(1.25)

In the std normal table, the row labeled 1.2 and the column labeled .05 intersect at the cdf value of a std normal, phi(1.25) = .8944
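
Instead of a table lookup, phi(z) can be computed from the error function, since phi(z) = (1 + erf(z / sqrt 2)) / 2. This sketch checks that identity against the table value and against statistics.NormalDist.

```python
import math
from statistics import NormalDist

def phi(z):
    """Standard normal cdf via the error function: phi(z) = (1 + erf(z / sqrt 2)) / 2."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(phi(1.25), 4))               # 0.8944, matching the table entry
print(round(NormalDist().cdf(1.25), 4))  # 0.8944, same value from the standard library
```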

25
Q

Explain how percentiles work for the standard normal curve table.

A

For any p between 0 and 1, the standard normal curve table can be used to obtain percentiles of the standard normal distribution.

e.g. the 99th percentile of the std normal dist is that VALUE z on the horizontal axis such that the AREA UNDER the z curve to the LEFT of that value is .9900.

The std norm table gives, for a fixed value of z, the area under the curve to the left of z, whereas here we have the AREA (.99) and want the value of z.

This is the INVERSE problem to P(Z<=z) = ?, so the table is used in inverse fashion: find .9900 (or the closest entry) inside the table, then read the z value from the labels of its row and column.

Here, .9901 lies at the intersection of the row marked 2.3 and the column marked .03, so the 99th percentile is approx z=2.33, and by symmetry, the 1st percentile is approx z=-2.33.

Thus, about 99% of the area under the z curve lies to the LEFT of 2.33 and about 1% lies to the LEFT of -2.33, i.e., 2.33 standard deviations above and below the mean 0.
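
The inverse table lookup corresponds to the inverse cdf. This sketch uses statistics.NormalDist.inv_cdf to recover the 99th and 1st percentiles and checks the .9901 table entry quoted in the card.

```python
from statistics import NormalDist

Z = NormalDist()               # standard normal: mu = 0, sigma = 1
z99 = Z.inv_cdf(0.99)          # 99th percentile: area .99 to its left
z01 = Z.inv_cdf(0.01)          # 1st percentile
print(round(z99, 2), round(z01, 2))   # 2.33 -2.33
print(round(Z.cdf(2.33), 4))          # 0.9901
```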

26
Q

Explain Z_alpha (critical values).

A

In stat inference, we will need the values on the horizontal z-axis that capture certain small tail areas under the standard normal curve.

z_alpha will denote the value on the z axis for which alpha of the AREA UNDER the z curve lies to the right of z_alpha.

e.g.
z_.10 captures UPPER TAIL AREA .10
z_.01 captures UPPER TAIL AREA .01

Since alpha of the AREA UNDER the z curve lies to the RIGHT of z_alpha, then 1-alpha of the AREA lies to its LEFT.

Thus, z_alpha is the 100(1-alpha)th percentile of the std normal dist.

By symmetry, the area under the std normal curve to the left of -z_alpha is also alpha.

z_alpha are usually referred to as z CRITICAL VALUES.

the most important z critical values are:
90%ile, alpha=.1, z_alpha=1.28
95%ile, alpha=.05, z_alpha=1.645
97.5%ile, alpha=.025, z_alpha=1.96
99%ile, alpha=.01, z_alpha=2.33
99.5%ile, alpha=.005, z_alpha=2.58
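
Since z_alpha is the 100(1-alpha)th percentile, each critical value is the inverse cdf at 1 - alpha. This Python sketch reproduces the list above to three decimals (the table rounds 2.326 to 2.33 and 2.576 to 2.58). The helper name z_crit is hypothetical.

```python
from statistics import NormalDist

def z_crit(alpha):
    """Hypothetical helper: z_alpha, the value with upper-tail area alpha."""
    return NormalDist().inv_cdf(1 - alpha)

for alpha in [0.10, 0.05, 0.025, 0.01, 0.005]:
    print(alpha, round(z_crit(alpha), 3))
# 0.1 1.282 / 0.05 1.645 / 0.025 1.96 / 0.01 2.326 / 0.005 2.576
```
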
27
Q

How do you standardize a normal rv?

A

When X~N(mu,sigma^2), probabilities involving X are computed by “standardizing”.

The standardized variable is:

Z = (X-mu) /sigma

  1. Subtracting mu shifts the mean from mu to zero.
  2. Dividing by sigma scales the variable so that the standard deviation is 1 rather than sigma.

Z has a STANDARD NORMAL distribution, so for any a <= b:

P(a <= X <= b) = phi((b-mu)/sigma) - phi((a-mu)/sigma)

The key idea is that by standardizing, any probability involving X can be expressed as a probability involving a STANDARD NORMAL rv Z, so that the std norm cdf table can be utilized.
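
A sketch of standardizing in Python, with hypothetical parameters mu = 100, sigma = 15 and interval [85, 130]. It computes P(a <= X <= b) through the standard normal cdf and compares it with the N(mu, sigma) cdf directly.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0        # hypothetical parameters of X ~ N(mu, sigma^2)
a, b = 85.0, 130.0

def standardize(x):
    """z = (x - mu) / sigma."""
    return (x - mu) / sigma

phi = NormalDist().cdf         # standard normal cdf

# P(a <= X <= b) via the standard normal cdf ...
p_std = phi(standardize(b)) - phi(standardize(a))
# ... and directly from the N(mu, sigma) distribution, for comparison.
p_direct = NormalDist(mu, sigma).cdf(b) - NormalDist(mu, sigma).cdf(a)
print(round(p_std, 4), round(p_direct, 4))   # 0.8186 0.8186
```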

28
Q

What is the difference between P(2 Heads in a row) and the p-value of 2 Heads in a row?

A

P(2 H in a row):

                 / .5 -- HH
         / .5 -- H
        /        \ .5 -- HT
root --
        \        / .5 -- TH
         \ .5 -- T
                 \ .5 -- TT

so P(HH)
= n({HH}) / n({HH,HT,TH,TT}) = (.5)(.5) = 1/4

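The p-value of two heads is the probability that RANDOM CHANCE alone (a fair coin) generates the observed OUTCOME, or something else EQUALLY RARE, or something RARER. Here HT and TH are lumped together as “one head and one tail”, which has probability 1/2, so TT is the only equally rare outcome and nothing is rarer:

p-value
= P(the outcome) + P(equally rare outcomes) + P(rarer outcomes)
= P(HH) + P(TT) + 0
= 1/4 + 1/4 + 0
= 1/2

Thus,
P(HH) = 1/4 != 1/2 = p-value of HH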
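
The same reasoning can be sketched in Python for n fair flips, assuming (as in this card) that outcomes are grouped by their number of heads. The p-value adds the probability of the observed head count and of every count that is equally rare or rarer. The helper p_value_heads is a hypothetical name.

```python
from fractions import Fraction
from math import comb

def p_value_heads(n, k):
    """Hypothetical helper: p-value for k heads in n fair flips, grouping by head count.

    Adds the probability of the observed count, of equally rare counts,
    and of rarer counts.
    """
    probs = [Fraction(comb(n, j), 2 ** n) for j in range(n + 1)]  # P(j heads)
    observed = probs[k]
    return sum(p for p in probs if p <= observed)

p_HH = Fraction(1, 4)                 # one specific path in the tree: (.5)(.5)
print(p_HH, p_value_heads(2, 2))      # 1/4 1/2
print(p_value_heads(5, 5))            # 1/16: all heads or all tails in 5 flips
```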