Probability Flashcards by Data Science

Provide a high-level definition of conditional probability.

Suppose event E is in a total sample space S and P(E)>0.

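The probability that event A occurs GIVEN that E has occurred, i.e. the CONDITIONAL PROBABILITY of A GIVEN E, written P(A|E), is:

P(A|E) = P(A^E) / P(E) = n(A^E) / n(E)

(the counting form n(A^E) / n(E) applies when S is a finite space of equally likely outcomes)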

Imagine a Venn diagram with event E, event A, and an INTERSECTION E^A in space S.

P(A|E)
= number elements in A^E / number elements in E

Thus, the sample space is REDUCED to only the region in which E has occurred.

A PAIR of dice is tossed. What is the probability that one of the dice is 2 if the sum is 6?

Define the events:

E = {event that two dice roll sum is 6}
E = {(1,5),(2,4),(3,3),(4,2),(5,1)}
nE = 5

A = {2 appears on at least one die}
A = {(2,1),(2,2),(2,3),(2,4),(2,5),(2,6),(6,2),(5,2),(4,2),(3,2),(1,2)}

P(A|E) = P(A^E) / P(E) = n(A^E) / n(E)

(A^E) = {(2,4),(4,2)}
n(A^E) =2

Thus, P(A|E) = n(A^E) / n(E) = 2/5
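
As a quick check, the counting argument on this card can be reproduced by listing all 36 equally likely outcomes. This is only a sketch; the helper name `cond_prob` is an assumption of the example, not something from the card.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of tossing a pair of dice.
S = list(product(range(1, 7), repeat=2))

def cond_prob(a, e, space):
    """P(A|E) = n(A^E) / n(E) on a finite, equally likely sample space."""
    in_e = [s for s in space if e(s)]        # reduced sample space
    in_both = [s for s in in_e if a(s)]      # A^E
    return Fraction(len(in_both), len(in_e))

sum_is_6 = lambda s: s[0] + s[1] == 6   # event E
has_a_2 = lambda s: 2 in s              # event A

p = cond_prob(has_a_2, sum_is_6, S)
print(p)  # 2/5
```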

A couple has two children. Find the probability p that BOTH children are boys if at least one child is a boy.

S = {BB, BG, GB, GG}

A = {both children are boys}
A = {BB} 
nA = 1

B = {at least one child is a boy}
B = {BB, BG, GB}
nB = 3

P(both children boys | at least one B)
P(A|B) = P(A^B) / P(B)
= n(A^B) / n(B)

(A^B) = {BB}
n(A^B) = 1

P(A|B) = 1/3

The sample space was reduced from 4 in S to 3 in B.

A couple has two children. Find the probability p that BOTH children are boys if the OLDER child is a boy.

S = {BB, BG, GB, GG}

A = {both are boys}
A = {BB}
nA = 1

B = {OLDER child is BOY}
B = {BB, BG}
nB = 2

The sample space was reduced from 4 in S to 2 in B, so

P(A|B) = P(A^B) / P(B)
= n(A^B) / n(B)

(A^B) = {BB}
n(A^B) = 1

n(A^B) / n(B) = 1/2
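
The two "couple has two children" cards differ only in the conditioning event. A small enumeration makes the contrast explicit. It is a sketch that assumes, as the cards do, that the four outcomes are equally likely and that the first letter is the older child.

```python
from fractions import Fraction
from itertools import product

# Sample space as (older, younger); BB, BG, GB, GG are equally likely.
S = list(product("BG", repeat=2))

def cond_prob(a, b, space):
    """P(A|B) by counting inside the reduced sample space B."""
    reduced = [s for s in space if b(s)]
    return Fraction(sum(1 for s in reduced if a(s)), len(reduced))

both_boys = lambda s: s == ("B", "B")
at_least_one_boy = lambda s: "B" in s
older_is_boy = lambda s: s[0] == "B"

p_at_least_one = cond_prob(both_boys, at_least_one_boy, S)
p_older = cond_prob(both_boys, older_is_boy, S)
print(p_at_least_one, p_older)  # 1/3 1/2
```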

What is the Multiplication Theorem for Conditional Probability and what does it tell us?

Suppose events A and B are in a sample space S with P(A)>0 and P(B)>0.

By the defn of conditional probability, and because A^B = B^A:

P(A|B) = P(A^B) / P(B) = P(B^A) / P(B)
P(B|A) = P(B^A) / P(A) = P(A^B) / P(A)

Multiplying both sides by the denominator gives:

P(B^A) = P(B) P(A|B)
P(A^B) = P(A) P(B|A)

The MULTIPLICATION THEOREM gives us a formula for the PROBABILITY that events A and B BOTH OCCUR.

The multiplication theorem can be generalized, e.g.:

P(A^B^C) = P(A)P(B|A)P(C|A^B)

The probability that A, B, and C all occur is equal to the product of:

(i) probability that A occurs
(ii) probability that B occurs, given A occurred
(iii) probability that C occurs, given A^B occurred

e.g.

a lot contains 12 items and 4 are defective.

Three items are drawn at random, one after the other, without replacement. What is the probability that all three are NONdefective?

(i) probability that first is nondefective is 8/12.
(ii) probability that second is nondefective, given the first was, is 7/11.
(iii) probability that third is nondefective, given the first two were, is 6/10.

The above are P(A), P(B|A), and P(C|A^B), so

P(A^B^C) = (8/12)(7/11)(6/10) = 14/55 = .255
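
The chained product for the lot of 12 items can be checked two ways: by multiplying the conditional probabilities, and by counting subsets. A minimal sketch, assuming the draws are made without replacement (variable names are illustrative):

```python
from fractions import Fraction
from math import comb

good, total, draws = 8, 12, 3

# Chain the conditional probabilities: P(A) * P(B|A) * P(C|A^B).
p_chain = Fraction(1)
for k in range(draws):
    p_chain *= Fraction(good - k, total - k)

# Cross-check by counting: 3-item subsets of the 8 good items
# out of all 3-item subsets of the 12 items.
p_count = Fraction(comb(good, draws), comb(total, draws))
print(p_chain, p_count)  # 14/55 14/55
```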

How does the multiplication theorem relate to tree diagrams?

A (finite) stochastic process is a finite sequence of experiments where each experiment has a finite number of outcomes with given probabilities. A convenient way of describing such a process is by means of a LABELED TREE DIAGRAM.

The multiplication theorem can then be used to compute the probability of an event which is represented by a GIVEN PATH of the TREE.

e.g. one of three boxes is selected at random, then a bulb is selected at random from that box:

Box X has 10 light bulbs, 4 are defective
Box Y has 6 light bulbs, 1 is defective
Box Z has 8 light bulbs, 3 are defective

(a) find the proba that the bulb is NONdefective.

                  / 2/5 -- defective
       / 1/3 -- X
      /           \ 3/5 -- nondefective
     /
    /             / 1/6 -- defective
root --- 1/3 -- Y
    \             \ 5/6 -- nondefective
     \
      \           / 3/8 -- defective
       \ 1/3 -- Z
                  \ 5/8 -- nondefective

P(nondefective)
= (1/3)(3/5)+(1/3)(5/6)+(1/3)(5/8)
= 247/360
= .686

(b) If the bulb is nondefective, find the prob that it originated from box Z.

We want P(Z|N):

P(Z|N) = P(Z^N)/P(N)

P(Z^N) = (1/3)(5/8) = 5/24
from part (a), P(N) = 247/360

Then P(Z^N) / P(N)
= (5/24) / (247/360)
= 75/247
= .304

Notice the sample space was reduced from 360 to 247.
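
Both parts of the light bulb card amount to multiplying along paths of the tree and adding. The sketch below uses exact fractions; the dictionary names are assumptions made for the example.

```python
from fractions import Fraction as F

# Each box is picked with probability 1/3; values are P(nondefective | box).
p_box = {"X": F(1, 3), "Y": F(1, 3), "Z": F(1, 3)}
p_good_given_box = {"X": F(6, 10), "Y": F(5, 6), "Z": F(5, 8)}

# (a) Add up the three paths that end in "nondefective".
p_good = sum(p_box[b] * p_good_given_box[b] for b in p_box)

# (b) The path through Z divided by the total from part (a).
p_z_given_good = p_box["Z"] * p_good_given_box["Z"] / p_good
print(p_good, p_z_given_good)  # 247/360 75/247
```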

Given an unfair coin with P(H) = 2/3,

if H appears, then a number is randomly selected in [1,…,9]

if T appears, then a number is randomly selected in [1,…,5].

What is the probability that an even number appears?

we want P(Even), where the coin flip decides which set of numbers is used.

Given H, the number is selected from 1,2,3,4,5,6,7,8,9:
even numbers = {2,4,6,8}
n = 4
P(Even|H) = 4/9

Given T, the number is selected from 1,2,3,4,5:
even numbers = {2,4}
n = 2
P(Even|T) = 2/5

                  / 5/9 -- Odd
       / 2/3 -- H
      /           \ 4/9 -- Even
root
      \           / 3/5 -- Odd
       \ 1/3 -- T
                  \ 2/5 -- Even

P(Even) = P(H)P(Even|H) + P(T)P(Even|T) = (2/3)(4/9) + (1/3)(2/5) = 58/135 = .43
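
One way to sanity-check the tree result is to simulate the two-step experiment and compare with the exact value. This is a rough sketch; the function `trial`, the seed, and the number of trials are arbitrary choices for the illustration.

```python
import random
from fractions import Fraction as F

# Exact value from the two "Even" paths of the tree.
p_even = F(2, 3) * F(4, 9) + F(1, 3) * F(2, 5)

def trial(rng):
    """One run of the two-step experiment; True if the number drawn is even."""
    heads = rng.random() < 2 / 3
    n = rng.randint(1, 9) if heads else rng.randint(1, 5)
    return n % 2 == 0

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 200_000
estimate = sum(trial(rng) for _ in range(trials)) / trials

# The estimate should land close to 58/135 (about .43).
print(p_even, round(float(p_even), 3), estimate)
```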

What is the Law of Total Probability?

Given a sample space S, suppose E is any subset of S.

Let A1,…,An be DISJOINT events that PARTITION S (so A1 U A2 U … U An = S). They cut E into the DISJOINT subsets E^A1,…,E^An.

Since E^Ai are DISJOINT we obtain:
P(E) = P(E^A1)+P(E^A2)+…+P(E^An)

Recall the MULTIPLICATION THEOREM of CONDITIONAL PROBABILITY:

P(E^Ai) = P(Ai^E) = P(Ai) P(E|Ai)

Then the LAW of TOTAL PROBABILITY states:

P(E) = P(A1) P(E|A1) + P(A2) P(E|A2) +…+ P(An) P(E|An)

A factory uses three machines X,Y,Z to produce items.

Suppose:

X produces 50% of all items with 3% defects

Y produces 30% of all items with 4% defects

Z produces 20% of all items with 5% defects

What is the probability p that a randomly selected item is defective?

D = {item is defective}

Using LAW of TOTAL PROBABILITY

P(D) = P(X)P(D|X) + P(Y)P(D|Y) + P(Z)P(D|Z)

P(D) = (.5)(.03) + (.3)(.04) + (.2)(.05) = .037
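
The same sum can be written as a small reusable function. This is a sketch only; `total_probability` and the dictionary names are hypothetical, introduced for the example.

```python
def total_probability(priors, likelihoods):
    """P(E) = sum over k of P(Ak) * P(E|Ak), for a partition A1..An."""
    assert abs(sum(priors.values()) - 1.0) < 1e-9, "priors must sum to 1"
    return sum(priors[k] * likelihoods[k] for k in priors)

share = {"X": 0.50, "Y": 0.30, "Z": 0.20}        # P(machine)
defect_rate = {"X": 0.03, "Y": 0.04, "Z": 0.05}  # P(D | machine)

p_defective = total_probability(share, defect_rate)
print(round(p_defective, 3))  # 0.037
```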

Explain the use of Bayes Formula.

Given sample space S, let events A1,…,An form a PARTITION of S and let E be some event.

Then for k = 1,2,…,n the MULTIPLICATION THEOREM for CONDITIONAL PROBABILITY tells us:

P(Ak^E) = P(Ak)*P(E|Ak)

From conditional probability:

P(Ak|E) = P(Ak^E) /P(E)

Substituting multiplication theorem for P(Ak^E) in numerator:

P(Ak|E) = P(Ak)*P(E|Ak) /P(E)

Using the LAW of TOTAL PROBABILITY for the denominator P(E) we arrive at BAYES’ THEOREM:

P(Ak|E)
= P(Ak)P(E|Ak) / [P(A1)P(E|A1)+…+P(An)P(E|An)]

Intuitively, we can think of DISJOINT events A1,…,An as possible CAUSES of event E. Then Bayes’ formula enables us to determine the probability that a particular one of the A’s occurred, GIVEN that E occurred.

A factory uses three machines X,Y,Z to produce items.

Suppose:

X produces 50% of all items with 3% defects

Y produces 30% of all items with 4% defects

Z produces 20% of all items with 5% defects

Suppose a defective item is found among the output.

Find the probability that it came from each of the machines, i.e. find P(X|D), P(Y|D), and P(Z|D).

Recall the law of total probability:

P(D) = P(X)P(D|X) + P(Y)P(D|Y) + P(Z)*P(D|Z)
= (.5)(.03)+(.3)(.04)+(.2)(.05) = .037

P(X|D)
= P(X^D) /P(D) by conditional probability
= P(X)*P(D|X) /P(D) by multiplication theorem
= (.5)(.03) /.037
= 15/37
= .405

P(Y|D)
= P(Y^D) /P(D)
= P(Y)*P(D|Y) /P(D)
= (.3)(.04) /.037
= 12/37 
= .324

P(Z|D)
= P(Z^D) /P(D)
= P(Z)*P(D|Z) /P(D)
= (.2)(.05) /.037
= 10/37
= .27
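
All three posteriors come from the same two steps: form the joint probabilities P(machine ^ D), then divide by their sum. A minimal sketch with exact fractions (the variable names are illustrative):

```python
from fractions import Fraction as F

share = {"X": F(50, 100), "Y": F(30, 100), "Z": F(20, 100)}    # P(machine)
defect_rate = {"X": F(3, 100), "Y": F(4, 100), "Z": F(5, 100)}  # P(D | machine)

joint = {m: share[m] * defect_rate[m] for m in share}   # P(machine ^ D)
p_d = sum(joint.values())                               # P(D), total probability
posterior = {m: joint[m] / p_d for m in joint}          # P(machine | D)

for m, p in posterior.items():
    print(m, p, round(float(p), 3))
# X 15/37 0.405
# Y 12/37 0.324
# Z 10/37 0.27
```

The three posteriors add up to 1, as they must, since X, Y, Z partition the output.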

How do you express Bayes’ Theorem as a formula?

Bayes’ Theorem is expressed mathematically:

P(A|B) = P(B|A)*P(A) / P(B)

where:
P(A|B) is a cond. probability: the LIKELIHOOD of event A occurring given that B is true.

P(B|A) is a cond. probability: the LIKELIHOOD of event B occurring given that A is true.

P(A) and P(B) are the probabilities of observing A and B on their own, without conditioning on each other (known as the marginal probabilities); P(B) must be nonzero.

Provide an Interpretation of Bayesian vs. Frequentist views of probability.

In the Bayesian interpretation, probability measures a “DEGREE of BELIEF”.

Bayes’ Theorem then links the degree of belief in a proposition before and after accounting for EVIDENCE.

e.g., suppose it is believed with 50% certainty that a coin is twice as likely to land heads as tails. If the coin is flipped a number of times and the outcomes observed, that degree of belief may RISE, FALL, or REMAIN the SAME, depending on the results.

Bayes’ Theorem:

P(A|B)
= P(B|A)*P(A) /P(B)

where:
P(A), the PRIOR, is the INITIAL degree of BELIEF in A.

P(A|B), the POSTERIOR, is the degree of BELIEF after ACCOUNTING for B.

the quotient, P(B|A)/P(B), is the SUPPORT B provides for A.

In the FREQUENTIST interpretation, probability measures a PROPORTION of outcomes.

e.g. when an experiment is performed many times,
P(A), P(B) are simply the prop of outcomes with property A, B.

P(A|B) is the prop of outcomes with A out of outcomes with B and P(B|A) is the prop of outcomes with B out of those with A.

Provide the form of Bayes’ Theorem when looking at two COMPETING statements or hypotheses.

P(A|B)
= P(B|A)P(A) /P(B)
= P(B|A)P(A) / [P(B|A)P(A) + P(B|A’)P(A’)]

where:
P(A) is the PRIOR proba, the INITIAL degree of belief in A.

P(A’) is the corresponding proba of the INITIAL belief AGAINST A, i.e. P(A’) = 1 - P(A)

P(B|A) is the LIKELIHOOD or cond. proba, or degree of belief in B GIVEN (A is TRUE).

P(B|A’) is the LIKELIHOOD or cond. proba, or degree of belief in B GIVEN (A is FALSE).

P(A|B) is the POSTERIOR proba, the proba for A AFTER taking into ACCOUNT B FOR and AGAINST A.
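
The two-hypothesis form lends itself to repeated updating, where each posterior becomes the next prior. The sketch below borrows the coin from the Bayesian interpretation card and assumes A = "the coin lands heads with probability 2/3" against A’ = "the coin is fair"; the flip sequence is made up for the illustration.

```python
from fractions import Fraction as F

def update(prior, like_if_a, like_if_not_a):
    """Posterior P(A|B) from P(A), P(B|A) and P(B|A')."""
    num = like_if_a * prior
    return num / (num + like_if_not_a * (1 - prior))

# Assumed hypotheses: under A heads has probability 2/3, under A' it is 1/2.
P_HEADS = {"A": F(2, 3), "not A": F(1, 2)}

belief = F(1, 2)       # prior degree of belief in A
for flip in "HHTH":    # hypothetical observed flips
    if flip == "H":
        belief = update(belief, P_HEADS["A"], P_HEADS["not A"])
    else:
        belief = update(belief, 1 - P_HEADS["A"], 1 - P_HEADS["not A"])
    print(flip, belief)
```

Each head raises the belief in A and each tail lowers it, which is the RISE or FALL behaviour described for the Bayesian view.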

Show on a tree diagram how Bayes’ formula can be interpreted as a TWO STEP stochastic process.

The first step in the tree involves the events Ak which PARTITION space S (the probabilities P(Ak)).

The second step involves the arbitrary event E (the conditional probabilities P(E|Ak)).

e.g. suppose there are three DISJOINT events partitioning S, A1, A2, A3:

      / --- P(A1) --- A1 --- P(E|A1) --- E
     /
root ------ P(A2) --- A2 --- P(E|A2) --- E
     \
      \ --- P(A3) --- A3 --- P(E|A3) --- E

If we want P(E), which is TOTAL PROBABILITY, using the tree we obtain:

P(E) = P(E|A1)P(A1) + P(E|A2)P(A2) + P(E|A3)*P(A3)

Furthermore, if we want P(Ak|E) for k=1,2,3:

P(Ak|E) = P(Ak^E) /P(E)
= P(E|Ak)P(Ak) /P(E)
= P(E|Ak)P(Ak) / [P(E|A1)P(A1) + P(E|A2)P(A2) + P(E|A3)P(A3)]

Notice that this is simply the MULTIPLICATION THEOREM in the numerator and the LAW of TOTAL PROBABILITY in the denominator, which together give BAYES’ FORMULA.

Suppose a dormitory in a college consists of:

30% freshmen, of whom 10% own a car
40% sophs, of whom 20% own a car
20% juniors, of whom 40% own a car
10% seniors, of whom 60% own a car

Let A,B,C,D denote, resp., frosh, sophs, jrs, srs, and let E denote the set of students owning a car.

(a) What is the probability that a student owns a car?
(b) What is the proba that a student who owns a car is a junior?

       / --- .30 --- A --- .10 --- E
      /
     /   / - .40 --- B --- .20 --- E
root ---
     \   \ - .20 --- C --- .40 --- E
      \
       \ --- .10 --- D --- .60 --- E

(a) By law of total probability:
P(E) = P(E|A)P(A)+P(E|B)P(B)+P(E|C)P(C)+P(E|D)P(D)

= (.1)(.3)+(.2)(.4)+(.4)(.2)+(.6)(.1)
= .03+.08+.08+.06
= .25

(b) P(junior|own car)
= P(C|E)
= P(E|C)*P(C) /P(E)  by Bayes' theorem (flip terms)
= (.4)(.2) / P(E)
= .08 / .25
= 8/25
= .32
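
The dormitory numbers can be checked with a few lines of exact arithmetic. A sketch only: the dictionary layout is an assumption of the example, with the keys A to D following the card's labels.

```python
from fractions import Fraction as F

# class label -> (P(class), P(owns car | class)), taken from the card
dorm = {
    "A": (F(30, 100), F(10, 100)),  # freshmen
    "B": (F(40, 100), F(20, 100)),  # sophomores
    "C": (F(20, 100), F(40, 100)),  # juniors
    "D": (F(10, 100), F(60, 100)),  # seniors
}

# (a) Law of total probability over the four classes.
p_car = sum(p_class * p_own for p_class, p_own in dorm.values())

# (b) Bayes: the junior path divided by the total.
p_junior_given_car = dorm["C"][0] * dorm["C"][1] / p_car
print(p_car, p_junior_given_car)  # 1/4 8/25
```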

Explain independence of events.

Events A and B in a probability space S are said to be INDEPENDENT if the occurrence of one of them does NOT influence the other.

More specifically, B is independent of A if P(B) is the SAME as P(B|A).

Now suppose we substitute P(B) for P(B|A) in the MULTIPLICATION THEOREM:

P(A^B) = P(B|A)*P(A)

Then this yields:

P(A^B) = P(B|A)P(A) = P(B)P(A)

So A and B are INDEPENDENT exactly when P(A^B) = P(A)P(B).
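
The product rule gives a mechanical test for independence. The sketch below applies it to a pair of dice; the events chosen (first die even, sum is 7, sum is 6) are hypothetical examples, not taken from the card.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))  # two dice, 36 equally likely outcomes

def prob(event):
    """P(event) by counting equally likely outcomes."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def independent(a, b):
    """True when P(A^B) == P(A) * P(B)."""
    return prob(lambda s: a(s) and b(s)) == prob(a) * prob(b)

first_even = lambda s: s[0] % 2 == 0
sum_is_7 = lambda s: sum(s) == 7
sum_is_6 = lambda s: sum(s) == 6

print(independent(first_even, sum_is_7))  # True
print(independent(first_even, sum_is_6))  # False
```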

What is a probability density function?

Let X be a CONTINUOUS rv. Then a PROBABILITY DISTRIBUTION or probability DENSITY FUNCTION (pdf) of X is a function f(x) such that for any two numbers a and b with a <= b:

P(a <= X <= b) = integral (a,b) f(x) dx

i.e. the probability that X takes on a value in the interval [a,b] is the AREA above this interval and UNDER the graph of the DENSITY function (Probability DISTRIBUTION)

For f(x) to be a legit pdf, it must satisfy two conditions:

f(x) >= 0 for all x
integral (-inf,inf) f(x) dx = area under entire graph of f(x) = 1
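
To make the "area under the curve" idea concrete, the sketch below integrates a simple density numerically. The density f(x) = 2x on [0,1] and the midpoint-rule helper `integrate` are assumptions chosen for the illustration.

```python
def f(x):
    """Assumed example density: f(x) = 2x on [0, 1], 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# f vanishes outside [0, 1], so integrating over [0, 1] covers the whole line.
total_area = integrate(f, 0, 1)
p_half_to_one = integrate(f, 0.5, 1)  # P(0.5 <= X <= 1)
print(round(total_area, 4), round(p_half_to_one, 4))  # 1.0 0.75
```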

What is a cumulative distribution function (cdf)?

The CUMULATIVE DISTRIBUTION FUNCTION F(x) for a continuous rv X is defined for every number x by:

F(x) = P(X <= x) = integral(-inf,x) f(y) dy

For ea x, F(x) is the AREA UNDER the PROBABILITY density curve (pdf) to the LEFT of x.

How do you use a CDF to compute probabilities?

The importance of a cdf, just as for discrete rvs, is that probabilities of various intervals can be computed from a formula for or table of F(x).

Let X be a continuous rv with pdf f(x) and cdf F(x). Then for any number a,

P(X<=a) = F(a) from cdf defn

P(X>a) = 1 - F(a)

and for any two numbers a and b with a<b,

P(a<=X<=b) = F(b) - F(a)

How do you obtain a pdf f(x) from a cdf F(x)?

For X discrete, the pmf is obtained from the cdf by taking the difference between two F(x) values. The continuous analog of a difference is a derivative. The following result is a consequence of the Fundamental Theorem of Calc:

If X is a continuous rv with pdf f(x) and cdf F(x), then at every x at which the derivative F’(x) exists, F’(x) = f(x)
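
The relation F’(x) = f(x) can be checked numerically with a finite difference. The sketch assumes an exponential distribution with rate 1 as the example; the helper `derivative` is illustrative.

```python
import math

def F(x):
    """Assumed example cdf (exponential, rate 1): F(x) = 1 - e^(-x) for x >= 0."""
    return 1 - math.exp(-x) if x >= 0 else 0.0

def f(x):
    """Matching pdf: f(x) = e^(-x) for x >= 0."""
    return math.exp(-x) if x >= 0 else 0.0

def derivative(g, x, h=1e-5):
    """Central-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

# The slope of the cdf and the value of the pdf should agree at each x.
for x in (0.5, 1.0, 2.0):
    print(x, round(derivative(F, x), 6), round(f(x), 6))
```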

What is the pdf of a normal distribution?

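Even when the underlying dist is discrete, or is not itself normally dist’d, the normal curve often gives an excellent approximation.

A continuous rv X is said to have a NORMAL DISTRIBUTION with parameters mu and sigma, where -inf < mu < inf and sigma > 0, if the pdf of X is:

f(x; mu,sigma) = 1/(sigma*sqrt(2pi)) * exp(-(x-mu)^2 / (2 sigma^2)), for -inf < x < inf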

What is the standard normal distribution and why do we use it?

To compute P(a<=X<=b) when X is a NORMAL rv with parameters mu and sigma, we must determine

integral(a,b) f(x; mu,sigma) dx

However, none of the standard integration techniques can be used to eval the above expression.

Instead, for mu=0, sigma=1, the integral has been evaluated NUMERICALLY and TABLES have been created with these STANDARD NORMAL CDF PROBABILITIES.

The normal dist with parameter values mu=0, sigma=1 is called the STANDARD NORMAL DISTRIBUTION.

A rv having a standard normal dist is called a STANDARD NORMAL RANDOM VARIABLE, denoted as Z.

The pdf of Z is:
f(z; mu=0, sigma=1) = 1/sqrt(2pi) * exp(-z^2 /2)

The graph of f(z;0,1) is called the STANDARD NORMAL (or z) CURVE.

The cdf of Z is:
phi(z) = P(Z<=z) = integral(-inf,z) f(y;0,1) dy

What is phi(z)?

For a STANDARD NORMAL rv Z, phi(z) is the cdf:

F(z) = P(Z<=z)
= integral(-inf,z) f(y;0,1) dy
= phi(z)

Standard normal tables provide phi(z) = P(Z<=z), the AREA UNDER the standard normal density curve to the LEFT of z.

e.g. P(Z<=1.25) = phi(1.25)

looking at the std normal table, the row z=1.2 and the column .05 intersect at the cdf of a std normal, phi(1.25) = .8944
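
Instead of a printed table, phi(z) can be computed from the error function via phi(z) = (1 + erf(z/sqrt(2))) / 2. A minimal sketch using Python's `math.erf`:

```python
import math

def phi(z):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(phi(1.25), 4))  # 0.8944
print(round(phi(0.0), 4))   # 0.5
```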

Explain how percentiles work for the standard normal curve table.

For any p in [0,1], the standard normal curve table can be used to obtain percentiles of the standard normal distribution.

e.g. the 99th percentile of the std normal dist is the VALUE z on the horizontal axis such that the AREA UNDER the z curve to the LEFT of z is .9900.

The std normal table gives, for a fixed value of z, the area under the curve to the left of z, whereas here we have the AREA (.99) and want the value of z. This is the INVERSE problem to P(Z<=z) = ?, so the table is used in inverse fashion: find the entry closest to .99 in the body of the table and read off the row and column that identify z.

Here, .9901 lies at the intersection of the row marked 2.3 and the col marked .03, so the 99th pctile is approx z=2.33, and by symmetry, the first pctile is approx z=-2.33. Thus, 1% of the area under the curve lies to the left of -2.33 and 99% lies to the left of 2.33, measured in standard devs from the mean 0.
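
The inverse lookup can be imitated in code by searching for the z whose left-tail area equals p. The sketch below uses bisection on an erf-based phi; the function name `percentile` and the search bounds are assumptions of the example, and p must lie strictly between 0 and 1.

```python
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def percentile(p, lo=-10.0, hi=10.0, tol=1e-10):
    """Find z with phi(z) = p by bisection (phi is increasing in z)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if phi(mid) < p:
            lo = mid   # area too small, move right
        else:
            hi = mid   # area large enough, move left
    return (lo + hi) / 2

print(round(percentile(0.99), 2))  # 2.33
print(round(percentile(0.01), 2))  # -2.33
```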

Explain Z_alpha (critical values).

In stat inference, we will need the values on the horizontal z-axis that capture certain small tail areas under the standard normal curve.

z_alpha will denote the value on the z axis for which alpha of the AREA UNDER the z curve lies to the right of z_alpha.

e.g.
z_.10 captures UPPER TAIL AREA .10
z_.01 captures UPPER TAIL AREA .01

Since alpha of the AREA UNDER the z curve lies to the RIGHT of z_alpha, then 1-alpha of the AREA lies to its LEFT. Thus, z_alpha is the 100(1-alpha)th percentile of the std normal dist.

By symmetry, the area under the std normal curve to the left of -z_alpha is also alpha.

The z_alpha are usually referred to as z CRITICAL VALUES.

```
the most important z critical values are:

90%ile,   alpha=.1,   z_alpha=1.28
95%ile,   alpha=.05,  z_alpha=1.645
97.5%ile, alpha=.025, z_alpha=1.96
99%ile,   alpha=.01,  z_alpha=2.33
99.5%ile, alpha=.005, z_alpha=2.58
```
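
The critical values can also be reproduced with the standard library, since z_alpha is the point with area 1-alpha to its left. A short sketch using `statistics.NormalDist` (available from Python 3.8):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu=0, sigma=1

critical = {}
for alpha in (0.10, 0.05, 0.025, 0.01, 0.005):
    # z_alpha leaves area alpha to its right, i.e. 1 - alpha to its left.
    critical[alpha] = Z.inv_cdf(1 - alpha)
    print(f"alpha={alpha:<5}  z_alpha={critical[alpha]:.3f}")
```

The printed values agree with the table above once rounded to the table's precision.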

How do you standardize a normal rv?

When X~N(mu,sigma^2), probabilities involving X are computed by "standardizing". The standardized variable is:

Z = (X-mu) /sigma

1. Subtracting mu shifts the mean from mu to zero.
2. Dividing by sigma scales the variable so that the std is 1 rather than sigma.

The key idea is that by standardizing, any probability involving X can be expressed as a probability involving a STANDARD NORMAL rv Z, so that the std norm cdf table can be utilized.
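
As an illustration of standardizing, the sketch below computes P(a<=X<=b) once through the standard normal cdf and once directly, and the two agree. The values mu=100, sigma=15, a=85, b=130 are hypothetical, chosen only for the example.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0   # hypothetical parameters for X
a, b = 85.0, 130.0        # hypothetical interval

Z = NormalDist()          # standard normal
z_a = (a - mu) / sigma    # -1.0
z_b = (b - mu) / sigma    #  2.0

# P(a <= X <= b) = phi(z_b) - phi(z_a)
p_standardized = Z.cdf(z_b) - Z.cdf(z_a)

# Same probability computed from N(mu, sigma) without standardizing.
X = NormalDist(mu, sigma)
p_direct = X.cdf(b) - X.cdf(a)
print(round(p_standardized, 4), round(p_direct, 4))  # 0.8186 0.8186
```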

What is the difference between P(2 Heads in row) and p-value of 2 Heads in a row?

P(2 H in row) =

                 / -- .5 -- HH
      / -- .5 -- H
     /           \ -- .5 -- HT
root
     \           / -- .5 -- TH
      \ -- .5 -- T
                 \ -- .5 -- TT

```
so P(HH) = n({HH}) / n({HH,HT,TH,TT}) = 1/4
```

The p-value of two heads is the probability that RANDOM CHANCE generated the [OUTCOME], or something [ELSE EQUALLY RARE], or something [MORE RARE]:

p-value = P(the outcome) + P(outcomes equally rare) + P(outcomes more rare)
= n({HH}) / n({HH,HT,TH,TT}) + n({TT}) / n({HH,HT,TH,TT}) + 0
= 1/4 + 1/4 + 0
= 1/2

Here TT is the only result as rare as HH, and nothing is rarer, because one head and one tail (HT or TH) occurs with probability 1/2.

thus, P(HH) != p-value of HH
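
The difference between the two numbers can be spelled out by enumeration. The sketch assumes, as the card's arithmetic does, that results are compared by their number of heads, so HT and TH count as the same result.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

flips = list(product("HT", repeat=2))   # HH, HT, TH, TT, equally likely
p_hh = Fraction(flips.count(("H", "H")), len(flips))

# Distribution of the number of heads: {2: 1/4, 1: 1/2, 0: 1/4}.
counts = Counter(seq.count("H") for seq in flips)
dist = {k: Fraction(v, len(flips)) for k, v in counts.items()}

# p-value: add every result whose probability is <= that of the observed one.
observed = 2
p_value = sum(p for p in dist.values() if p <= dist[observed])
print(p_hh, p_value)  # 1/4 1/2
```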

Probability Flashcards

(28 cards)