Discrete Probability Theory Flashcards

2
Q

Probability vocab:

A
  1. Experiment
  2. Sample space
  3. Event
3
Q

Experiment:

A

A procedure that yields one of a given set of possible outcomes.

4
Q

Sample space and event:

A

Sample space: the set of possible outcomes.

Event: a subset of the sample space.

5
Q

Laplace’s definition of the probability of an event with finitely many possible outcomes:

A

If S is a finite nonempty sample space of equally likely outcomes, and E is an event (that is, a subset of S), then the probability of E is

p(E) = |E| / |S|.
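
A minimal sketch of Laplace's definition in Python, using a hypothetical fair-die experiment (the sample space and event here are illustrative, not from the text):

```python
from fractions import Fraction

# Sample space: the six equally likely outcomes of rolling a fair die.
S = {1, 2, 3, 4, 5, 6}

# Event E: the roll is even.
E = {s for s in S if s % 2 == 0}

# Laplace's definition: p(E) = |E| / |S|.
p_E = Fraction(len(E), len(S))
print(p_E)  # 1/2
```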

6
Q

THM:

probability of complement

proof too

A

Let E be an event in a sample space S. The probability of the complementary event ~E = S − E is given by

p(~E) = 1 − p(E).

Proof sketch: |~E| = |S| − |E|, so p(~E) = (|S| − |E|)/|S| = 1 − |E|/|S| = 1 − p(E).

7
Q

Probability of Union:

proof too

A

Let E1 and E2 be events in the sample space S. Then

p(E1 ∪ E2) = p(E1) + p(E2) − p(E1 ∩ E2).

Proof sketch: |E1 ∪ E2| = |E1| + |E2| − |E1 ∩ E2| by inclusion-exclusion; divide through by |S|.
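
A brute-force check of the union formula in Python, on an assumed fair-die sample space (the events E1 and E2 are illustrative):

```python
from fractions import Fraction

# Illustrative events on a fair-die sample space (assumed example).
S = {1, 2, 3, 4, 5, 6}
E1 = {2, 4, 6}   # the roll is even
E2 = {4, 5, 6}   # the roll is greater than 3

def p(E):
    # Laplace probability under the uniform distribution on S.
    return Fraction(len(E), len(S))

# Both sides of the union formula agree.
assert p(E1 | E2) == p(E1) + p(E2) - p(E1 & E2)
print(p(E1 | E2))  # 2/3
```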

8
Q

Probabilistic reasoning

A

A common problem is determining which of two events is more likely.

Think of the Monty Hall three-door puzzle.
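
A simulation sketch of the Monty Hall puzzle in Python; the strategy names and trial count are choices of this example, not from the text. The switch strategy's win rate approaches 2/3:

```python
import random

def monty_hall(switch: bool, trials: int = 100_000) -> float:
    """Estimate the win probability for the stay/switch strategies."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)    # door hiding the car
        pick = random.randrange(3)   # contestant's first pick
        # Host opens a door that is neither the pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(monty_hall(switch=False))  # ~1/3
print(monty_hall(switch=True))   # ~2/3
```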

9
Q

Let S be the sample space of an experiment with a finite or countable number of outcomes. We
assign a probability p(s) to each outcome s. We require that two conditions be met:

A

(i) 0 ≤ p(s) ≤ 1 for each s ∈ S

(ii) ∑_{s∈S} p(s) = 1.

10
Q

What if the sample space is infinite?

A

∑_{s∈S} p(s) must be a convergent infinite series.

Integral calculus is used to study continuous probability, where the sample space is uncountable.

11
Q

Probability Distribution?

A

The function p from the set of all outcomes of the sample space S (to the interval [0, 1]) is called a probability distribution.

In other words, a probability distribution is the mathematical function that gives the probabilities of occurrence of the different possible outcomes of an experiment.

12
Q

Uniform distribution

probability assignment

A

Suppose that S is a set with n elements. The uniform distribution assigns the probability 1∕n
to each element of S.

13
Q

The probability of the event E :

Definition 2, using Definition 1:

A

The probability of the event E is the sum of the probabilities of the outcomes in E. That is,

p(E) = ∑_{s∈E} p(s).

(Note that when E is an infinite set, ∑_{s∈E} p(s) is a convergent infinite series.)

14
Q

Selecting an element of S at random:

A

The experiment of selecting an element from a sample space with a uniform distribution is called selecting an element of S at random.

15
Q

Probabilities of Complement and Union :

A

Both formulas remain valid under the general definition; think in terms of Definition 2 of Section 7.2.

16
Q

If E1, E2, … is a sequence of pairwise disjoint events in a sample space S, then

p(⋃_i Ei) is?

A

If E1, E2, … is a sequence of pairwise disjoint events in a sample space S, then

p(⋃_i Ei) = ∑_i p(Ei).

(Note that this theorem applies when the sequence E1, E2, … consists of a finite number or a countably infinite number of pairwise disjoint events.)

17
Q

Conditional Probability:

A

Let E and F be events with p(F) > 0. The conditional probability of E given F, denoted by p(E ∣ F), is defined as

p(E ∣ F) = p(E ∩ F) / p(F).
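
A small worked example in Python, assuming the classic two-children sample space (an illustration, not from the text): E is "both children are boys" and F is "at least one child is a boy":

```python
from fractions import Fraction

# Illustrative example (assumed): a family with two children, each
# independently a boy (B) or girl (G), all four outcomes equally likely.
S = {"BB", "BG", "GB", "GG"}
E = {s for s in S if s.count("B") == 2}   # both children are boys
F = {s for s in S if "B" in s}            # at least one child is a boy

def p(event):
    return Fraction(len(event), len(S))

# p(E | F) = p(E ∩ F) / p(F)
print(p(E & F) / p(F))  # 1/3
```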

18
Q

Independent Events:

A

The events E and F are independent if and only if
p(E ∩ F) = p(E)p(F).

When
two events are independent, the occurrence of one of the events gives no information about the
probability that the other event occurs.

19
Q

Types of Independences:

A

The events E1, E2, … , En are pairwise independent if and only if p(Ei ∩ Ej) = p(Ei)p(Ej) for all pairs of integers i and j with 1 ≤ i < j ≤ n.

These events are mutually independent if p(Ei1 ∩ Ei2 ∩ ⋯ ∩ Eim) = p(Ei1)p(Ei2) ⋯ p(Eim) whenever ij, j = 1, 2, … , m, are integers with 1 ≤ i1 < i2 < ⋯ < im ≤ n and m ≥ 2.

20
Q

Bernoulli trial :

A

Each performance of an experiment with two possible outcomes is
called a Bernoulli trial, after James Bernoulli, who made important contributions to probability
theory. In general, a possible outcome of a Bernoulli trial is called a success or a failure.

If p
is the probability of a success and q is the probability of a failure, it follows that p + q = 1.

21
Q

The probability of exactly k successes in n independent Bernoulli trials

THEOREM:

prove it too.

A

The probability of exactly k successes in n independent Bernoulli trials, with probability of success p and probability of failure q = 1 − p, is

C(n, k) p^k q^(n−k).

Considered as a function of k, we call this the binomial distribution:

b(k; n, p) = C(n, k) p^k q^(n−k).
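
A minimal Python sketch of the binomial distribution; the function name binomial_pmf and the example parameters are assumptions of this illustration:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """b(k; n, p) = C(n, k) * p**k * q**(n-k), with q = 1 - p."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

# Probability of exactly 4 heads in 10 fair-coin tosses.
print(binomial_pmf(4, 10, 0.5))  # ≈ 0.205

# The pmf sums to (p + q)^n = 1 over k = 0..n.
print(sum(binomial_pmf(k, 10, 0.5) for k in range(11)))  # 1.0
```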

22
Q

The sum of the probabilities that there are k successes when n independent Bernoulli trials are carried out, for k = 0, 1, 2, … , n, equals?

A

∑_{k=0}^{n} b(k; n, p) = (p + q)^n = 1, by the binomial theorem.

23
Q

Random Variable :

A

A random variable is a function from the sample space of an experiment to the set of real
numbers. That is, a random variable assigns a real number to each possible outcome.

Remark: Note that a random variable is a function. It is not a variable, and it is not random!

24
Q

The distribution of a random variable X on a sample space S is?

Definition 7:

A

The distribution of a random variable X on a sample space S is the set of pairs (r, p(X = r))
for all r ∈ X(S), where p(X = r) is the probability that X takes the value r. (The set of pairs
in this distribution is determined by the probabilities p(X = r) for r ∈ X(S).)

25
Q

The Birthday problem?

solve it too

A

What is the minimum number of people who need to be in a room so that the probability that at least two of them have the same birthday is greater than 1∕2?
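
A sketch of the computation in Python, assuming a 366-day year with equally likely birthdays as in the textbook's setup:

```python
def birthday_no_match(n: int) -> float:
    """Probability that n people all have distinct birthdays
    (366-day year, birthdays equally likely)."""
    p = 1.0
    for i in range(n):
        p *= (366 - i) / 366
    return p

# Smallest n with p(at least one shared birthday) > 1/2.
n = 1
while 1 - birthday_no_match(n) <= 0.5:
    n += 1
print(n)  # 23
```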

26
Q

Probability of a collision in a hashing function

A

solve it. page 487

27
Q

Deterministic algorithms:

A

These proceed in the same way whenever they are given the same input.

28
Q

Probabilistic Algorithms:

why use them?

A

These make random choices at one or more steps.

Need: useful when a deterministic algorithm would have to handle a huge (or unknown) number of possible cases.

29
Q

Monte Carlo Algorithms :

A

A class of probabilistic algorithms for decision problems.

They always give an answer, but a small probability remains that the answer is incorrect; this probability decreases as more tests are run.

At each step the possible responses are "true" or "unknown."
True: no more iterations are required; the answer is "true."
Unknown: the answer could be "true" or "false."
The answer is "false" if every iteration yields "unknown."
If the correct answer is "false," the algorithm produces "unknown" on every iteration.
If the correct answer is "true," each iteration yields "true" or "unknown."

30
Q

Monte carlo

probability of incorrect answer?

A

If the correct answer is "true," then the algorithm could answer either "true" or "false," because it may be possible that each iteration produced the response "unknown."

We will show that this possibility becomes extremely unlikely as the number of tests increases.

Suppose that p is the probability that the response of a test is “true,” given that the answer
is “true.” It follows that 1−p is the probability that the response is “unknown,” given that the
answer is “true.” Because the algorithm answers “false” when all n iterations yield the answer
“unknown” and the iterations perform independent tests, the probability of error is (1−p)^n.
When p ≠ 0, this probability approaches 0 as the number of tests increases.

Consequently, the probability that the algorithm answers “true” when the answer is “true” approaches 1.
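
A quick numeric illustration, assuming (hypothetically) p = 1/2 per test; the error bound (1 − p)^n shrinks geometrically with the number of iterations n:

```python
# Error probability of a Monte Carlo algorithm after n iterations,
# assuming each independent test responds "true" with probability p
# when the correct answer is "true". p and the n values are illustrative.
p = 0.5
for n in (1, 10, 20, 50):
    print(n, (1 - p) ** n)
# 1 0.5
# 10 0.0009765625
# 20 9.5367431640625e-07
# 50 8.881784197001252e-16
```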

31
Q

Probabilistic Primality testing

A

Read it.

32
Q

the probabilistic method:

A

THE PROBABILISTIC METHOD: If the probability that an element chosen at random from a set S does not have a particular property is less than 1, then there exists an element in S with this property.

33
Q

TH 4

A

after revising chapter 4

34
Q

BAYES’ THEOREM

derive it, solve example 1

A

Suppose that E and F are events from a sample space S such that p(E) ≠ 0 and p(F) ≠ 0. Then

p(F ∣ E) = p(E ∣ F)p(F) / [p(E ∣ F)p(F) + p(E ∣ ~F)p(~F)].

Derivation sketch: p(F ∣ E) = p(E ∩ F)∕p(E) and p(E ∩ F) = p(E ∣ F)p(F); the denominator is p(E) = p(E ∣ F)p(F) + p(E ∣ ~F)p(~F) by the law of total probability, since F and ~F partition S.

Check book, page 495.
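
A minimal sketch of the two-event form of Bayes' theorem in Python; the medical-test numbers are invented for illustration, not taken from Example 1:

```python
def bayes(p_e_given_f: float, p_f: float, p_e_given_not_f: float) -> float:
    """p(F | E) via Bayes' theorem with the two-event partition {F, ~F}."""
    p_not_f = 1 - p_f
    numer = p_e_given_f * p_f
    denom = numer + p_e_given_not_f * p_not_f
    return numer / denom

# Illustrative numbers (assumed): a disease with prevalence 1/100000,
# a test with 99% sensitivity and a 0.5% false-positive rate.
print(bayes(0.99, 1e-5, 0.005))  # ≈ 0.002: a positive test still means disease is unlikely
```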

35
Q

GENERALIZED BAYES’ THEOREM

A

Note that in the statement of Bayes' theorem, the events F and ~F are mutually exclusive and cover the entire sample space S (that is, F ∪ ~F = S). We can extend Bayes' theorem to any collection of mutually exclusive events that cover the entire sample space S, in the following way.

Suppose that E is an event from a sample space S and that F1, F2, … , Fn are mutually exclusive events such that ⋃_{i=1}^{n} Fi = S. Assume that p(E) ≠ 0 and p(Fi) ≠ 0 for i = 1, 2, … , n. Then

p(Fj ∣ E) = p(E ∣ Fj)p(Fj) / ∑_{i=1}^{n} p(E ∣ Fi)p(Fi).

36
Q

Bayesian Spam Filter :

What's spam?
Vocab?
How do we find out which messages are spam and which are not?

Priority?

A

Spam: a flood of unwanted and unsolicited messages.

Spam messages might contain words like "offer," while non-spam might contain words like "mom."

False negative: spam filters sometimes fail to identify a spam message as spam; this is called a false negative.

False positive: they sometimes identify a message that is not spam as spam; this is called a false positive.

When filtering, it is important to minimize false positives, because filtering out wanted e-mail is much worse than letting some spam through.

37
Q

The probability that the message is spam, how to find it?

A

B: set of messages known to be spam.
G: set of messages known not to be spam.

We count the number of messages containing each word to find nB(w) and nG(w), the number of messages containing the word w in the sets B and G, respectively.

Then the empirical probability that a spam message contains the word w is p(w) = nB(w)∕|B|, and the empirical probability that a message that is not spam contains the word w is q(w) = nG(w)∕|G|.

Let S be the event that the message is spam, and let E be the event that the message contains the word w. The events S (the message is spam) and ~S (the message is not spam) partition the set of all messages. Hence, by Bayes' theorem, the probability that the message is spam, given that it contains the word w, is

p(S ∣ E) = p(E ∣ S)p(S) / [p(E ∣ S)p(S) + p(E ∣ ~S)p(~S)].

Without prior knowledge we assume that p(S) = p(~S) = 1∕2.

We estimate p(E ∣ S), the conditional probability that the message contains the word w given that it is spam, by p(w); similarly, p(E ∣ ~S) is estimated by q(w). Then p(S ∣ E) can be estimated by

r(w) = p(w) / [p(w) + q(w)].

If r(w) is greater than a threshold that we set, such as 0.9, then we classify the message as spam.
38
Q

Ways to enhance spam filter:

A
  1. Detecting spam based on the presence of a single word can lead to excessive false positives and false negatives. Consequently, spam filters look at the presence of multiple words.
  2. The more words we use to estimate the probability that an incoming mail message is spam, the better our chance of correctly determining whether it is spam.
  3. For the most effective spam filter, we choose words for which the probability that each of these words appears in spam is either very high or very low.
  4. Look at the probabilities that particular pairs of words appear in spam and in messages that are not spam, then treat appearances of these pairs of words as the appearance of a single block.
  5. Assess the likelihood that a message is spam by examining the structure of the message to determine where words appear in it.
  6. Look at appearances of certain types of character strings rather than just words. For example, a message with the valid e-mail address of one of your friends is less likely to be spam (if not sent by a worm) than one containing an e-mail address that came from a country known to originate a lot of spam.
39
Q

the probability that the message is spam given that it contains both w1 and w2, is?

A
Assuming the events E1 and E2 are independent given S and given ~S,

p(S ∣ E1 ∩ E2) = p(E1 ∣ S)p(E2 ∣ S) / [p(E1 ∣ S)p(E2 ∣ S) + p(E1 ∣ ~S)p(E2 ∣ ~S)].

We estimate the probability p(S ∣ E1 ∩ E2) by

r(w1, w2) = p(w1)p(w2) / [p(w1)p(w2) + q(w1)q(w2)].

Refer textbook.

40
Q

The probability that a message containing all the words w1, w2, … , wk is spam is?

A
p(S ∣ ⋂_{i=1}^{k} Ei) = ∏_{i=1}^{k} p(Ei ∣ S) / [∏_{i=1}^{k} p(Ei ∣ S) + ∏_{i=1}^{k} p(Ei ∣ ~S)].

We can estimate this probability by

r(w1, w2, … , wk) = ∏_{i=1}^{k} p(wi) / [∏_{i=1}^{k} p(wi) + ∏_{i=1}^{k} q(wi)].

Refer textbook.
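
A sketch of the combined estimate r(w1, … , wk) in Python; the function name spam_estimate and the word statistics are assumptions of this illustration:

```python
from math import prod

def spam_estimate(p_ws: list[float], q_ws: list[float]) -> float:
    """r(w1,...,wk): estimated probability that a message containing all
    the words w1,...,wk is spam, from per-word empirical probabilities
    p(wi) (in known spam) and q(wi) (in known non-spam)."""
    num = prod(p_ws)
    return num / (num + prod(q_ws))

# Illustrative word statistics (assumed, not from the text).
p_ws = [0.8, 0.6]   # p(w) for two words in known spam
q_ws = [0.05, 0.2]  # q(w) for the same words in known non-spam
print(spam_estimate(p_ws, q_ws))  # ≈ 0.98: above a 0.9 threshold, classify as spam
```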

41
Q

What is Expected Value?

What information does it give?

A

The expected value of a random variable is the sum over all elements in a sample space of the
product of the probability of the element and the value of the random variable at this element.
Consequently, the expected value is a weighted average of the values of a random variable.

It provides a central point for the distribution of values of this random variable.

42
Q

Definition 1 :

Expected value :

A

The expected value, also called the expectation or mean, of the random variable X on the sample space S is equal to

E(X) = ∑_{s∈S} p(s)X(s).

The deviation of X at s ∈ S is X(s) − E(X), the difference between the value of X and the mean of X.

When S = {x1, x2, … , xn}, E(X) = ∑_{i=1}^{n} p(xi)X(xi).
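
A minimal Python sketch of Definition 1, assuming the illustrative random variable X = value shown by one fair die:

```python
from fractions import Fraction

# Illustrative setup (assumed): X is the value shown by one fair die.
S = [1, 2, 3, 4, 5, 6]
p = {s: Fraction(1, 6) for s in S}  # uniform distribution on S

def X(s):
    return s  # the random variable: identity on the die value

# E(X) = sum over s in S of p(s) * X(s)
E = sum(p[s] * X(s) for s in S)
print(E)  # 7/2
```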

43
Q

When there are infinitely many elements of the sample space, the expectation is?

A

The expectation is defined only when the infinite series in the definition is absolutely convergent. In particular, the expectation of a random variable on an infinite sample space is finite if it exists.

44
Q

THEOREM 1

what if outcomes are large? It’s inconvenient to use Definition 1.

proof:

A

We can find the expected value of a random variable by grouping together all outcomes assigned the same value by the random variable.

If X is a random variable and p(X = r) is the probability that X = r, so that p(X = r) = ∑_{s∈S, X(s)=r} p(s), then

E(X) = ∑_{r∈X(S)} p(X = r) r.

Proof:
Suppose that X is a random variable with range X(S), and let p(X = r) be the probability that the random variable X takes the value r. Consequently, p(X = r) is the sum of the probabilities of the outcomes s such that X(s) = r. Grouping the terms of E(X) = ∑_{s∈S} p(s)X(s) by the value X(s) takes then gives E(X) = ∑_{r∈X(S)} r ∑_{s∈S, X(s)=r} p(s) = ∑_{r∈X(S)} p(X = r) r.

45
Q

Thm 2:

Expected number of successes, Bernoulli trials.

prove it.

A

The expected number of successes when n mutually independent Bernoulli trials are performed, where p is the probability of success on each trial, is np.

Proof in the textbook.

Using linearity of expectation, there is also a proof that does not require the trials to be independent; see Example 5.

46
Q

Linearity of Expectations

expected value of sum of random variables.

Thm 3:

prove it:

How to use this to solve problems?

A

Expected values are linear. If Xi, i = 1, 2, … , n, with n a positive integer, are random variables on S, and if a and b are real numbers, then

(i) E(X1 + X2 + ⋯ + Xn) = E(X1) + E(X2) + ⋯ + E(Xn)
(ii) E(aX + b) = aE(X) + b.

For the proof, see the textbook: (i) is by induction, (ii) is algebra.

To solve problems: the key step is to express the random variable whose expectation we wish to find as a sum of random variables whose expectations are easy to find.

47
Q

Expected number of inversions in a permutation :

A

The ordered pair (i, j) is called an inversion in a permutation of the first n positive integers if i < j but j precedes i in the permutation. For instance, there are six inversions in the permutation 3, 5, 1, 4, 2; these inversions are

(1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (4, 5).

See Example 7 for the expected number of inversions.
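
A small Python sketch that counts inversions and checks the expected count; the helper name inversions and the brute-force average are this example's choices. The expected number of inversions in a random permutation of 1..n is n(n − 1)/4, since each of the C(n, 2) pairs is inverted with probability 1/2 and expectations are linear:

```python
from itertools import combinations, permutations
from math import factorial
from fractions import Fraction

def inversions(perm):
    """Pairs (i, j) with i < j but j preceding i in the permutation."""
    pos = {v: k for k, v in enumerate(perm)}
    return [(i, j) for i, j in combinations(sorted(perm), 2) if pos[j] < pos[i]]

print(inversions((3, 5, 1, 4, 2)))
# [(1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (4, 5)]

# Brute-force average over all permutations of 1..n matches n(n-1)/4.
n = 4
total = sum(len(inversions(q)) for q in permutations(range(1, n + 1)))
print(Fraction(total, factorial(n)))  # 3, and n(n-1)/4 = 3
```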

48
Q

Avg case computational complexity interpret it as computing expected value of random variable :

A

Computing the average-case computational complexity of an algorithm can be interpreted as computing the expected value of a random variable. Let the sample space of an experiment be the set of possible inputs aj, j = 1, 2, … , n, and let X be the random variable that assigns to aj the number of operations used by the algorithm when given aj as input.

Based on our knowledge of the input, we assign a probability p(aj) to each possible input value aj. Then the average-case complexity of the algorithm is

E(X) = ∑_{j=1}^{n} p(aj)X(aj).

49
Q

Geometric Distribution :

A

A random variable X has a geometric distribution with parameter p if p(X = k) = (1 − p)^(k−1) p for k = 1, 2, 3, …, where p is a real number with 0 ≤ p ≤ 1.

50
Q

Where does geometric distribution arise? Application?

A

Geometric distributions arise in many applications because they are used to study the time required before a particular event happens, such as the time required before we find an object with
a certain property, the number of attempts before an experiment succeeds, the number of times
a product can be used before it fails, and so on.

51
Q

Thm 4:

Expected value of X when X has a geometric distribution:

Proof:

A

If the random variable X has the geometric distribution with parameter p, then E(X) = 1∕p.

Proof: Example 10.
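
A quick numeric check in Python, truncating the series E(X) = ∑_{k≥1} k(1 − p)^(k−1) p; the value p = 0.25 is illustrative:

```python
# Numeric check that E(X) = 1/p for the geometric distribution,
# truncating the infinite series at a large k.
p = 0.25
E = sum(k * (1 - p) ** (k - 1) * p for k in range(1, 10_000))
print(E)      # ≈ 4.0
print(1 / p)  # 4.0
```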

52
Q

Definition 3:

Independent random variable :

A

The random variables X and Y on a sample space S are independent if
p(X = r1 and Y = r2) = p(X = r1) ⋅ p(Y = r2),
or in words, if the probability that X = r1 and Y = r2 equals the product of the probabilities
that X = r1 and Y = r2, for all real numbers r1 and r2.

53
Q

THM 5:

If X and Y are independent random variables on a sample space S, then E(XY) = ?

Check Example 12 too.

A

If X and Y are independent random variables on a sample space S, then E(XY) = E(X)E(Y).

Proved in the textbook.

54
Q

Definition 4

Variance :

A

Let X be a random variable on a sample space S. The variance of X, denoted by V(X), is

V(X) = ∑_{s∈S} (X(s) − E(X))^2 p(s).

That is, V(X) is the weighted average of the square of the deviation of X. The standard deviation of X, denoted 𝜎(X), is defined to be √V(X).

55
Q

What information does variance give?

A

The expected value of a random variable tells us its average value, but nothing about how widely its values are distributed.

The variance of a
random variable helps us characterize how widely a random variable is distributed. In particular,
it provides a measure of how widely X is distributed about its expected value.

56
Q

Theorem 6 provides a useful simple expression for the variance of a random variable. prove it.

A

If X is a random variable on a sample space S, then V(X) = E(X^2) − E(X)^2.

Proof sketch: expand V(X) = ∑_{s∈S} (X(s) − E(X))^2 p(s) = ∑_{s∈S} X(s)^2 p(s) − 2E(X) ∑_{s∈S} X(s)p(s) + E(X)^2 ∑_{s∈S} p(s) = E(X^2) − 2E(X)^2 + E(X)^2 = E(X^2) − E(X)^2.
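
A Python sketch checking Definition 4 against Theorem 6 for an assumed fair-die random variable:

```python
from fractions import Fraction

# Variance of the value X of one fair die (illustrative), two ways.
S = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)

E_X  = sum(p * s for s in S)       # 7/2
E_X2 = sum(p * s * s for s in S)   # 91/6

# Definition 4 and Theorem 6 agree.
V_def = sum(p * (s - E_X) ** 2 for s in S)
V_thm = E_X2 - E_X ** 2
print(V_def, V_thm)  # 35/12 35/12
```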

57
Q

We can use Theorems 3 and 6 to derive an alternative formula for V(X) that provides some
insight into the meaning of the variance of a random variable.
COROLLARY 1
prove it.

A

If X is a random variable on a sample space S and E(X) = 𝜇, then V(X) = E((X − 𝜇)^2).

58
Q

THEOREM 7 BIENAYMÉ'S FORMULA

Prove it:

A

If X and Y are two independent random variables on a sample space S, then V(X + Y) = V(X) + V(Y). Furthermore, if Xi, i = 1, 2, … , n, with n a positive integer, are pairwise independent random variables on S, then

V(X1 + X2 + ⋯ + Xn) = V(X1) + V(X2) + ⋯ + V(Xn).

59
Q

THEOREM 8 CHEBYSHEV’S INEQUALITY

How likely is it that a random variable takes a value far from its expected value?

prove it:

A

Chebyshev's inequality provides an upper bound on the probability that the value of a random variable differs from the expected value of the random variable by more than a specified amount.

Let X be a random variable on a sample space S with probability function p. If r is a positive real number, then

p(|X(s) − E(X)| ≥ r) ≤ V(X) ∕ r^2.
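
A small Python check of the inequality for the assumed fair-die random variable (r = 2 is an illustrative choice):

```python
from fractions import Fraction

# Check Chebyshev's inequality for X = value of one fair die (illustrative).
# E(X) = 7/2 and V(X) = 35/12, as computed earlier.
S = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)
E = Fraction(7, 2)
V = Fraction(35, 12)

r = 2
lhs = sum(p for s in S if abs(s - E) >= r)  # p(|X - E(X)| >= 2)
print(lhs, "<=", V / r**2)                  # 1/3 <= 35/48
```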