Hypergeom_to_categ Flashcards

Question

Hypergeometric Distribution

Answer 1

A

The Hypergeometric distribution describes the number of successes in a sample drawn from a finite population without replacement. It’s used to calculate the probability of obtaining a specific number of successes when sampled items are not returned to the population. In data analysis and machine learning, it’s useful for scenarios like assessing sample quality or validating model performance using limited data.

Answer 2

A

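The Hypergeometric PMF gives the probability of getting exactly k successes in a sample of size n, drawn without replacement from a population of size N that contains K successes. Formula: $P(X = k) = \dfrac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$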
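A minimal Python sketch of this PMF, assuming the usual notation (N = population size, K = successes in the population, n = sample size). The function name `hypergeom_pmf` and the small card example are illustrative and not part of the original cards.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): exactly k successes in n draws without replacement."""
    # Outside the support the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Hypothetical check: 10 cards, 5 of them red, draw 3, exactly 1 red.
p_one_red = hypergeom_pmf(1, N=10, K=5, n=3)
```

Summing the function over every k from 0 to n should give 1, which is a quick sanity check on any implementation.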

Answer 3

A

The Hypergeometric CDF gives the probability of getting k or fewer successes in a sample of size n. Formula: $P(X \le k) = \sum_{i=0}^{k} P(X = i)$
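The sum can be sketched directly in Python by adding up PMF values from 0 to k. The helper names and the small card example below are assumptions made for illustration only.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) for n draws without replacement from N items, K of them successes."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def hypergeom_cdf(k: int, N: int, K: int, n: int) -> float:
    """P(X <= k) as the running sum of the PMF from 0 to k."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(0, k + 1))

# Hypothetical check: 10 cards, 5 red, draw 3, at most 1 red.
at_most_one = hypergeom_cdf(1, N=10, K=5, n=3)
```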

Answer 4

A

Imagine you have a batch of 1000 items, with 150 defective ones. If you randomly select 20 items, what’s the probability of getting exactly 3 defective items?

Answer 5

A

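Using the Hypergeometric PMF formula with population size N = 1000, K = 150 defective items, sample size n = 20, and k = 3, and calculating the binomial coefficients: $P(X = 3) = \dfrac{\binom{150}{3}\binom{850}{17}}{\binom{1000}{20}}$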
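The binomial coefficients here are too large to evaluate by hand comfortably, so a short Python sketch can do the arithmetic. The variable names are illustrative; the numbers come from the batch example (1000 items, 150 defective, 20 sampled, 3 defective wanted).

```python
from math import comb

# Batch size, defective items in the batch, sample size, target defective count.
N, K, n, k = 1000, 150, 20, 3

# Python integers are exact, so the only rounding happens in the final division.
p_exactly_3 = comb(K, k) * comb(N - K, n - k) / comb(N, n)
print(f"P(X = 3) = {p_exactly_3:.4f}")
```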

Answer 6

A

In data engineering, it’s useful for assessing sample quality or validating data subsets. In machine learning, it helps estimate the likelihood of observing specific outcomes in model evaluations, especially with limited data.

Answer 7

A

The Negative Binomial distribution models the number of trials needed to obtain r successes in a sequence of independent Bernoulli trials. The Geometric distribution is the special case r = 1, which counts trials only up to the first success.

Answer 8

A

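The Negative Binomial PMF gives the probability that the r-th success occurs on the k-th trial, where p is the success probability of each trial. Formula: $P(X = k) = \binom{k-1}{r-1}\, p^{r} (1-p)^{k-r}$ for k ≥ r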
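A minimal Python sketch of this PMF, in the "number of trials until the r-th success" parameterisation used on the card. The function name `neg_binom_pmf` and the fair-coin check are assumptions for illustration.

```python
from math import comb

def neg_binom_pmf(k: int, r: int, p: float) -> float:
    """P(X = k): the r-th success lands exactly on trial k."""
    if k < r:
        return 0.0  # r successes need at least r trials
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)

# Hypothetical check: second head on the third flip of a fair coin.
prob = neg_binom_pmf(3, r=2, p=0.5)
```

With r = 1 the same function reduces to the Geometric PMF, $(1-p)^{k-1} p$.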

Answer 9

A

The Negative Binomial CDF gives the probability that r successes are reached in k or fewer trials. Formula: $P(X \le k) = \sum_{i=r}^{k} P(X = i)$
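Because at least r trials are needed, the sum starts at i = r. A short Python sketch under that assumption, with illustrative function names:

```python
from math import comb

def neg_binom_pmf(k: int, r: int, p: float) -> float:
    """P(X = k): the r-th success lands exactly on trial k."""
    if k < r:
        return 0.0
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)

def neg_binom_cdf(k: int, r: int, p: float) -> float:
    """P(X <= k): r successes reached within the first k trials."""
    return sum(neg_binom_pmf(i, r, p) for i in range(r, k + 1))

# Hypothetical check: two heads within three flips of a fair coin.
within_three = neg_binom_cdf(3, r=2, p=0.5)
```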

Answer 10

A

Imagine flipping a biased coin until you get 5 heads. Each flip has a 0.3 chance of heads. What’s the probability that the 5th head occurs on exactly the 10th flip?

Answer 11

A

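Using the Negative Binomial PMF with r = 5, k = 10, and p = 0.3: $P(X = 10) = \binom{9}{4} \cdot 0.3^{5} \cdot 0.7^{5} = 126 \cdot 0.00243 \cdot 0.16807 \approx 0.0515$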
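The same arithmetic can be checked in a few lines of Python. The variable names are illustrative; the values r = 5, k = 10, p = 0.3 come from the coin example.

```python
from math import comb

r, k, p = 5, 10, 0.3  # target heads, flip on which the 5th head lands, P(heads)

# C(9, 4) ways to place the first 4 heads among the first 9 flips.
p_tenth_flip = comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)
print(f"P(X = 10) = {p_tenth_flip:.4f}")
```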

Answer 12

A

It models attempts needed for a specific success count, like conversions. In ML, it might estimate iterations for training to reach a performance level.

Answer 13

A

The Discrete Uniform distribution is a probability distribution where all outcomes are equally likely within a finite set of values. It’s encountered in scenarios where each outcome has the same probability, without any bias towards specific values.

Answer 14

A

The Discrete Uniform PMF assigns an equal probability to each possible outcome. Formula: $P(X = x) = \frac{1}{n}$ where X is the outcome, x is a specific value, and n is the total number of outcomes.

Answer 15

A

The Discrete Uniform CDF gives the probability that the outcome is less than or equal to a specific value. It’s a step function increasing by $\frac{1}{n}$ at each outcome.
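A small Python sketch of the PMF and the step-function CDF, assuming the outcomes are the integers 1 through n (the card does not fix the outcome set, so this is an illustrative choice). Exact fractions are used so each step is visibly 1/n.

```python
from fractions import Fraction

def discrete_uniform_pmf(x: int, n: int) -> Fraction:
    """P(X = x) for outcomes 1..n, each with probability 1/n."""
    return Fraction(1, n) if 1 <= x <= n else Fraction(0)

def discrete_uniform_cdf(x: int, n: int) -> Fraction:
    """P(X <= x): rises by 1/n at each of the outcomes 1..n."""
    if x < 1:
        return Fraction(0)
    return Fraction(min(x, n), n)

# CDF values at each outcome for n = 6.
steps = [discrete_uniform_cdf(x, 6) for x in range(1, 7)]
```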

Answer 16

A

Consider rolling a fair six-sided die. What’s the probability of rolling a 3?

Answer 17

A

Using the Discrete Uniform PMF formula with n = 6: $P(X = 3) = \frac{1}{6}$

Answer 18

A

In data engineering, it’s useful for equally likely outcomes, like generating random test data. In machine learning, it might be used in simulations or to create synthetic datasets with uniform characteristics.

Answer 19

A

The Categorical distribution represents probabilities of outcomes in a discrete set of categories. It’s used with categorical data, such as survey responses, where each category has an associated probability.

Answer 20

A

The Categorical PMF gives the probability of each category in the set. Formula: $P(X = x_i) = p_i$ where X is the categorical variable, $x_i$ is a specific category, and $p_i$ is the probability associated with category $x_i$. The probabilities $p_i$ sum to 1.

Answer 21

A

The Categorical CDF gives the cumulative probability that the outcome is less than or equal to a specific category. It is only meaningful once the categories are placed in some order; given that order, it’s a step function that accumulates the probabilities of each category up to the desired category.
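A minimal Python sketch of this accumulation. The categories, their order, and their probabilities below are hypothetical values chosen only to show the running total.

```python
from itertools import accumulate

# Hypothetical ordered categories with their probabilities (must sum to 1).
categories = ["low", "medium", "high"]
probs = [0.2, 0.5, 0.3]

# Running totals give the CDF value at each category in the chosen order.
cdf = dict(zip(categories, accumulate(probs)))

p_up_to_medium = cdf["medium"]  # P(X <= "medium") = 0.2 + 0.5
```

Reordering the categories changes every intermediate CDF value, although the final value is always 1.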

Answer 22

A

Consider a survey where participants choose their favorite fruit: Apple, Banana, or Orange. Probabilities: Apple (0.4), Banana (0.3), Orange (0.3).

Answer 23

A

Using the Categorical PMF formula: $P(X = \text{Banana}) = 0.3$
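In code, a Categorical PMF is just a lookup table. The sketch below uses the fruit probabilities from the survey example and, as an added illustration, draws simulated responses with the standard library to show that the observed share approaches the PMF value. The function name, seed, and sample size are arbitrary assumptions.

```python
import random
from collections import Counter

fruit_probs = {"Apple": 0.4, "Banana": 0.3, "Orange": 0.3}

def categorical_pmf(category: str, probs: dict) -> float:
    """P(X = category); categories not in the table get probability 0."""
    return probs.get(category, 0.0)

p_banana = categorical_pmf("Banana", fruit_probs)

# Simulate survey responses with a fixed seed so the run is repeatable.
rng = random.Random(0)
draws = rng.choices(list(fruit_probs), weights=list(fruit_probs.values()), k=10_000)
banana_share = Counter(draws)["Banana"] / len(draws)
```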

Answer 24

A

In data engineering, it models and analyzes categorical data like survey responses. In machine learning, it’s applied in scenarios with discrete categories, such as sentiment analysis or document categorization.

(29 cards)