1. Fundamentals of Probability Flashcards
Fundamentals of probability for Machine Learning
What is probability?
Probability is a measure of the likelihood that an event will occur, ranging from 0 to 1.
What is a sample space?
The set of all possible outcomes of a probabilistic experiment.
What is an event in probability?
A subset of the sample space, representing one or more outcomes.
What are the three axioms of probability?
- P(A) ≥ 0 for any event A.
- P(S) = 1 where S is the sample space.
- P(A ∪ B) = P(A) + P(B) for disjoint events A and B (more generally, for any countable collection of pairwise disjoint events).
What is conditional probability?
The probability of event A given that event B has occurred, denoted P(A|B).
What is the formula for conditional probability?
P(A|B) = P(A ∩ B) / P(B)
provided P(B) > 0.
What is the law of total probability?
P(A) = Σ P(A|B_i)P(B_i) for a partition {B_i} of the sample space.
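As a rough illustration of the formula above, the sketch below sums P(A|B_i)P(B_i) with exact fractions. The three-factory scenario and all of its numbers are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from fractions import Fraction

# Hypothetical partition: every item comes from exactly one of three factories.
p_factory = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]  # P(B_i)
p_defect_given_factory = [
    Fraction(1, 100),  # P(A|B_1)
    Fraction(2, 100),  # P(A|B_2)
    Fraction(3, 100),  # P(A|B_3)
]

# The P(B_i) must sum to 1 for {B_i} to be a partition of the sample space.
assert sum(p_factory) == 1

# P(A) = sum over i of P(A|B_i) * P(B_i)
p_defect = sum(pa * pb for pa, pb in zip(p_defect_given_factory, p_factory))
print(p_defect)  # 17/1000
```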
What is Bayes’ Theorem?
P(A|B) = [P(B|A) * P(A)] / P(B), provided P(B) > 0.
What is an example of Bayes’ Theorem in real life?
Medical testing: Given a positive test result, what is the probability of having the disease?
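A minimal sketch of the medical-testing question in Python. The prevalence, sensitivity, and false-positive rate are made-up illustrative values, not figures for any real test.

```python
# Hypothetical numbers, chosen only for illustration.
prevalence = 0.01           # P(disease)
sensitivity = 0.95          # P(positive | disease)
false_positive_rate = 0.05  # P(positive | no disease)

# Law of total probability gives the denominator P(positive).
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# Bayes' Theorem: P(disease | positive)
p_disease_given_positive = sensitivity * prevalence / p_positive

print(round(p_disease_given_positive, 3))  # 0.161
```

With these assumed numbers, a positive result raises the probability of disease from 1% to only about 16%, because false positives from the large healthy group outnumber true positives.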
What does it mean for two events to be independent?
Two events A and B are independent if P(A ∩ B) = P(A)P(B).
What is the difference between independent and dependent events?
Independent events do not affect each other’s probability, whereas dependent events do.
How do you check if events A and B are independent?
Check if P(A|B) = P(A) or if P(A ∩ B) = P(A)P(B).
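The product check can be sketched by enumerating a small sample space. The two-dice setup and the helper names `prob` and `independent` are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

def independent(a, b):
    """True when P(A ∩ B) == P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

def first_even(o):
    return o[0] % 2 == 0

def sum_is_7(o):
    return o[0] + o[1] == 7

def sum_is_8(o):
    return o[0] + o[1] == 8

print(independent(first_even, sum_is_7))  # True
print(independent(first_even, sum_is_8))  # False
```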
What is a random variable?
A function that assigns numerical values to outcomes of a probabilistic experiment.
What is the difference between discrete and continuous random variables?
Discrete variables take countable values, while continuous variables take any value in an interval.
What is the expected value (mean) of a random variable?
E[X] = Σ x * P(X=x) for discrete variables or ∫ x f(x) dx for continuous variables.
What is a probability mass function (PMF)?
A function that gives the probability of each possible value for a discrete random variable.
What is a probability density function (PDF)?
A function f(x) whose integral over an interval gives the probability that a continuous random variable falls in that interval; f(x) itself is a density, not a probability.
What is a cumulative distribution function (CDF)?
A function that gives the probability that a random variable is less than or equal to a given value.
Define probability.
A measure of the likelihood of an event occurring, between 0 and 1.
What is a trial in probability?
A single performance or observation of an experiment.
What is an outcome in probability?
A single possible result from a probability experiment.
What is an experiment in probability?
A process that leads to one of several possible outcomes.
What is the sample space of rolling a fair six-sided die?
{1, 2, 3, 4, 5, 6}.
What is an event in probability?
A subset of the sample space.
What are mutually exclusive events?
Events that cannot occur simultaneously (P(A ∩ B) = 0).
What is an example of mutually exclusive events?
Getting heads and tails on a single coin flip.
What is a certain event?
An event that is sure to occur, so P(A) = 1; the sample space S itself is the standard example.
What is an impossible event?
An event that cannot occur, so P(A) = 0; the empty set is the standard example.
What is the complement rule?
P(A’) = 1 - P(A), where A’ is the complement of A.
What is an exhaustive set of events?
A set of events that covers the entire sample space.
What is conditional probability?
The probability of an event occurring given another event has occurred.
How is conditional probability written?
P(A|B), meaning the probability of A given B.
State the formula for conditional probability.
P(A|B) = P(A ∩ B) / P(B), if P(B) > 0.
What does it mean if P(A|B) = P(A)?
It means that events A and B are independent.
What does the law of total probability state?
P(A) = Σ P(A|B_i)P(B_i) over a partition {B_i}.
How does conditional probability relate to joint probability?
P(A ∩ B) = P(A|B)P(B).
State Bayes’ Theorem.
P(A|B) = [P(B|A) * P(A)] / P(B), provided P(B) > 0.
What does Bayes’ Theorem help with?
Updating probabilities based on new evidence.
Give an example of Bayes’ Theorem in machine learning.
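Naive Bayes classifiers use it to compute the probability of each class given the observed features, for example the words of a document in text classification.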
What is prior probability?
The initial probability of an event before new evidence is introduced.
What is likelihood in Bayes’ Theorem?
P(B|A), the probability of observing the evidence B given that A holds.
What is the posterior probability?
The updated probability of A after considering B.
What are independent events?
Events where P(A ∩ B) = P(A)P(B).
What is the multiplication rule for independent events?
P(A ∩ B) = P(A)P(B).
How can you check if two events are independent?
If P(A|B) = P(A), then A and B are independent.
What are dependent events?
Events where the occurrence of one affects the probability of the other.
Give an example of dependent events.
Drawing two cards from a deck without replacement.
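A short sketch of the card example, comparing draws without and with replacement. It assumes a standard 52-card deck and asks for the probability that both cards are aces.

```python
from fractions import Fraction

# Without replacement: the second draw depends on the first.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one ace and one card are gone
p_two_aces_without = p_first_ace * p_second_ace_given_first

# With replacement: the two draws are independent.
p_two_aces_with = Fraction(4, 52) * Fraction(4, 52)

print(p_two_aces_without)  # 1/221
print(p_two_aces_with)     # 1/169
```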
Define a random variable.
A function that assigns numerical values to outcomes of an experiment.
What is a discrete random variable?
A variable that takes a countable number of values.
What is a continuous random variable?
A variable that can take any value in an interval, so its possible values are uncountably many.
What is the expectation (mean) of a discrete random variable?
E[X] = Σ x * P(X=x).
What is variance?
Var(X) = E[(X - E[X])²], a measure of spread of a random variable.
What is the standard deviation?
The square root of variance, a measure of dispersion.
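The expectation, variance, and standard deviation formulas above can be sketched for a fair six-sided die. The `pmf` dictionary is just one convenient representation chosen for this example.

```python
from fractions import Fraction
import math

# PMF of a fair six-sided die: each face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E[X] = sum of x * P(X = x)
mean = sum(x * p for x, p in pmf.items())

# Var(X) = E[(X - E[X])^2]
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance)     # 7/2 35/12
print(round(std_dev, 4))  # 1.7078
```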
What is a probability mass function (PMF)?
A function giving the probability of each value for a discrete random variable.
What is a probability density function (PDF)?
A function giving the relative likelihood of a continuous random variable taking a value.
What is a cumulative distribution function (CDF)?
A function giving P(X ≤ x) for a random variable X.
What is the relationship between a CDF and a PDF?
For a continuous random variable, the PDF is the derivative of the CDF, f(x) = dF(x)/dx, wherever that derivative exists.
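One way to see the CDF and PDF relationship is to differentiate a known CDF numerically. The sketch below assumes an exponential distribution with rate 1, whose CDF is 1 - e^(-x) and whose PDF is e^(-x) for x ≥ 0; the helper `numerical_derivative` is a plain central difference.

```python
import math

def cdf(x):
    """CDF of an Exponential(1) random variable."""
    return 1 - math.exp(-x) if x >= 0 else 0.0

def pdf(x):
    """PDF of an Exponential(1) random variable."""
    return math.exp(-x) if x >= 0 else 0.0

def numerical_derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# The numerical slope of the CDF should closely match the PDF.
for x in (0.5, 1.0, 2.0):
    print(x, numerical_derivative(cdf, x), pdf(x))
```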
What is the expected value of a probability distribution?
A weighted average of possible values, using probabilities as weights.
What is the law of large numbers?
As the number of independent, identically distributed trials increases, the sample mean converges to the expected value, provided that expected value is finite.
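A small simulation can make the law of large numbers concrete. This sketch rolls a simulated fair die, whose expected value is 3.5; the seed and the sample sizes are arbitrary choices.

```python
import random

def mean_of_die_rolls(n_rolls, rng):
    """Average of n_rolls simulated rolls of a fair six-sided die."""
    total = 0
    for _ in range(n_rolls):
        total += rng.randint(1, 6)
    return total / n_rolls

rng = random.Random(0)  # fixed seed so the run is repeatable

# The sample mean should drift toward 3.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, mean_of_die_rolls(n, rng))
```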
What is the central limit theorem?
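For large n, the distribution of the standardized sample mean of n independent, identically distributed random variables with finite variance approaches a normal distribution, whatever the underlying distribution.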
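The central limit theorem can be checked roughly by simulation. The sketch below averages n = 30 draws from a Uniform(0, 1) distribution (mean 1/2, variance 1/12) many times; the seed, n, and the number of repetitions are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30                  # size of each sample
repetitions = 5_000     # number of sample means to collect

# Each entry is the mean of n Uniform(0, 1) draws.
sample_means = [
    sum(rng.random() for _ in range(n)) / n
    for _ in range(repetitions)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

# Theory predicts a standard deviation of sigma / sqrt(n).
predicted_sd = (1 / 12) ** 0.5 / n ** 0.5

# For a normal distribution, roughly 68% of values fall within one
# standard deviation of the mean.
within_one_sd = sum(
    abs(m - 0.5) <= predicted_sd for m in sample_means
) / repetitions

print(mean_of_means, sd_of_means, predicted_sd, within_one_sd)
```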
What is the difference between a population and a sample?
A population is the entire set of elements, while a sample is a subset.
What is the difference between a parameter and a statistic?
A parameter describes a population; a statistic describes a sample.
What is a probability generating function?
For a random variable X taking non-negative integer values, the function G(z) = E[z^X] = Σ P(X=k) z^k, whose coefficient of z^k is P(X=k).
What is a moment generating function?
The function M(t) = E[e^(tX)]; its n-th derivative at t = 0 gives the n-th moment E[X^n], from which the mean and variance follow.
What is a probability space?
A mathematical triplet (Ω, F, P) where Ω is the sample space, F is the event space, and P is the probability function.
What is a discrete probability distribution?
A probability distribution that deals with countable outcomes.
What is a continuous probability distribution?
A probability distribution that deals with outcomes over a continuous range.
What is the probability of an event A not occurring?
P(A’) = 1 - P(A).
What is the sum of probabilities of all possible outcomes?
Always equals 1.
What is the difference between theoretical and empirical probability?
Theoretical probability is derived from a model of the experiment (for example, equally likely outcomes), while empirical probability is the relative frequency observed in actual trials.
What is the principle of equally likely outcomes?
If all outcomes are equally likely, then P(A) = |A|/|S|.
How does probability relate to frequency?
The relative frequency of an event over a large number of trials approximates its probability.
What does P(A|B) = 0 mean?
Event A never occurs given that B has occurred.
What does P(A|B) = 1 mean?
Event A always occurs given that B has occurred.
What does it mean if P(A|B) > P(A)?
B increases the likelihood of A.
What does it mean if P(A|B) < P(A)?
B decreases the likelihood of A.
What is an application of Bayes’ Theorem in medicine?
Used to calculate the probability of having a disease given a positive test result.
What is the naive Bayes classifier?
A classifier that applies Bayes’ Theorem under the assumption that features are conditionally independent given the class.
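A toy sketch of the idea, not a production classifier. The class priors, the per-word likelihoods, and the `posterior` helper are all hypothetical values and names invented for this example.

```python
# Hypothetical priors and per-word likelihoods, for illustration only.
priors = {"spam": 0.4, "ham": 0.6}
word_likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.05},  # P(word | spam)
    "ham": {"free": 0.02, "meeting": 0.20},   # P(word | ham)
}

def posterior(words):
    """Return P(class | words) for each class."""
    scores = {}
    for label, prior in priors.items():
        # Unnormalized score: P(class) * product of P(word | class).
        # Multiplying per-word terms is the "naive" independence assumption.
        score = prior
        for word in words:
            score *= word_likelihoods[label][word]
        scores[label] = score
    total = sum(scores.values())  # P(words), by the law of total probability
    return {label: score / total for label, score in scores.items()}

result = posterior(["free", "meeting"])
print(result)
```

With these assumed numbers the message is scored as spam with probability of about 0.71.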
What is the support of a probability distribution?
The set of all values where the PMF or PDF is nonzero.
What is Chebyshev’s inequality?
For any distribution with finite variance and any k > 1, at least 1 - 1/k² of the probability lies within k standard deviations of the mean; equivalently, P(|X - μ| ≥ kσ) ≤ 1/k².
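The bound can be compared with an exact calculation for a specific distribution. This sketch assumes an Exponential(1) distribution, which has mean 1 and standard deviation 1; the function names are illustrative.

```python
import math

def exponential_cdf(x):
    """CDF of an Exponential(1) random variable."""
    return 1 - math.exp(-x) if x > 0 else 0.0

mean, std_dev = 1.0, 1.0  # mean and standard deviation of Exponential(1)

def mass_within(k):
    """Probability that X lies within k standard deviations of the mean."""
    upper = mean + k * std_dev
    lower = mean - k * std_dev
    return exponential_cdf(upper) - exponential_cdf(lower)

# The actual mass should never fall below the Chebyshev bound.
for k in (1.5, 2, 3):
    bound = 1 - 1 / k**2
    print(k, round(mass_within(k), 4), ">=", round(bound, 4))
```

The bound holds for every distribution with finite variance, so for any particular distribution it is usually far from tight.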
What is the moment of a distribution?
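The n-th moment of X is E[X^n] (central moments use X - E[X] instead); low-order moments describe a distribution's location, spread, skewness, and tail weight.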
What is the probability density of a uniform distribution over [a, b]?
f(x) = 1 / (b - a) for a ≤ x ≤ b, and 0 otherwise.
What is the mode of a distribution?
The value with the highest probability mass or density; in a data set, the value that appears most frequently.
What is the median of a distribution?
The value that splits the probability distribution into two equal halves.
What is the main consequence of the CLT?
The distribution of the sample mean is approximately normal for large samples, regardless of the shape of the underlying distribution (given finite variance).
Why is CLT important in statistics?
It justifies the use of normal approximations in hypothesis testing and confidence intervals.
What is Jensen’s inequality?
For a convex function g, E[g(X)] ≥ g(E[X]).
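A quick exact check of the inequality, assuming a fair six-sided die for X and the convex function g(x) = x².

```python
from fractions import Fraction

# PMF of a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def g(x):
    """A convex function used for the check."""
    return x ** 2

expected_x = sum(x * p for x, p in pmf.items())
expected_g = sum(g(x) * p for x, p in pmf.items())  # E[g(X)]
g_of_expected = g(expected_x)                       # g(E[X])

print(expected_g, g_of_expected)    # 91/6 49/4
print(expected_g >= g_of_expected)  # True
```

For g(x) = x² the gap E[X²] - (E[X])² is exactly the variance, here 35/12.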
What is the covariance of X and Y?
E[(X - E[X])(Y - E[Y])].
What is correlation?
Covariance normalized by the standard deviations, ρ = Cov(X, Y) / (σ_X σ_Y), ranging from -1 to 1.
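Covariance and correlation can be computed exactly on a small sample space. In this sketch, X is the first of two fair dice and Y is the sum of both; that choice, and the helper `expectation`, are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product
import math

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(outcomes))

def expectation(f):
    """E[f(outcome)] under equally likely outcomes."""
    return sum(f(o) * p for o in outcomes)

def x(o):
    return o[0]         # value of the first die

def y(o):
    return o[0] + o[1]  # sum of both dice

mean_x, mean_y = expectation(x), expectation(y)

# Cov(X, Y) = E[(X - E[X])(Y - E[Y])]
cov_xy = expectation(lambda o: (x(o) - mean_x) * (y(o) - mean_y))
var_x = expectation(lambda o: (x(o) - mean_x) ** 2)
var_y = expectation(lambda o: (y(o) - mean_y) ** 2)

# Correlation = Cov(X, Y) / (sd_X * sd_Y)
corr = float(cov_xy) / math.sqrt(var_x * var_y)

print(cov_xy)          # 35/12
print(round(corr, 4))  # 0.7071
```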
What does a correlation of 0 mean?
No linear relationship between X and Y; this does not by itself imply that X and Y are independent.
What is entropy?
A measure of uncertainty in a probability distribution, H(X) = -Σ p(x) log p(x).
What is Kullback-Leibler (KL) divergence?
A measure of how a distribution P differs from a reference distribution Q, D_KL(P‖Q) = Σ p(x) log(p(x)/q(x)); it is non-negative and not symmetric.
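Entropy and KL divergence for discrete distributions can be sketched in a few lines. The code uses base-2 logarithms, so results are in bits; the two coin distributions are illustrative assumptions.

```python
import math

def entropy(p):
    """Shannon entropy in bits: -sum p(x) * log2 p(x)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """D_KL(P || Q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

fair = [0.5, 0.5]    # fair coin
biased = [0.9, 0.1]  # heavily biased coin

print(entropy(fair))    # 1.0
print(entropy(biased))  # less than 1 bit: the outcome is more predictable

# KL divergence is not symmetric: the two directions differ.
print(kl_divergence(fair, biased), kl_divergence(biased, fair))
```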
What is the relationship between entropy and information gain?
Information gain is the reduction in entropy after observing a variable.