Final Flashcards
Probability
The measure of the likelihood that an event will occur
Sample Space
The set of all possible outcomes of an experiment
Event
A subset of the sample space
Complement
Denoted as A’ or A^c; all outcomes not in A.
Addition Rule
P(A∪B)=P(A)+P(B)−P(A∩B)
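A quick check of the addition rule, as a minimal sketch. The fair six-sided die and the events A and B below are illustrative assumptions, not part of the card.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event: the roll is even
B = {4, 5, 6}   # event: the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

lhs = prob(A | B)                        # P(A∪B), counted directly
rhs = prob(A) + prob(B) - prob(A & B)    # P(A) + P(B) − P(A∩B)
print(lhs, rhs)
```

Subtracting P(A∩B) removes the outcomes 4 and 6, which would otherwise be counted twice.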
Multiplication Rule (for Independent Events)
P(A∩B)=P(A)⋅P(B)
Conditional Probability
P(A|B)= P(A∩B) / P(B)
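The conditional probability formula can be tried on a small example. This is a sketch under assumed inputs: the fair die and the events A and B are made up for illustration.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event: the roll is even
B = {4, 5, 6}   # event: the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# P(A|B) = P(A∩B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

# A and B are independent only if P(A∩B) = P(A)⋅P(B).
independent = prob(A & B) == prob(A) * prob(B)
print(p_a_given_b, independent)
```

Here P(A|B) differs from P(A), so these two events are not independent and the multiplication rule for independent events does not apply to them.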
Probability Mass Function (PMF)
Gives the probability of each possible value of a discrete random variable.
Probability Density Function (PDF)
Gives the probability density of a continuous random variable.
Expected Value (μ)
μ=∑(i=1, n) xi * P(X=xi)
Mean (X^-)
(X^-) = [∑(i=1, n) xi] / n
Variance (Var(x))
Var(x) = [∑(i=1, n) (xi-[X^-])^2] / n
Standard Deviation (σ)
σ = SQRT (Var(x))
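The mean, variance, and standard deviation formulas can be computed directly, as in this minimal sketch. The data values are an assumed example. Note that the variance formula on the card divides by n (population variance); the sample variance divides by n - 1 instead.

```python
import math
import statistics

# Assumed example data.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = sum(data) / n                                 # (X^-) = [∑ xi] / n
variance = sum((x - mean) ** 2 for x in data) / n    # Var(x), dividing by n
std_dev = math.sqrt(variance)                        # σ = SQRT(Var(x))

# The standard library's population variance should agree with the manual result.
library_variance = statistics.pvariance(data)
print(mean, variance, std_dev, library_variance)
```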
Summation Notation
∑(i=1, n) xi
“The sum of xi from i=1 to n.”
Factorial Notation
n! = the product of all positive integers up to n
EX: 3! = 3⋅2⋅1 = 6
Central Limit Theorem (CLT)
States that the distribution of the sum (or average) of a large number of independent, identically distributed random variables with finite variance approaches a normal distribution, regardless of the original distribution. A common rule of thumb is n > 30.
Conditions for CLT
The random variables must be independent.
The sample size should be sufficiently large.
The original distribution’s shape doesn’t matter, as long as it has a finite mean and variance.
CLT Formula for Sample Means
If X is a random variable with mean μ and standard deviation σ, then the distribution of the sample mean (X^-) approaches a normal distribution with mean μ and standard deviation σ/SQRT(n).
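A small simulation can illustrate the CLT for sample means. This is a sketch, not a proof: the exponential distribution, the sample size, the number of trials, and the seed are all assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(0)   # fixed seed so the run is repeatable

n = 40           # sample size (assumed; above the n > 30 rule of thumb)
trials = 5000    # number of samples to draw (assumed)

# Exponential(rate=1) is strongly skewed, with mean μ = 1 and standard deviation σ = 1.
mu, sigma = 1.0, 1.0

# Draw many samples of size n and record each sample mean.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(trials)
]

observed_mean = statistics.mean(sample_means)
observed_sd = statistics.stdev(sample_means)
predicted_sd = sigma / math.sqrt(n)   # σ/SQRT(n)

print(observed_mean, observed_sd, predicted_sd)
```

The observed mean of the sample means should land close to μ, and their spread close to σ/SQRT(n), even though the underlying distribution is far from normal.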
Maximum Likelihood Estimation (MLE)
A method for estimating the parameters of a statistical model that maximizes the likelihood function
Likelihood Function
L(θ|X)=P(X|θ) where:
L is the likelihood function
θ is the parameter
X is the data.
Log-Likelihood Function
ℓ(θ)=ln(L(θ|X)); often used for easier calculations
Set dℓ/dθ =0 and solve for the parameter θ.
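The log-likelihood recipe can be sketched on a simple model. The Bernoulli model and the data below are assumptions for illustration. For k successes in n trials, ℓ(p) = k⋅ln(p) + (n − k)⋅ln(1 − p), and setting dℓ/dp = 0 gives p = k/n. The code compares that closed form with a brute-force search over a grid.

```python
import math

# Hypothetical data: 10 Bernoulli trials (1 = success), 7 successes.
data = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]

def log_likelihood(p):
    """Bernoulli log-likelihood ℓ(p) for the data above."""
    k = sum(data)
    n = len(data)
    return k * math.log(p) + (n - k) * math.log(1 - p)

# Numerical approach: evaluate ℓ(p) on a grid and keep the maximizer.
grid = [i / 1000 for i in range(1, 1000)]   # 0.001, 0.002, ..., 0.999
p_grid = max(grid, key=log_likelihood)

# Closed form from solving dℓ/dp = 0.
p_closed = sum(data) / len(data)

print(p_grid, p_closed)
```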
Confidence Interval (CI)
A range of values constructed from sample data so that the population parameter is likely to occur within that range at a certain level of confidence
Confidence Level
The long-run proportion of intervals constructed this way that contain the true parameter
Common choices are 90%, 95%, and 99%
CI Formula for a Mean
(X^-) ± Z⋅σ/SQRT(n) where:
Z is the Z-score corresponding to the desired confidence level.
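The CI formula for a mean can be sketched with Python's standard library. The function name and the example numbers are hypothetical. `statistics.NormalDist` (Python 3.8+) supplies the Z-score.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper) for a mean with known σ. Hypothetical helper."""
    # Two-sided interval: put (1 - confidence)/2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Assumed example: sample mean 50, known σ = 10, n = 100.
lower, upper = z_confidence_interval(50, 10, 100)
print(lower, upper)
```

For a 95% level the Z-score is about 1.96, so the margin here is roughly 1.96⋅10/SQRT(100) = 1.96.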
Hypothesis Testing Steps
State the null hypothesis (H0) and alternative hypothesis (H1).
Choose the significance level (α).
Calculate the test statistic.
Make a decision: reject H0 if the p-value is less than α; otherwise, fail to reject H0.
Type I Error
Rejecting a true null hypothesis (false positive)
Type II Error
Failing to reject a false null hypothesis (false negative)
P-Value for Z TEST
1 Sided: P(Z > z) or P(Z < z) as appropriate.
2 Sided: P(|Z| > |z|) where z is the calculated Z-value.
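The one-sided and two-sided p-values for a Z test can be computed from the standard normal CDF. This is a minimal sketch; the function name, the `tail` labels, and the example z-value are assumptions.

```python
from statistics import NormalDist

def z_p_value(z, tail="two-sided"):
    """P-value for a calculated Z-value. Hypothetical helper."""
    cdf = NormalDist().cdf   # standard normal CDF, P(Z < z)
    if tail == "right":      # 1 sided: P(Z > z)
        return 1 - cdf(z)
    if tail == "left":       # 1 sided: P(Z < z)
        return cdf(z)
    # 2 sided: P(|Z| > |z|), twice the upper-tail area beyond |z|
    return 2 * (1 - cdf(abs(z)))

p_right = z_p_value(1.96, tail="right")
p_two = z_p_value(1.96)
print(p_right, p_two)
```

With z = 1.96 the two-sided p-value is about 0.05, which matches the 95% confidence level.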
P-Value for T TEST
1 Sided: P(t > t_observed) or P(t < t_observed) as appropriate.
2 Sided: P(|t| > |t_observed|) where t_observed is the calculated t-value.
CLT for Sample Proportions
If X is the number of successes in a sample of size n from a population with proportion p, the distribution of the sample proportion (^p) approaches a normal distribution with mean p and standard deviation SQRT [(p*(1-p)) / n].
EX: Z = ((^p) - p) / SQRT [(p*(1-p)) / n]
MLE for Normal Distribution (Known Variance)
If X1, X2,…, Xn are independent and identically distributed (i.i.d.) random variables from a normal distribution with mean, μ, and known variance, σ^2, the MLE for μ is the sample mean, (X̄)
CI for Population Proportion
(^p) ± Z * SQRT [((^p) * (1 - (^p))) / n]
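A sketch of the proportion interval in code. The helper name and the example counts (60 successes in 100 trials) are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Return (lower, upper) for a population proportion. Hypothetical helper."""
    p_hat = successes / n                               # sample proportion (^p)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided Z-score
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Assumed example: 60 successes out of 100 trials.
lower, upper = proportion_confidence_interval(60, 100)
print(lower, upper)
```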
Test Statistic for Population Mean (Known Variance) (Z TEST)
For testing a hypothesis about the population mean (μ) with known variance (σ^2), the test statistic is: Z = (X̄ - μ₀) / (σ / √n), where:
X̄ is the sample mean,
μ₀ is the hypothesized population mean.
Z-Score (Z)
The Z-score is determined by the desired level of confidence. You can find this value in a Z-table or use statistical software.
EX: For a two-sided 95% confidence interval, use Z ≈ 1.96. For 90%, use Z ≈ 1.645; for 99%, use Z ≈ 2.576.
Test Statistic for Population Mean (Unknown Variance) (T TEST)
t = (X̄ - μ₀) / (s / √n)
Degrees of Freedom (df)
df = n - 1 where:
n is the sample size
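The t statistic and its degrees of freedom can be computed as below. This is a minimal sketch with made-up data and a made-up hypothesized mean. `statistics.stdev` returns the sample standard deviation s, which divides by n - 1.

```python
import math
import statistics

# Hypothetical sample and hypothesized population mean μ₀.
data = [5.1, 4.9, 5.6, 5.8, 6.0]
mu_0 = 5.0

n = len(data)
x_bar = statistics.mean(data)   # sample mean X̄
s = statistics.stdev(data)      # sample standard deviation s

t = (x_bar - mu_0) / (s / math.sqrt(n))   # t = (X̄ - μ₀) / (s / √n)
df = n - 1                                # degrees of freedom

print(t, df)
```

The p-value then comes from the t distribution with df degrees of freedom, looked up in a t-table or with statistical software.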