Probability and Stats Flashcards
Law of Large Numbers
As the sample size grows, the sample mean converges to the population mean
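A minimal simulation of this, using fair coin flips (population mean 0.5); the numbers and seed are illustrative:

```python
import random

random.seed(0)

# Fair coin: the population mean of the "heads" indicator is 0.5.
# Larger samples give sample means closer to 0.5.
def sample_mean(n):
    flips = [random.random() < 0.5 for _ in range(n)]
    return sum(flips) / n

small = sample_mean(100)
large = sample_mean(100_000)
```

With a large `n`, `large` sits within a fraction of a percent of 0.5, while `small` wanders further.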
Bayes Rule
P(A|B) = P(B|A) P(A)/P(B)
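A worked example with made-up numbers (a disease-test scenario, using the law of total probability for P(B)):

```python
# All numbers here are illustrative, not real test statistics.
p_disease = 0.01              # P(A): prior probability of disease
p_pos_given_disease = 0.95    # P(B|A): test sensitivity
p_pos_given_healthy = 0.05    # false positive rate

# P(B) via total probability
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes rule: P(A|B) = P(B|A) P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
```

Despite the 95% sensitivity, the posterior is only about 16% because the prior is low.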
Probability vs Likelihood
Probability = chance of an outcome given fixed parameters,
e.g., P(X > 32 | mu, std_dev)
Likelihood = how well candidate params/distributions explain fixed observation(s); used to find the best ones
e.g., L(mu=sth, std=sth|X)
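A sketch of the likelihood view: the data are fixed and we score candidate Gaussian parameters (data values here are made up):

```python
import math

def gaussian_log_likelihood(data, mu, sigma):
    # log L(mu, sigma | data): sum of log-densities of each fixed observation
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2)
        - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )

data = [4.8, 5.1, 5.0, 5.3, 4.9]   # fixed observations (illustrative)
good = gaussian_log_likelihood(data, mu=5.0, sigma=0.2)
bad = gaussian_log_likelihood(data, mu=3.0, sigma=0.2)
```

Parameters near the data's center score higher, which is exactly what maximum likelihood exploits.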
Bayes Nets
DAG + variables conditioned on others (local conditional probabilities)
e.g., rain -> cricket -> traffic
Bayes Error
Lowest achievable error rate for any classifier on a given problem; irreducible because the class distributions overlap
e.g., if two classes have overlapping feature distributions, even the optimal classifier misclassifies points in the overlap region
Markov Decision Process
States, actions, transition probabilities, and rewards; the next state depends only on the current state and action (Markov property)
Hidden Markov Models
Hidden State (X) = Markov Process
Observable (Y) = only depends on current state of X
Accounts for temporal relations between hidden states and how those states emit observations
e.g., Language modeling, Y = word, X = part of speech
Tall player fell
Hidden state chain (transitions): P(adj, noun, verb) = P(adj) P(noun | adj) P(verb | noun)
Emissions: P(“Tall player fell” | adj, noun, verb) = P(Tall | adj) P(player | noun) P(fell | verb); the joint is the product of the two
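The transition × emission product can be computed directly; all probabilities below are made-up toy values:

```python
# Toy transition probabilities for adj -> noun -> verb ("<s>" marks the start)
transition = {("<s>", "adj"): 0.3, ("adj", "noun"): 0.8, ("noun", "verb"): 0.4}
# Toy emission probabilities of each word given its tag
emission = {("adj", "Tall"): 0.1, ("noun", "player"): 0.05, ("verb", "fell"): 0.02}

tags = ["adj", "noun", "verb"]
words = ["Tall", "player", "fell"]

p = 1.0
prev = "<s>"
for tag, word in zip(tags, words):
    # joint = product of P(tag | prev_tag) * P(word | tag)
    p *= transition[(prev, tag)] * emission[(tag, word)]
    prev = tag
```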
Full joint distribution from Bayes Net
P(x1, x2, … xn) = prod_i P(x_i | parents(x_i))
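Applying this factorization to the rain -> cricket -> traffic chain above, with made-up conditional probability tables:

```python
# Illustrative CPTs for the chain rain -> cricket -> traffic
p_rain = {True: 0.2, False: 0.8}
p_cricket_given_rain = {True: {True: 0.1, False: 0.9},
                        False: {True: 0.6, False: 0.4}}
p_traffic_given_cricket = {True: {True: 0.7, False: 0.3},
                           False: {True: 0.3, False: 0.7}}

def joint(rain, cricket, traffic):
    # P(rain, cricket, traffic) = P(rain) P(cricket | rain) P(traffic | cricket)
    return (p_rain[rain]
            * p_cricket_given_rain[rain][cricket]
            * p_traffic_given_cricket[cricket][traffic])

# Sanity check: the joint over all assignments sums to 1
total = sum(joint(r, c, t)
            for r in (True, False)
            for c in (True, False)
            for t in (True, False))
```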
Decoding in HMM
Find the most probable sequence of hidden states, given a sequence of observations
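This is the Viterbi algorithm; a compact sketch, with toy tagger probabilities (all numbers invented) matching the part-of-speech example above:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state sequence for the given observations."""
    # best[t][s] = (probability of best path ending in state s at time t, backpointer)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        row = {}
        for s in states:
            prob, prev = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
            row[s] = (prob, prev)
        best.append(row)
    # Backtrack from the best final state along the stored backpointers
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for row in reversed(best[1:]):
        state = row[state][1]
        path.append(state)
    return path[::-1]

states = ["adj", "noun", "verb"]
start_p = {"adj": 0.4, "noun": 0.5, "verb": 0.1}
trans_p = {"adj": {"adj": 0.1, "noun": 0.8, "verb": 0.1},
           "noun": {"adj": 0.1, "noun": 0.3, "verb": 0.6},
           "verb": {"adj": 0.3, "noun": 0.5, "verb": 0.2}}
emit_p = {"adj": {"Tall": 0.6, "player": 0.1, "fell": 0.3},
          "noun": {"Tall": 0.1, "player": 0.8, "fell": 0.1},
          "verb": {"Tall": 0.1, "player": 0.1, "fell": 0.8}}
tags = viterbi(["Tall", "player", "fell"], states, start_p, trans_p, emit_p)
```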
Confidence Interval
Collect sample means e.g., using bootstrapping (sampling with replacement)
Take the interval covering the middle 95% (typical value) of those sample means
That’s the CI
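The steps above as a percentile bootstrap, on made-up data:

```python
import random

random.seed(0)

def bootstrap_ci(sample, n_resamples=10_000, level=0.95):
    # Resample with replacement, collect each resample's mean,
    # then take the central `level` fraction of those means.
    means = sorted(
        sum(random.choices(sample, k=len(sample))) / len(sample)
        for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]

data = [2.1, 2.5, 1.9, 2.8, 2.3, 2.6, 2.0, 2.4]  # illustrative sample
low, high = bootstrap_ci(data)
```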
p-value
H0 = no difference; H1 = different
If p < alpha, then reject H0
alpha = acceptable False Positive Rate
i.e., chances of saying there is a difference, even though there is not in reality
cons: easy to misinterpret; p is not the probability that H0 is true, and p-values are not well calibrated as a measure of evidence
Student’s t-test
t = (x_bar - mu) / (estimate_of_population_std_dev/sqrt(n))
Test: if the calculated t exceeds the critical t value for the given confidence level (1 - alpha) and degrees of freedom (n - 1), reject H_0
Cons: the two-sample version assumes equal variances of both populations
Pros: works for smaller sample sizes; the t-distribution approaches the normal distribution as the sample size grows
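The one-sample t formula above, computed directly on invented data:

```python
import math

def one_sample_t(sample, mu0):
    n = len(sample)
    x_bar = sum(sample) / n
    # Bessel-corrected sample std dev estimates the population std dev
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    # t = (x_bar - mu) / (s / sqrt(n))
    return (x_bar - mu0) / (s / math.sqrt(n))

t = one_sample_t([5.1, 4.9, 5.3, 5.2, 5.0], mu0=5.0)
```

The resulting t (about 1.41) would then be compared against the critical value for n - 1 = 4 degrees of freedom.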
Welch’s t-test
Doesn’t assume equal variances. Assumes normal distributions, just like Student’s t-test
Central Limit Theorem
The distribution of sample means approaches a normal distribution as the sample size grows, regardless of the population’s distribution
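A quick simulation of the theorem: sample means of a decidedly non-normal (uniform) population behave like a normal distribution (sizes and seed are arbitrary):

```python
import random
import statistics

random.seed(0)

# Many samples of size n from Uniform(0, 1); collect each sample's mean.
n, reps = 50, 5_000
means = [statistics.fmean([random.random() for _ in range(n)]) for _ in range(reps)]

center = statistics.fmean(means)
spread = statistics.stdev(means)
# For a normal distribution, ~95% of values fall within 1.96 standard deviations.
within = sum(abs(m - center) < 1.96 * spread for m in means) / reps
```

`center` lands near the population mean 0.5, and `within` lands near 0.95, as the normal approximation predicts.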
Binomial distribution
Pr(x | n, p) = C(n, x) p^x (1 - p) ^ (n - x)
x = # of successes (e.g., prefers orange fanta) out of n, given success probability p
p^x (1 - p)^(n - x) = probability of one specific configuration with exactly x successes,
C(n, x) = total number of configurations in which x successes out of n are observed
e.g., orange vs grape fanta
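The PMF above computed directly, with an invented fanta-preference scenario:

```python
import math

def binomial_pmf(x, n, p):
    # C(n, x) counts the configurations; p^x (1-p)^(n-x) is each one's probability
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# e.g., 10 tasters, each prefers orange fanta with probability 0.6 (made up):
prob_7 = binomial_pmf(7, 10, 0.6)

# Sanity check: probabilities over all possible success counts sum to 1
total = sum(binomial_pmf(x, 10, 0.6) for x in range(11))
```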