Probabilistic Reasoning Flashcards
sample space
set of possible outcomes
random variable
a variable whose value is determined by the outcome of a random experiment
event
any subset of points in a sample space
probability density function
for continuous random variables, the distribution is expressed implicitly through a probability density function, which returns the relative likelihood of an outcome falling near a given value
probability distribution function
for discrete random variables, the distribution is expressed explicitly through a probability distribution function (probability mass function), which returns the probability of an outcome being exactly a given value
likelihood
the joint density of the observed data, viewed as a function of the model parameters
joint distribution
distribution function over 2 or more random variables
P (a or b)
P(a) + P(b) - P(a,b)
P(a and b)
P(a|b) * P(b) in general; equals P(a) * P(b) only if a and b are independent
conditional probability and formula
probability of an event occurring given that another event has already occurred
P(a|b) = P(a and b) / P(b)
what is bayes rule and derive it
P(a|b)= P(b|a)P(a) / P(b)
start with
P(b|a) = P(a,b) / P(a) and P(a|b) = P(a,b) / P(b)
both equal P(a,b), so P(b|a) P(a) = P(a|b) P(b); dividing by P(b) gives P(a|b) = P(b|a) P(a) / P(b)
what is posterior part
P(cause | effect): the probability of the hypothesis given the observed evidence
what is likelihood part
P(effect | cause): the likelihood that the effect will occur if the cause is true
what is prior belief part
P(cause), in the numerator: the prior belief in the cause before seeing the evidence
what is evidence part
P(effect), in the denominator: the probability of the evidence, summed over all possible causes
why is bayes rule helpful?
it lets a quantity that is hard to measure directly (e.g. P(cause|effect)) be computed from quantities that are easier to measure (P(effect|cause), P(cause), and P(effect))
BR is an _____
update to a prior belief given new information
example of bayes rule classifier
determine whether a patient has a disease (D) given a positive test result (T)
P(D|T) = P(T|D) P(D) / P(T)
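A minimal numeric sketch of this card, assuming made-up values for the prevalence P(D), the test sensitivity P(T|D), and the false-positive rate P(T|not D):

```python
# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false-positive rate.
p_disease = 0.01             # P(D), prior
p_pos_given_disease = 0.95   # P(T|D), likelihood
p_pos_given_healthy = 0.05   # P(T|not D)

# Evidence P(T) via the law of total probability.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes rule: P(D|T) = P(T|D) P(D) / P(T)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(D|T) = {p_disease_given_pos:.3f}")  # ~0.161 despite the positive test
```

Even with a fairly accurate test, the low prior keeps the posterior small; that hard-to-intuit quantity is exactly what Bayes rule makes easy to compute.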
joint probability distribution
represents probability of different events occurring together
curse of dimensionality
as the number of variables increases, the size of the JPD grows exponentially
with n binary variables there are 2^n combinations to consider
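A short sketch of the blow-up, assuming binary variables so each new variable doubles the joint table:

```python
from itertools import product

# A full joint table over n binary variables needs one entry per
# combination of values: 2**n probabilities.
for n in (2, 5, 10, 20):
    print(f"{n} binary variables -> {2 ** n} joint-table entries")

# Enumerating the combinations for a small n makes this concrete.
print(list(product([0, 1], repeat=3)))  # 8 tuples for 3 variables
```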
independence
2 events are independent if the occurrence of one does not affect the probability of the other
independence rules
P(A,B) = P(A) * P(B)
formally a and b are independent if P(A|B)= P(A)
independence in terms of conditional probability
knowing that event B happened doesn’t affect probability of A
events are conditionally independent given a third event if…
P(X,Y|Z)= P(X|Z) P(Y|Z)
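A toy check of this definition, assuming a hypothetical three-variable binary distribution in which a common cause Z drives both X and Y:

```python
from itertools import product

p_z = {0: 0.5, 1: 0.5}
p_x_given_z = {0: 0.2, 1: 0.9}   # P(X=1 | Z=z)
p_y_given_z = {0: 0.3, 1: 0.8}   # P(Y=1 | Z=z)

def bern(p, v):
    # probability that a Bernoulli(p) variable takes value v (0 or 1)
    return p if v == 1 else 1 - p

# Joint built as P(x, y, z) = P(z) P(x|z) P(y|z)
joint = {(x, y, z): p_z[z] * bern(p_x_given_z[z], x) * bern(p_y_given_z[z], y)
         for x, y, z in product([0, 1], repeat=3)}

# Conditional independence holds: P(X,Y|Z) == P(X|Z) P(Y|Z) for every assignment.
for z in (0, 1):
    pz = sum(p for (x, y, zz), p in joint.items() if zz == z)
    for x, y in product([0, 1], repeat=2):
        lhs = joint[(x, y, z)] / pz
        rhs = bern(p_x_given_z[z], x) * bern(p_y_given_z[z], y)
        assert abs(lhs - rhs) < 1e-12

# Marginally, X and Y are still dependent because they share the cause Z.
p_x1 = sum(p for (x, _, _), p in joint.items() if x == 1)
p_y1 = sum(p for (_, y, _), p in joint.items() if y == 1)
p_x1y1 = sum(p for (x, y, _), p in joint.items() if x == 1 and y == 1)
print(p_x1y1, p_x1 * p_y1)  # 0.39 vs 0.3025 -> not equal
```

This mirrors the cough/fever remark later in the deck: two symptoms can be conditionally independent given the disease yet correlated overall.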
how naive bayes classifier simplifies computation?
by assuming all features are conditionally independent of each other given the class
each feature independently contributes to the likelihood of the class
this decreases the complexity of the computation by turning it into a product of independent per-feature likelihood terms
naive bayes classifier bias and variance
high bias (the independence assumption is a strong simplification)
low variance
naive bayes classifier assumption and real life?
assumption not always true in real life as features often correlated/dependent
can lead to suboptimal classifications if there are strong dependencies between features (e.g. cough and fever are not independent; both may stem from the same underlying cause)
how to do spam/ham classification using NBC (steps below; code sketch after the list)
1) calculate priors P(spam) and P(not spam)
2) calculate likelihoods
- for each feature (word), calculate P(word|spam) = count of that word in spam emails / total word count in spam emails
- likewise calculate P(word|not spam) from the non-spam emails
3) use bayes rule to compute the posterior
- for a new email calculate prob of being spam/not spam
P(spam | w1, w2, …) ∝ P(spam) * P(w1|spam) * P(w2|spam) * …
4) decision: label the email with whichever class has the higher posterior score
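A minimal end-to-end sketch of these four steps, assuming a tiny made-up training set and add-one (Laplace) smoothing for the word likelihoods (the smoothing is an assumption added here so unseen words don't zero out the product):

```python
from collections import Counter

# Made-up labelled emails: (list of words, class)
train = [
    (["win", "money", "now"], "spam"),
    (["limited", "offer", "win"], "spam"),
    (["meeting", "schedule", "today"], "ham"),
    (["project", "meeting", "notes"], "ham"),
]

# 1) priors: fraction of emails in each class
labels = [label for _, label in train]
priors = {c: labels.count(c) / len(labels) for c in set(labels)}

# 2) likelihoods: per-class word counts, smoothed with add-one
word_counts = {c: Counter() for c in priors}
for words, label in train:
    word_counts[label].update(words)
vocab = {w for words, _ in train for w in words}

def likelihood(word, c):
    return (word_counts[c][word] + 1) / (sum(word_counts[c].values()) + len(vocab))

# 3) posterior up to the shared normalizer P(words):
#    P(class | words) ∝ P(class) * product of P(word | class)
def score(words, c):
    s = priors[c]
    for w in words:
        s *= likelihood(w, c)
    return s

# 4) decision: pick the class with the larger unnormalized posterior
email = ["win", "meeting", "money"]
scores = {c: score(email, c) for c in priors}
print(scores, "->", max(scores, key=scores.get))  # classified as spam
```

In practice the products are usually computed as sums of logs to avoid numerical underflow on long emails.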