ch2 Flashcards
state space
the set of values which a process can take
probability of a union of two events
p(A) + p(B) - p(A and B)
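The identity can be checked numerically. A minimal sketch, assuming one roll of a fair die with A = "even number" and B = "greater than 3" (illustrative events, not from the deck):

```python
from fractions import Fraction

# One roll of a fair die: A = "even number", B = "greater than 3".
omega = set(range(1, 7))
A = {n for n in omega if n % 2 == 0}   # {2, 4, 6}
B = {n for n in omega if n > 3}        # {4, 5, 6}

def p(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event), len(omega))

# p(A or B) = p(A) + p(B) - p(A and B); subtracting p(A & B) avoids
# double-counting the outcomes {4, 6} that lie in both events.
assert p(A | B) == p(A) + p(B) - p(A & B)   # 2/3 on both sides
```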
product rule
probability of the joint event A and B is
p(A, B) = p(A and B) = p(A|B)p(B)
sum rule, law of total probability
p(A) = sumOverB( p(A, B) ) = sumOverB( p(A|B = b)p(B = b) )
where we are summing over all possible states of B
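Marginalizing out B from a joint table makes the sum rule concrete. A small sketch with a made-up joint distribution (hypothetical numbers that sum to 1):

```python
# Hypothetical joint distribution p(A, B) over A in {0, 1}, B in {0, 1, 2}.
joint = {(0, 0): 0.1, (0, 1): 0.2, (0, 2): 0.1,
         (1, 0): 0.2, (1, 1): 0.1, (1, 2): 0.3}

# Sum rule: p(A = a) = sum over b of p(A = a, B = b)
p_A = {a: sum(p for (ai, b), p in joint.items() if ai == a) for a in (0, 1)}
# p_A[0] = 0.1 + 0.2 + 0.1 = 0.4, p_A[1] = 0.2 + 0.1 + 0.3 = 0.6
```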
marginal distribution
of a subset of a collection of random variables: the probability distribution of the variables contained in the subset, giving the probabilities of the values of those variables without reference to the values of the other variables
chain rule of probability
permits the calculation of any member of the joint distribution of a set of random variables using only conditional probabilities with successive applications of the law of total probability and product rule
with four variables, chain rule produces this:
p(a, b, c, d) = p(a | b, c, d) * p(b | c, d) * p(c | d) * p(d)
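The factorization can be verified on any joint table. A sketch using a randomly generated (and purely illustrative) joint over three binary variables:

```python
import itertools
import random

random.seed(0)

# A hypothetical joint p(a, b, c) over three binary variables,
# built from random weights and normalized.
states = list(itertools.product((0, 1), repeat=3))
weights = [random.random() for _ in states]
total = sum(weights)
joint = {s: w / total for s, w in zip(states, weights)}

def marg(fixed):
    """Marginal probability: sum the joint over states matching `fixed`."""
    return sum(p for s, p in joint.items()
               if all(s[i] == v for i, v in fixed.items()))

# Chain rule: p(a, b, c) = p(a | b, c) * p(b | c) * p(c),
# where each conditional is a ratio of marginals, e.g. p(a | b, c) = p(a,b,c)/p(b,c).
for (a, b, c), p in joint.items():
    chain = (marg({0: a, 1: b, 2: c}) / marg({1: b, 2: c})
             * marg({1: b, 2: c}) / marg({2: c})
             * marg({2: c}))
    assert abs(p - chain) < 1e-12
```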
conditional probability
p(A|B) = p(A, B)/p(B)
Bayes rule
p(X = x | Y = y) = p(X = x, Y = y)/p(Y = y) = [ p(X = x)p(Y = y | X = x) ] / sumOverX[ p(X = x)p(Y = y | X = x) ]
Sensitivity
aka the true positive rate, the recall, the probability of detection
Measures the proportion of positives that are correctly identified as such
The probability that a test will be positive when it is supposed to be positive
Base rate fallacy
If presented with related base rate information (i.e. generic, general information) and specific information (information pertaining only to a certain case), the mind tends to ignore the former and focus on the latter.
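A worked Bayes' rule computation ties the last three cards together. A sketch with a hypothetical diagnostic test (all numbers are made up for illustration): prevalence p(D=1) = 0.01, sensitivity p(T=1|D=1) = 0.9, false-positive rate p(T=1|D=0) = 0.09:

```python
# Hypothetical diagnostic test (illustrative numbers only).
prior = {1: 0.01, 0: 0.99}         # p(D = d): base rate of the disease
likelihood = {1: 0.9, 0: 0.09}     # p(T = 1 | D = d): test behavior

# Denominator of Bayes' rule: p(T = 1), via the sum rule.
evidence = sum(prior[d] * likelihood[d] for d in (0, 1))

# Posterior p(D = 1 | T = 1) = p(D = 1) p(T = 1 | D = 1) / p(T = 1)
posterior = prior[1] * likelihood[1] / evidence
# posterior is about 0.09: even after a positive test, disease is unlikely,
# because the low base rate dominates -- ignoring it is the base rate fallacy.
```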
Generative classifier
Classifier that specifies how to generate the data using the class-conditional density p(x | y = c) and the class prior p(y = c)
Discriminative classifier
Classifier that directly fits the class posterior p(y = c | x). In contrast to generative models, which model all values of a phenomenon, both those that can be observed in the world and target variables that can only be computed from those observed, discriminative classifiers provide a model ONLY for the target variables.
In simple terms, discriminative models infer outputs from inputs, while generative models can generate both inputs and outputs.
Unconditional or marginal independence
Two events X and Y are unconditionally independent iff p(X, Y) = p(X)p(Y)
Conditional independence (CI)
X and Y are conditionally independent given Z iff the conditional joint can be written as a product of conditional marginals
p(X,Y | Z) = p(X|Z)p(Y|Z)
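The factorization can be checked on a joint built to have this property. A sketch in which Z is a fair coin and, given Z = z, X and Y are independent coins whose biases depend on z (all numbers illustrative):

```python
# Construct a joint in which X is conditionally independent of Y given Z:
# p(x, y, z) = p(z) p(x|z) p(y|z), with made-up conditional tables.
p_z = {0: 0.5, 1: 0.5}
p_x_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
p_y_given_z = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}

joint = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
         for x in (0, 1) for y in (0, 1) for z in (0, 1)}

# Verify the CI definition: p(X, Y | Z) = p(X | Z) p(Y | Z) for every state.
for (x, y, z), p in joint.items():
    p_xy_given_z = p / p_z[z]
    assert abs(p_xy_given_z - p_x_given_z[z][x] * p_y_given_z[z][y]) < 1e-12
```

Note that X and Y are still dependent marginally here: observing X shifts belief about Z, which in turn shifts belief about Y.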
Cumulative distribution function (cdf)
the probability that X will take a value less than or equal to x
Probability density function (pdf)
a function, whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample
Variance
A measure of the spread of a distribution, defined as E[(X − E[X])²]
Standard deviation
Square root of the variance, useful since it has the same units as X itself
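Both quantities are easy to compute for a discrete distribution. A sketch using a fair six-sided die as the illustrative random variable:

```python
from math import sqrt

# Fair six-sided die: uniform pmf over {1, ..., 6}.
pmf = {x: 1 / 6 for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())             # E[X] = 3.5
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # E[(X - E[X])^2] = 35/12
std = sqrt(var)   # standard deviation, in the same units as X
```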
Binomial distribution
with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own boolean-valued outcome: a random variable containing a single bit of information: success/yes/true/one (with probability p) or failure/no/false/zero (with probability q = 1 − p)
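The pmf follows directly from the definition: choose which k of the n trials succeed, then multiply the per-trial probabilities. A minimal sketch using the standard formula:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example: 10 fair coin flips.
# The pmf sums to 1 over k = 0..10, and exactly 5 heads has probability
# C(10, 5) / 2^10 = 252 / 1024 = 0.24609375.
assert abs(sum(binom_pmf(k, 10, 0.5) for k in range(11)) - 1.0) < 1e-12
assert abs(binom_pmf(5, 10, 0.5) - 0.24609375) < 1e-12
```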