ch2 Flashcards

1
Q

state space

A

the set of values which a process can take

2
Q

probability of a union of two events

A

p(A) + p(B) - p(A and B)

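A quick way to check the identity is to enumerate a small sample space. The die and events below are made up purely for illustration:

```python
from fractions import Fraction

# Illustrative sample space: a fair six-sided die.
outcomes = set(range(1, 7))
A = {x for x in outcomes if x % 2 == 0}   # "roll is even"  -> {2, 4, 6}
B = {x for x in outcomes if x > 3}        # "roll is > 3"   -> {4, 5, 6}

def p(event):
    """Probability of an event under the uniform distribution."""
    return Fraction(len(event), len(outcomes))

# p(A or B) = p(A) + p(B) - p(A and B)
assert p(A | B) == p(A) + p(B) - p(A & B) == Fraction(2, 3)
```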
3
Q

product rule

A

probability of the joint event A and B is

p(A, B) = p(A and B) = p(A|B)p(B)

4
Q

sum rule, law of total probability

A

p(A) = sumOverB( p(A,B) ) = sumOverB( p(A|B = b)p(B = b) )

where we are summing over all possible states of B

5
Q

marginal distribution

A

the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset

it gives the probabilities of the various values of the variables in the subset without reference to the values of the other variables

6
Q

chain rule of probability

A

permits the calculation of any member of the joint distribution of a set of random variables using only conditional probabilities with successive applications of the law of total probability and product rule

with four variables, the chain rule gives:

p(a, b, c, d) = p(a | b, c, d) * p(b | c, d) * p(c | d) * p(d)

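The factorization can be sanity-checked numerically. The joint distribution below is randomly generated for illustration (three binary variables rather than four, to keep it short):

```python
import itertools
import random

random.seed(0)
# Hypothetical joint distribution over three binary variables (a, b, c).
states = list(itertools.product([0, 1], repeat=3))
weights = [random.random() for _ in states]
total = sum(weights)
joint = {s: w / total for s, w in zip(states, weights)}

def marg(fixed):
    """Marginal probability of the assignment in `fixed` (dict: index -> value)."""
    return sum(p for s, p in joint.items()
               if all(s[i] == v for i, v in fixed.items()))

# p(a, b, c) = p(a | b, c) * p(b | c) * p(c)
for (a, b, c) in states:
    chain = (marg({0: a, 1: b, 2: c}) / marg({1: b, 2: c})   # p(a | b, c)
             * marg({1: b, 2: c}) / marg({2: c})             # p(b | c)
             * marg({2: c}))                                 # p(c)
    assert abs(chain - joint[(a, b, c)]) < 1e-12
```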
7
Q

conditional probability

A

p(A|B) = p(A,B)/p(B), defined if p(B) > 0

8
Q

Bayes rule

A

P(X = x | Y = y) = p(X = x, Y = y)/p(Y = y) = [ p(X = x)p(Y = y | X = x) ] / sumOverX'[ p(X = x')p(Y = y | X = x') ]

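A sketch with made-up diagnostic-test numbers (the prior, sensitivity, and false positive rate below are all hypothetical):

```python
# Hypothetical numbers for a diagnostic test.
prior = 0.01    # p(X = sick)
sens = 0.9      # p(Y = positive | X = sick), the sensitivity
fpr = 0.05      # p(Y = positive | X = healthy), the false positive rate

# Denominator: sum rule over the two states of X.
p_pos = sens * prior + fpr * (1 - prior)

# Bayes rule: p(X = sick | Y = positive)
posterior = sens * prior / p_pos

# Despite the 90% sensitivity, the posterior is only about 15%,
# because the prior (base rate) is so low.
assert 0.15 < posterior < 0.16
```

This is the same calculation that trips up intuition in the sensitivity and base rate fallacy cards.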
9
Q

Sensitivity

A

aka the true positive rate, the recall, the probability of detection

Measures the proportion of actual positives that are correctly identified as such

The probability that a test will be positive when it is supposed to be positive

10
Q

Base rate fallacy

A

If presented with related base rate information (i.e. generic, general information) and specific information (information pertaining only to a certain case), the mind tends to ignore the former and focus on the latter.

11
Q

Generative classifier

A

Classifier that specifies how to generate the data using the class-conditional density p(x | y = c) and the class prior p(y = c)

12
Q

Discriminative classifier

A

Classifier that directly fits the class posterior p(y = c | x). In contrast to generative models, which are models for generating all values of a phenomenon, both those that can be observed in the world and target variables that can only be computed from those observed, discriminative classifiers provide a model ONLY for the target variables.

In simple terms, discriminative models infer outputs from inputs, while generative models model both inputs and outputs.

13
Q

Unconditional or marginal independence

A

Two events X and Y are unconditionally independent iff p(X, Y) = p(X)p(Y)

14
Q

Conditional independence (CI)

A

X and Y are conditionally independent given Z iff the conditional joint can be written as a product of conditional marginals

p(X,Y | Z) = p(X|Z)p(Y|Z)

15
Q

Cumulative distribution function (cdf)

A

the probability that X will take a value less than or equal to x

16
Q

Probability density function (pdf)

A

a function, whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample

17
Q

Variance

A

Measure of the spread of a distribution

18
Q

Standard deviation

A

Square root of the variance, useful since it has the same units as X itself

19
Q

Binomial distribution

A

with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own boolean-valued outcome: a random variable containing a single bit of information: success/yes/true/one (with probability p) or failure/no/false/zero (with probability q = 1 − p)

20
Q

Tail area probabilities

A

The probability that a random variable deviates by a given amount from its expectation

21
Q

Variance

A

Measure of the spread of a distribution, denoted by sigma^2. Defined as

Var[X] = E[(X - mu)^2], where mu = E[X] is the population mean

22
Q

Standard deviation

A

Derived from the variance as std[X] = sqrt(var[X]); useful because it has the same units as X itself

23
Q

Binomial distribution

A

The discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes-no question and each with its own boolean-valued outcome

Pmf: (n choose k) p^k (1-p)^(n-k)
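A minimal pmf implementation (stdlib only), including the binomial coefficient:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# e.g. probability of exactly 2 heads in 4 fair coin flips = 6/16
assert abs(binom_pmf(2, 4, 0.5) - 6 / 16) < 1e-12
# the pmf sums to 1 over k = 0..n
assert abs(sum(binom_pmf(k, 10, 0.3) for k in range(11)) - 1) < 1e-12
```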

24
Q

Bernoulli distribution

A

Probability distribution of a random experiment w/ exactly two possible outcomes in which the probability of success is the same every time the experiment is conducted

25
Q

Multinomial distribution

A

Variation of binomial distribution involving more than two outcomes.

PMF is [(n!)/(x1! x2! … xk!)] p1^(x1)…pk^(xk) for k possible outcomes and n trials, where xi is the number of times outcome i occurs

26
Q

Poisson distribution

A

Poi(x | k) = e^(-k) [(k^x)/(x!)], where k > 0 is the rate

The first term, e^(-k), is the normalization constant ensuring the distribution sums to 1.

Expresses the probability of a given number of events occurring in a fixed interval of time/space if 1) these events occur with a known constant rate and 2) independently of the time since the last event.
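A sketch of the pmf; the check below uses the fact that the mean of a Poisson equals its rate:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(x events in a fixed interval, given average rate lam > 0)."""
    return exp(-lam) * lam**x / factorial(x)

lam = 3.0
# mean of the distribution equals the rate parameter
# (truncating the sum at 100 loses negligible tail mass for lam = 3)
mean = sum(x * poisson_pmf(x, lam) for x in range(100))
assert abs(mean - lam) < 1e-9
```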

27
Q

Empirical distribution

A

Fn(t) = (number of elements in the sample <= t) / n
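As code, with a made-up sample:

```python
def ecdf(sample, t):
    """Empirical CDF: fraction of sample values <= t."""
    return sum(1 for v in sample if v <= t) / len(sample)

data = [1.2, 0.5, 3.0, 2.2]   # illustrative sample
assert ecdf(data, 0.0) == 0.0   # below every sample value
assert ecdf(data, 1.2) == 0.5   # two of four values are <= 1.2
assert ecdf(data, 3.0) == 1.0   # at or above every sample value
```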

28
Q

Dirac measure

A

Assigning a size to a set based solely on whether it contains a fixed element x or not.

29
Q

Gaussian distribution

A

Used because of the central limit theorem (which states that the averages of samples of observations of random variables independently drawn from independent distributions converge in distribution to the normal). Physical quantities that are expected to be the sum of many independent processes (eg measurement errors) thus often have distributions that are nearly normal.

Probability density is [1/sqrt(2 pi variance)] e^[-(x - mean)^2/(2 variance)]
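The density as code, cross-checked against the stdlib's NormalDist:

```python
from math import sqrt, pi, exp
from statistics import NormalDist

def gauss_pdf(x, mu, var):
    """Gaussian density with mean mu and variance var."""
    return (1 / sqrt(2 * pi * var)) * exp(-(x - mu)**2 / (2 * var))

# peak height of the standard normal is 1/sqrt(2*pi)
assert abs(gauss_pdf(0.0, 0.0, 1.0) - 1 / sqrt(2 * pi)) < 1e-12
# agrees with the stdlib implementation away from the mean too
assert abs(gauss_pdf(1.5, 0.0, 1.0) - NormalDist(0, 1).pdf(1.5)) < 1e-12
```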

30
Q

Precision of a gaussian

A

The inverse variance of a Gaussian, 1/variance. A high precision means a narrow distribution centered on the population mean.

31
Q

Error function

A

Special function of sigmoid shape that describes diffusion. erf(x) = (2/sqrt(pi)) * integral from 0 to x of e^(-t^2) dt

For nonnegative values of x, the error function has the following interpretation: for a random variable Y that is normally distributed with mean 0 and variance 1/2, erf(x) describes the probability of Y falling in the range [-x, x].
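That interpretation can be checked against the stdlib, comparing math.erf with the cdf of N(0, 1/2):

```python
from math import erf, sqrt
from statistics import NormalDist

# Y ~ N(0, 1/2), i.e. standard deviation sqrt(1/2)
Y = NormalDist(mu=0.0, sigma=sqrt(0.5))

# erf(x) = P(-x <= Y <= x) for nonnegative x
for x in (0.1, 0.5, 1.0, 2.0):
    prob = Y.cdf(x) - Y.cdf(-x)
    assert abs(erf(x) - prob) < 1e-9
```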

32
Q

dirac delta function

A

formed in the limit that variance -> 0, where the gaussian becomes an infinitely tall and infinitely thin “spike” centered at the mean

has the sifting property, which selects out a single term from a sum or integration, since the integrand is only non-zero where x - mean = 0

33
Q

student’s t distribution

A

used since gaussians are sensitive to outliers, as their log probability decays only quadratically with distance from the center

[1 + (1/v)((x-u)/(o))^2]^(-(v+1)/2)

u is the mean, o^2 is the scale parameter, v is the degrees of freedom. the variance is actually (vo^2)/(v-2), defined for v > 2

34
Q

cauchy/lorentz distribution

A

t-distribution with degree of freedom 1. has such a heavy tail that the integral defining the mean doesn't converge

35
Q

laplace distribution

A

another distribution with a heavy tail (low sensitivity to outliers), aka the double-sided exponential distribution

(1/(2b)) * exp(-|x - u|/b)

u is a location parameter and b > 0 is a scale parameter.

mean and mode are both u; variance is 2b^2

puts more density at zero than the gaussian, which is useful for encouraging sparsity in a model

36
Q

exponential distribution

A

special case of the gamma distribution, Ga(x | 1, b), where 1 is the shape and b is the rate parameter. Describes the times between events in a Poisson process (ie a process in which events occur continuously and independently at the constant average rate b)

37
Q

chi-squared distribution

A

special case of the gamma distribution, Ga(x | v/2, 1/2). Distribution of the sum of squared gaussian random variables.

38
Q

erlang distribution

A

special case of the gamma distribution where the shape a is an integer, usually fixed at a = 2, yielding Ga(x | 2, b), where b is the rate parameter.

Events that occur independently with some average rate are modeled with a Poisson process. The waiting times between k occurrences of the event are Erlang distributed. (The related question of the number of events in a given amount of time is described by the Poisson distribution.)

39
Q

beta distribution

A

a family of continuous probability distributions defined on the interval [0,1], parametrized by two positive shape parameters, denoted by alpha and beta, that appear as exponents of the random variable and control the shape of the distribution

has been applied to model the behavior of random variables limited to intervals of finite length in a wide variety of disciplines

Beta(x | alpha, beta) = [x^(alpha-1)(1-x)^(beta-1)] / B(alpha, beta)

where B(alpha, beta) = Gamma(alpha)Gamma(beta)/Gamma(alpha+beta) and Gamma is the gamma function

40
Q

gamma distribution

A

a flexible distribution for positive real valued rvs, x > 0

Defined in terms of two parameters, shape a > 0 and rate b > 0: Ga(T | shape = a, rate = b) = [(b^a)/Gamma(a)] * T^(a-1) * e^(-Tb)

where Gamma is the gamma function

41
Q

gamma function

A

integral from 0 to infinity over u^(x-1)e^(-u) with respect to u

an extension of the factorial function, with its argument shifted down by 1, to real and complex numbers
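The shift-by-1 relation to the factorial, checked with the stdlib:

```python
from math import gamma, factorial, pi, sqrt

# Gamma extends the factorial: Gamma(n) = (n - 1)! for positive integers n
for n in range(1, 10):
    assert abs(gamma(n) - factorial(n - 1)) < 1e-6

# and it is defined between the integers too, e.g. Gamma(1/2) = sqrt(pi)
assert abs(gamma(0.5) - sqrt(pi)) < 1e-12
```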

42
Q

pareto distribution

A

used to model the distribution of quantities that exhibit long tails/heavy tails. For example, word frequencies in English follow Zipf's law. Wealth is similarly skewed, esp in plutocracies like the US.

pdf is k * m^k * x^(-(k+1)) * I(x >= m)

43
Q

Zipf’s law

A

Zipf’s law states that given a large sample of words used, the frequency of any word is inversely proportional to its rank in the frequency table

44
Q

covariance

A

measurement of the degree to which X and Y are (linearly) related

cov[X,Y] = E[XY] - E[X]E[Y], or equivalently E[(X - E[X])(Y - E[Y])]

can take any value between -infinity and +infinity

45
Q

(pearson) correlation coefficient

A

cov[X,Y]/sqrt(var[X]var[Y]). A normalized measure with a finite range: -1 <= corr[X,Y] <= 1. corr[X,Y] = 1 iff Y = aX + b for some a > 0, ie there is a linear relationship between X and Y.

not related to the slope of the regression line, which is actually cov[X,Y]/var[X]

correlation implies dependence, but noncorrelation does not imply independence (another, nonlinear relationship might hold)
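A sketch with made-up data on an exact line y = 2x + 1: the correlation is 1 while the regression slope cov[X,Y]/var[X] is 2.

```python
def mean(v):
    return sum(v) / len(v)

def cov(xs, ys):
    """Empirical cov[X,Y] = E[(X - E[X])(Y - E[Y])]."""
    mx, my = mean(xs), mean(ys)
    return mean([(a - mx) * (b - my) for a, b in zip(xs, ys)])

def corr(xs, ys):
    """Pearson correlation: cov normalized by the standard deviations."""
    return cov(xs, ys) / (cov(xs, xs) * cov(ys, ys)) ** 0.5

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2 * v + 1 for v in x]              # exact linear relationship

assert abs(corr(x, y) - 1.0) < 1e-12    # perfectly correlated
assert abs(cov(x, y) / cov(x, x) - 2.0) < 1e-12   # regression slope is 2
```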

46
Q

multivariate gaussian/normal (MVN)

A

most widely used joint probability density function for continuous variables; covered more in ch4

47
Q

linearity of expectation

A

the property that the expected value of the sum of random variables is equal to the sum of their individual expected values, regardless of whether they are independent

48
Q

linear transformation of random variable

A

y = f(x) = Ax + b

E[y] = E[Ax + b] = A(mu) + b, where mu = E[x]

cov[y] = cov[Ax + b] = A(cov[x])(transpose of A)

Mean and covariance only fully define the distribution of y if x is Gaussian.
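Both identities can be verified exactly on a small discrete distribution (the points, A, and b below are arbitrary illustrations; numpy assumed available):

```python
import numpy as np

# Hypothetical discrete distribution for x in R^2: three equally likely points.
points = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 0.0]])
probs = np.array([1 / 3, 1 / 3, 1 / 3])

mu = probs @ points                                    # E[x]
cov_x = (points - mu).T @ np.diag(probs) @ (points - mu)

# Arbitrary linear transformation y = Ax + b.
A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([5.0, -1.0])

ys = points @ A.T + b                                  # transform each point
mu_y = probs @ ys
cov_y = (ys - mu_y).T @ np.diag(probs) @ (ys - mu_y)

assert np.allclose(mu_y, A @ mu + b)                   # E[y] = A mu + b
assert np.allclose(cov_y, A @ cov_x @ A.T)             # cov[y] = A cov[x] A^T
```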