Week 2 Flashcards
What do we call the Gaussian distribution if we generalise it to define a density function over continuous vectors?
Multivariate Gaussian distribution.
How do we define the multivariate Gaussian density for a vector x = [x.1, …, x.D]^T ?
p(x) = (1 / ((2*pi)^(D/2) * |SIG|^(1/2))) * exp{ -0.5 * (x-mu)^T * SIG^-1 * (x-mu) }
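A minimal numpy sketch of this density (the dimension D = 2, the test point x, and the choice SIG = I below are illustrative). With SIG = I the density should factorise into a product of 1-D standard Gaussians, which gives a quick sanity check:

```python
import numpy as np

def mvn_pdf(x, mu, sig):
    """Multivariate Gaussian density, following the formula above."""
    D = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (D / 2) * np.linalg.det(sig) ** 0.5
    return np.exp(-0.5 * diff @ np.linalg.inv(sig) @ diff) / norm

# With SIG = I and mu = 0, the density is a product of 1-D standard Gaussians.
x = np.array([0.5, -1.0])
mu = np.zeros(2)
sig = np.eye(2)
d1 = mvn_pdf(x, mu, sig)
d2 = np.prod(np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi))
print(np.isclose(d1, d2))  # True
```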
What is the mu in the formula for the multivariate Gaussian density?
The mean, a vector of the same size as vector x.
What does the d-th element of mu tell us in the formula for multivariate Gaussian density?
The mean value of x.d
What is the form of the variance, the SIG, in the formula for multivariate Gaussian density?
a DxD covariance matrix
I^-1 ==
(identity matrix)
I
I * matrix ==
(identity matrix I)
matrix
The exp of a sum gives the same result as…
the product of exps
|A| is the … of matrix A
determinant
How do you calculate the determinant of a 2x2 matrix A:
[ a b
c d ]
|A| = ad - bc
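A quick numpy check of the 2x2 rule (the matrix entries are arbitrary):

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [1.0, 4.0]])
a, b, c, d = A.ravel()
# |A| = ad - bc for a 2x2 matrix
print(np.isclose(np.linalg.det(A), a * d - b * c))  # True
```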
What is |I|?
1
(2*pi)^(D/2) can be written as…
PRODUCT(d=1 to D) of (2*pi)^(1/2)
What is Tr(A) for matrix A?
the trace of a square matrix A, the sum of the diagonal elements of A
If A = I.D, so the DxD identity matrix, then Tr(I.D) =
SUM(d=1 to D) 1 = D
Tr(AB) ==
Tr(BA)
Tr(w^T * w) ==
w^T * w,
since w^T * w results in a scalar.
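The trace identities above can be checked numerically; a small sketch (the matrix shapes and random seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 3))

# Tr(AB) == Tr(BA), even when A and B are not square.
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # True

# Tr(I.D) == D
print(np.trace(np.eye(5)) == 5.0)  # True

# w^T * w is a scalar (a 1x1 matrix), so its trace is itself.
w = rng.normal(size=(3, 1))
print(np.isclose(np.trace(w.T @ w), (w.T @ w).item()))  # True
```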
What does the binomial distribution describe?
The probability of a certain number of successes in N binary events.
How do you calculate the probability of y successes in N independent binary trials, each with success probability r?
P(Y = y ) = (N over y) * r^y * (1-r)^(N-y)
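The binomial formula above can be written directly with the standard library; a minimal sketch (the values N = 10, r = 0.3 are illustrative). Summing over all possible y should give 1:

```python
from math import comb

def binom_pmf(y, N, r):
    """P(Y = y) = (N over y) * r^y * (1-r)^(N-y)."""
    return comb(N, y) * r**y * (1 - r) ** (N - y)

# The probabilities of all outcomes y = 0..N sum to 1.
N, r = 10, 0.3
total = sum(binom_pmf(y, N, r) for y in range(N + 1))
print(abs(total - 1.0) < 1e-12)  # True
```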
When is a likelihood-prior pair conjugate?
If it results in a posterior of the same form as the prior.
Suppose we have data generated by a model with distribution t ~ N(X.w, sig^2 * I). What does this say about X.w?
X.w is the mean vector: its d-th element gives the expected value of t.d for the d-th input row of X. Each element of t is a univariate Gaussian random variable, and t as a whole is a multivariate Gaussian with mean X.w.
What can you do in the equation
E[w^] = E[ (X^T * X)^-1 * X^T * t ], given that (X^T * X)^-1 * X^T is a linear function of t?
Because expectation is linear, you can swap the order of the expectation and the linear function, so:
E[w^] = (X^T * X)^-1 * X^T * E[t].
What is the expectation of a multivariate normal distribution?
The mean.
Why do (X^T * X)^-1 and (X^T * X) cancel each other out?
They are inverses of each other.
When do we call an estimator x unbiased?
When it has the property that
E[x^] = x.
cov[w^] ==
sig^2 * (X^T * X) ^-1
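A small simulation sketch illustrating both E[w^] = w and cov[w^] = sig^2 * (X^T X)^-1 (the design matrix X, true weights w, noise level sig, seed, and number of repetitions below are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), np.linspace(0, 1, 50)])  # design matrix
w_true = np.array([1.0, -2.0])
sig = 0.5

# Draw many datasets t ~ N(X.w, sig^2 * I) and refit w^ each time.
w_hats = []
for _ in range(5000):
    t = X @ w_true + sig * rng.normal(size=50)
    w_hats.append(np.linalg.solve(X.T @ X, X.T @ t))
w_hats = np.array(w_hats)

# Average of the estimates is close to the true w (unbiasedness).
print(np.allclose(w_hats.mean(axis=0), w_true, atol=0.05))
# Empirical covariance of w^ matches sig^2 * (X^T X)^-1.
cov_theory = sig**2 * np.linalg.inv(X.T @ X)
print(np.allclose(np.cov(w_hats.T), cov_theory, atol=0.05))
```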
The cov[w^] matrix is the inverse of…
the Hessian matrix of second partial derivatives of the negative log-likelihood with respect to w.
What does a large covariance mean in the cov[w^] matrix?
That we’re uncertain.
In linear regression, if there is a negative covariance between w.0^ and w.1^, then
the estimates are dependent: if one goes up, the other tends to go down.
P(Y.N = y.N | R=r) ==
(N over y.N) * r^y.N * ((1-r)^(N-y.N))
What do we need to compute the joint distribution of r and y.N, p(r, y.N)?
P(Y.N = y.N | R=r) and p(r). So the conditional distribution of Y.N given R and the density of r, the prior.
p(r|Y.N = y.N ) ==
p(r,y.N) / P(Y.N = y.N)
p(r, y.N) ==
P(Y.N = y.N | R=r) * p(r)
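The deck does not name the conjugate prior, but for a binomial likelihood in r the standard conjugate choice is a Beta density, giving the posterior Beta(a + y.N, b + N - y.N). A sketch verifying this numerically on a grid (the prior parameters a, b and the data N, y.N below are illustrative):

```python
import numpy as np
from math import comb, gamma

def beta_pdf(r, a, b):
    """Beta density with parameters a, b."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * r ** (a - 1) * (1 - r) ** (b - 1)

a, b = 2.0, 3.0   # Beta prior parameters (illustrative)
N, y = 10, 7      # number of trials and observed successes (illustrative)

# Bayes' rule on a grid: posterior = likelihood * prior, normalised.
r = np.linspace(1e-6, 1 - 1e-6, 20001)
unnorm = comb(N, y) * r**y * (1 - r) ** (N - y) * beta_pdf(r, a, b)
dr = r[1] - r[0]
posterior = unnorm / (unnorm.sum() * dr)

# Conjugacy: the posterior is again a Beta density, with updated parameters.
print(np.allclose(posterior, beta_pdf(r, a + y, b + N - y), atol=1e-3))
```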