Gaussian Processes Flashcards
What is a Gaussian Process?
A generalization of the multivariate Gaussian distribution to infinitely many variables. Formally, a Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.
What is the formula for the posterior mean of a Gaussian process?
m_post(x_*) = m(x_*) + k(x_*, X)(k(X, X) + sigma^2 I)^(-1)(y - m(X)), where x_* is a test input, X the training inputs and y the training targets.
What is the formula for the posterior covariance of a Gaussian process?
k_post(x_*, x_*) = k(x_*, x_*) - k(x_*, X)(k(X, X) + sigma^2 I)^(-1) k(X, x_*)
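The posterior mean and covariance formulas can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: it assumes a zero prior mean (m = 0) and a squared-exponential kernel, and the names `rbf_kernel`, `gp_posterior`, `length_scale`, `signal_var` and `noise_var` are chosen here for the example. A Cholesky factorization stands in for the explicit matrix inverse.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between rows of A (n, d) and B (m, d)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * sq_dists / length_scale**2)

def gp_posterior(X, y, X_star, noise_var=0.1, kernel=rbf_kernel):
    """Posterior mean and covariance at X_star, assuming a zero prior mean."""
    K_y = kernel(X, X) + noise_var * np.eye(len(X))   # k(X, X) + sigma^2 I
    K_star = kernel(X_star, X)                        # k(x_*, X)
    L = np.linalg.cholesky(K_y)                       # K_y = L L^T
    # alpha = (k(X, X) + sigma^2 I)^(-1) y, via two triangular-system solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_star @ alpha
    # v^T v = k(x_*, X) (k(X, X) + sigma^2 I)^(-1) k(X, x_*)
    v = np.linalg.solve(L, K_star.T)
    cov = kernel(X_star, X_star) - v.T @ v
    return mean, cov

# Three training points on sin(x); one test input at a training location
# and one far away from all the data.
X = np.array([[-1.0], [0.0], [1.0]])
y = np.sin(X).ravel()
X_star = np.array([[0.0], [3.0]])
mean, cov = gp_posterior(X, y, X_star, noise_var=1e-6)
```

With almost no noise, the posterior variance `cov[0, 0]` at the training location is close to zero, while `cov[1, 1]`, far from the data, stays close to the prior variance.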
How can we create new covariance functions?
If k1, k2 are covariance functions and u(x) is a transform of the input space, then
1) k1 + k2
2) k1*k2
3) k1(u(x), u(x’))
are also covariance functions.
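A small numerical check of the three rules, as a sketch only. The choices here are assumptions made for illustration: k1 is a squared-exponential kernel, k2 a linear kernel, and u(x) = sin(x) is an arbitrary input transform. For each combined kernel the code builds a Gram matrix on a few points and looks at its smallest eigenvalue, which should be non-negative up to rounding error.

```python
import numpy as np

def k_rbf(x, x_prime, length_scale=1.0):
    """Squared-exponential kernel on scalar inputs (assumed example for k1)."""
    return np.exp(-0.5 * (x - x_prime) ** 2 / length_scale**2)

def k_linear(x, x_prime):
    """Linear kernel on scalar inputs (assumed example for k2)."""
    return x * x_prime

def k_sum(x, x_prime):
    return k_rbf(x, x_prime) + k_linear(x, x_prime)

def k_product(x, x_prime):
    return k_rbf(x, x_prime) * k_linear(x, x_prime)

def k_warped(x, x_prime):
    # u(x) = sin(x) is an arbitrary illustrative transform of the input
    return k_rbf(np.sin(x), np.sin(x_prime))

def gram(kernel, xs):
    """Gram matrix K[i, j] = kernel(xs[i], xs[j]) via broadcasting."""
    return kernel(xs[:, None], xs[None, :])

xs = np.linspace(-2.0, 2.0, 8)
min_eigenvalues = {
    name: float(np.linalg.eigvalsh(gram(kernel, xs)).min())
    for name, kernel in [("sum", k_sum), ("product", k_product), ("warped", k_warped)]
}
```

A check on one set of points is not a proof, but a clearly negative eigenvalue would show that a candidate function is not a valid covariance function.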
Name some parameters of the GP.
Parameters of the mean function, parameters of the covariance function (for example length scale and signal variance), and the noise variance.
How can we choose hyperparameters?
Maximize the marginal likelihood with f integrated out (called type-II maximum likelihood).
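For a zero-mean GP with Gaussian noise the log marginal likelihood has the closed form log p(y | X) = -1/2 y^T K_y^(-1) y - 1/2 log|K_y| - N/2 log(2 pi), with K_y = k(X, X) + sigma^2 I. The sketch below evaluates it and picks a length scale from a coarse grid. The squared-exponential kernel, the synthetic data, the grid values and all function names are assumptions made for illustration. In practice a gradient-based optimizer over all hyperparameters would usually replace the grid.

```python
import numpy as np

def rbf_kernel(A, B, length_scale, signal_var):
    """Squared-exponential kernel between rows of A (n, d) and B (m, d)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * sq_dists / length_scale**2)

def log_marginal_likelihood(X, y, length_scale, signal_var, noise_var):
    """log p(y | X, hyperparameters) for a zero-mean GP, f integrated out."""
    N = len(X)
    K_y = rbf_kernel(X, X, length_scale, signal_var) + noise_var * np.eye(N)
    L = np.linalg.cholesky(K_y)                       # K_y = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    data_fit = -0.5 * y @ alpha                       # -1/2 y^T K_y^(-1) y
    complexity = -np.sum(np.log(np.diag(L)))          # -1/2 log|K_y|
    return data_fit + complexity - 0.5 * N * np.log(2.0 * np.pi)

# Synthetic data (an assumption for the example): noisy samples of sin(x).
rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 20)[:, None]
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(20)

# Coarse grid over the length scale, other hyperparameters held fixed.
grid = [0.1, 0.3, 1.0, 3.0, 10.0]
scores = [log_marginal_likelihood(X, y, ls, 1.0, 0.01) for ls in grid]
best_length_scale = grid[int(np.argmax(scores))]
```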
What three local optima do we often get when maximizing the marginal likelihood, especially with little data?
- High noise variance, long length scale (almost linear fit)
- Medium noise variance, medium length scale
- Low noise variance, short length scale (highly non-linear fit)
What do we need to fully specify a Gaussian process?
Mean and covariance function
Why don’t we use maximum likelihood or MAP to find hyperparameters?
This leads to overfitting: it is possible to set f(X) = y and let the noise variance go to 0. The marginal likelihood does not fit the function values but integrates them out, so overfitting can’t happen in the same way.
What properties does a covariance (kernel) function have?
Symmetric and positive semi-definite.
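Written out, the two properties are the standard ones below. The points x_i and the real coefficients c_i are notation introduced here for the definition.

```latex
k(x, x') = k(x', x), \qquad
\sum_{i=1}^{n} \sum_{j=1}^{n} c_i \, c_j \, k(x_i, x_j) \ge 0
\quad \text{for all } n,\; x_1, \dots, x_n,\; c_1, \dots, c_n \in \mathbb{R}.
```

Equivalently, every Gram matrix K with entries K_ij = k(x_i, x_j) is symmetric and has no negative eigenvalues.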
How do Gaussian processes scale with the number of training points N in terms of training time, prediction time and memory?
Training: O(N^3). Prediction: O(N^2) per test point. Memory: O(ND + N^2), where D is the input dimension.