Test Flashcards
What is a Gaussian Process (GP)?
An (infinite) set of random variables indexed by some set X
For each x ∈ X, there’s a random variable f such that for all A ⊆ X, A = {x1,…,xm}, f_A ~ N(μ_A, K_A).
What does the likelihood function P(data | f) represent?
The probability of observing the data given the function f.
What is the posterior distribution P(f | data)?
The probability of the function f given the observed data.
What are the components of a Gaussian Process?
- Mean function (µ)
- Covariance (kernel) function (k)
What is the formula for the predictive distribution in Gaussian Processes?
f | x1,…,xm, y1,…,ym = GP(f; μ0, k0)
What is the closed form formula for prediction in Gaussian Processes?
μ* = μ(x) + k,A K^-1(yA - μA)
k = k(x, x’) - k,A K^-1 kA,
What is the purpose of optimizing kernel parameters in Gaussian Processes?
To improve predictive performance.
What is one method for optimizing hyperparameters in Gaussian Processes?
Cross-validation on predictive performance.
What is the Bayesian perspective on optimizing hyperparameters?
Maximize the marginal likelihood of the data.
What does maximizing marginal likelihood help with in Gaussian Processes?
It helps guard against overfitting.
What is an Empirical Bayes method?
Estimating a prior distribution from data by maximizing marginal likelihood.
What computational cost is associated with prediction using Gaussian Processes?
Θ(n^3) due to solving linear systems.
What are some basic approaches for accelerating Gaussian Process computations?
- Exploiting parallelism (GPU computations)
- Local GP methods
- Kernel function approximations
- Inducing point methods
True or False: The posterior covariance k’ in Gaussian Processes depends on the observed data yA.
False.
Fill in the blank: The covariance function in Gaussian Processes is also known as the ______.
[kernel function]
What is the effect of kernel parameters in Gaussian Processes?
They influence the shape and smoothness of the function being modeled.
What is the significance of the mean function in a Gaussian Process?
It represents the expected value of the function at each point.
What do inducing point methods in Gaussian Processes do?
They reduce the computational complexity by approximating the full GP model.
What is a common kernel function used in Gaussian Processes?
Squared exponential (Gaussian/RBF) kernel.
What is the relationship between Gaussian Processes and Bayesian linear regression in terms of computational complexity?
GP requires Θ(n^3), while Bayesian linear regression requires Θ(nd^2).
What is the computational cost of Bayesian linear regression?
𝑂(𝑛 𝑚^2 + 𝑚^3) instead of 𝑂(𝑛^3)
This refers to the cost of using a low-dimensional feature map to approximate the true kernel function.