Test Flashcards

2
Q

What is a Gaussian Process (GP)?

A

A (possibly infinite) collection of random variables indexed by a set X.

For each x ∈ X there is a random variable f(x), such that for every finite subset A = {x1, …, xm} ⊆ X, the marginal f_A = (f(x1), …, f(xm)) is jointly Gaussian: f_A ~ N(μ_A, K_AA).
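
A minimal numpy sketch of this finite-marginal view (the RBF kernel, zero mean, and the particular index set are illustrative assumptions, not part of the definition):

  import numpy as np

  def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
      # Squared-exponential kernel k(x, x') = s^2 exp(-(x - x')^2 / (2 l^2))
      d = x1[:, None] - x2[None, :]
      return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

  A = np.linspace(0, 5, 50)          # a finite subset A = {x1, ..., xm}
  mu_A = np.zeros_like(A)            # zero mean function
  K_AA = rbf_kernel(A, A)
  # The marginal f_A is jointly Gaussian: f_A ~ N(mu_A, K_AA)
  f_A = np.random.multivariate_normal(mu_A, K_AA + 1e-9 * np.eye(len(A)))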

3
Q

What does the likelihood function P(data | f) represent?

A

The probability of observing the data given the function f.

4
Q

What is the posterior distribution P(f | data)?

A

The probability of the function f given the observed data.

5
Q

What are the components of a Gaussian Process?

A
  • Mean function (µ)
  • Covariance (kernel) function (k)
6
Q

What is the formula for the predictive distribution in Gaussian Processes?

A

p(f | x1, …, xm, y1, …, ym) = GP(f; μ′, k′)

i.e., the posterior is again a Gaussian Process, with updated mean function μ′ and covariance function k′.

7
Q

What is the closed form formula for prediction in Gaussian Processes?

A

μ′(x) = μ(x) + k_{x,A} K_{AA}^{-1} (y_A − μ_A)
k′(x, x’) = k(x, x’) − k_{x,A} K_{AA}^{-1} k_{A,x’}
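
A direct numpy translation of these two formulas, as a sketch (noise-free observations are assumed; kernel can be, e.g., the rbf_kernel above, and the jitter term is a numerical-stability assumption):

  import numpy as np

  def gp_posterior(x_star, X_A, y_A, kernel, mu=lambda x: np.zeros(len(x))):
      # mu'(x)    = mu(x) + k_{x,A} K_{AA}^{-1} (y_A - mu_A)
      # k'(x, x') = k(x, x') - k_{x,A} K_{AA}^{-1} k_{A,x'}
      K_AA = kernel(X_A, X_A) + 1e-9 * np.eye(len(X_A))
      k_xA = kernel(x_star, X_A)
      alpha = np.linalg.solve(K_AA, y_A - mu(X_A))   # the Theta(n^3) step
      mu_post = mu(x_star) + k_xA @ alpha
      k_post = kernel(x_star, x_star) - k_xA @ np.linalg.solve(K_AA, k_xA.T)
      return mu_post, k_post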

8
Q

What is the purpose of optimizing kernel parameters in Gaussian Processes?

A

To improve predictive performance.

9
Q

What is one method for optimizing hyperparameters in Gaussian Processes?

A

Cross-validation on predictive performance.

10
Q

What is the Bayesian perspective on optimizing hyperparameters?

A

Maximize the marginal likelihood of the data.
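
Concretely, for a zero-mean GP with Gaussian noise (both assumptions of this sketch), the quantity being maximized is the log marginal likelihood log p(y | X, θ) = −½ y⊤(K_θ + σ²I)^{-1} y − ½ log|K_θ + σ²I| − (n/2) log 2π, e.g.:

  import numpy as np

  def log_marginal_likelihood(X, y, kernel, noise_var):
      n = len(y)
      K = kernel(X, X) + noise_var * np.eye(n)
      L = np.linalg.cholesky(K)                 # O(n^3)
      alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
      return (-0.5 * y @ alpha
              - np.sum(np.log(np.diag(L)))      # -1/2 log|K| via Cholesky
              - 0.5 * n * np.log(2 * np.pi))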

11
Q

What does maximizing marginal likelihood help with in Gaussian Processes?

A

It helps guard against overfitting.

12
Q

What is an Empirical Bayes method?

A

Estimating a prior distribution from data by maximizing marginal likelihood.

13
Q

What computational cost is associated with prediction using Gaussian Processes?

A

Θ(n^3) due to solving linear systems.

14
Q

What are some basic approaches for accelerating Gaussian Process computations?

A
  • Exploiting parallelism (GPU computations)
  • Local GP methods
  • Kernel function approximations
  • Inducing point methods
15
Q

True or False: The posterior covariance k’ in Gaussian Processes depends on the observed data yA.

A

False. The posterior covariance k′ depends only on the input locations, not on the observed values y_A.

16
Q

Fill in the blank: The covariance function in Gaussian Processes is also known as the ______.

A

[kernel function]

17
Q

What is the effect of kernel parameters in Gaussian Processes?

A

They influence the shape and smoothness of the function being modeled.

18
Q

What is the significance of the mean function in a Gaussian Process?

A

It represents the expected value of the function at each point.

19
Q

What do inducing point methods in Gaussian Processes do?

A

They reduce the computational complexity by approximating the full GP model.

20
Q

What is a common kernel function used in Gaussian Processes?

A

Squared exponential (Gaussian/RBF) kernel.

21
Q

What is the relationship between Gaussian Processes and Bayesian linear regression in terms of computational complexity?

A

Exact GP inference requires Θ(n^3), while Bayesian linear regression requires Θ(nd^2), where d is the feature dimension.

22
Q

What is the computational cost of Bayesian linear regression?

A

O(n m^2 + m^3) instead of O(n^3), where m is the dimension of the feature map

This refers to the cost of Bayesian linear regression with an explicit low-dimensional feature map that approximates the true kernel function.
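
A minimal sketch of where these terms come from (the prior and noise variances are illustrative assumptions): with an n×m feature matrix Φ, the posterior over weights touches Φ⊤Φ, which costs O(nm²), and inverts an m×m matrix, which costs O(m³):

  import numpy as np

  def blr_posterior(Phi, y, noise_var=0.1, prior_var=1.0):
      # Weight prior: w ~ N(0, prior_var * I); likelihood: y = Phi w + noise
      n, m = Phi.shape
      A = Phi.T @ Phi / noise_var + np.eye(m) / prior_var   # O(n m^2)
      A_inv = np.linalg.inv(A)                              # O(m^3)
      w_mean = A_inv @ Phi.T @ y / noise_var
      return w_mean, A_inv   # posterior mean and covariance of the weights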

23
Q

What are the basic approaches for accelerating Gaussian Processes (GP)?

A
  • Exploiting parallelism (GPU computations)
  • Local GP methods
  • Kernel function approximations (RFFs, QFFs, …)
  • Inducing point methods (SoR, FITC, VFE etc.)
24
Q

True or False: Fast GP methods do not address the cubic scaling in n.

A

True

Fast exact-inference methods (e.g., GPU parallelism) yield substantial speedups but still scale cubically in n; reducing the scaling itself requires approximations such as kernel function approximations or inducing point methods.

25
Q

What is a shift-invariant kernel?

A

A kernel 𝑘(𝑥, 𝑥’) is called shift-invariant if 𝑘(𝑥, 𝑥’) = 𝑘(𝑥 − 𝑥’)

26
Q

What is the Fourier transform of a shift-invariant kernel?

A

k(x − x’) = ∫ p(ω) e^{iω⊤(x − x’)} dω

This relates the kernel to its frequency representation.

27
Q

What is the key idea behind Random Fourier Features?

A

Interpret the kernel as an expectation: k(x − x’) = 2 E_{ω,b}[cos(ω⊤x + b) cos(ω⊤x’ + b)], with ω ~ p(ω) and b ~ Unif[0, 2π], and approximate it by Monte Carlo sampling of random features.
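
A minimal Monte Carlo sketch of this idea for the unit-lengthscale Gaussian kernel, whose spectral density p(ω) is the standard Gaussian (the dimensions and sample count are illustrative assumptions):

  import numpy as np

  def random_fourier_features(X, D, rng=np.random.default_rng(0)):
      n, d = X.shape
      omega = rng.standard_normal((d, D))       # omega ~ p(omega) = N(0, I)
      b = rng.uniform(0, 2 * np.pi, size=D)     # b ~ Unif[0, 2*pi]
      return np.sqrt(2.0 / D) * np.cos(X @ omega + b)

  X = np.random.default_rng(1).standard_normal((5, 3))
  Z = random_fourier_features(X, D=5000)
  K_approx = Z @ Z.T    # approximates k(x, x') = exp(-||x - x'||^2 / 2)
  K_exact = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))

Bayesian linear regression on Z then recovers an approximate GP at O(nD^2 + D^3) cost.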

28
Q

What is the performance theorem for Random Fourier Features?

A

For a compact subset M ⊂ ℝ^d with diameter diam(M), the RFF approximation z satisfies
Pr[ sup_{x,x’ ∈ M} |z(x)⊤z(x’) − k(x, x’)| ≥ ε ] ≤ 2^8 (σ_p diam(M) / ε)^2 exp(−D ε^2 / (4(d + 2))),
where D is the number of random features and σ_p^2 = E_p[ω⊤ω].

29
Q

What is the primary function of inducing point methods?

A

Summarize data via function values of f at a set of m inducing points.

30
Q

Fill in the blank: A kernel k(x, x’) for x, x’ ∈ ℝ^d is called ______ if k(x, x’) = k(x − x’).

A

shift-invariant

31
Q

What are examples of stationary kernels?

A
  • Gaussian
  • Exponential
  • Cauchy
32
Q

True or False: The Fourier transform of the (unit-lengthscale) Gaussian kernel is the standard Gaussian distribution in d dimensions.

A

True

33
Q

What do local GP methods exploit?

A

Covariance functions that decay with the distance between points.

34
Q

What is the outcome of applying Bayesian linear regression with explicit feature maps?

A

It approximates Gaussian Processes.

35
Q

What is the computational cost of Gaussian Processes inference?

A

Θ(n^3): it requires solving n×n linear systems.

36
Q

What modern GP libraries implement parallel GP inference?

A
  • GPflow
  • GPyTorch
37
Q

What is the main idea behind kernel function approximation?

A

Construct explicit low-dimensional feature map that approximates the true kernel function.

38
Q

What is the relationship between inducing points and data summarization?

A

They summarize the data via the function values of f at a small set of m inducing points.

39
Q

What does the theorem by Bochner state regarding shift-invariant kernels?

A

A shift-invariant kernel is positive definite if and only if its Fourier transform p(ω) is nonnegative.

40
Q

What does SoR stand for in the context of Gaussian processes?

A

Subset of Regressors

41
Q

What does the SoR approximation replace in the Gaussian process training conditional?

A

It replaces the exact training conditional

p(f | u) = N(K_{f,u} K_{u,u}^{-1} u, K_{f,f} − Q_{f,f}), where Q_{f,f} = K_{f,u} K_{u,u}^{-1} K_{u,f},

with the deterministic q_SoR(f | u) = N(K_{f,u} K_{u,u}^{-1} u, 0).
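
A minimal numpy sketch of the resulting SoR predictive equations (following the construction in Quiñonero-Candela & Rasmussen; the noise level and the choice of inducing inputs are illustrative assumptions):

  import numpy as np

  def sor_predict(X_star, X, y, X_u, kernel, noise_var=0.1):
      # SoR: all information flows through the m inducing values u
      K_uu = kernel(X_u, X_u) + 1e-9 * np.eye(len(X_u))
      K_uf = kernel(X_u, X)                     # (m, n)
      K_su = kernel(X_star, X_u)                # (p, m)
      # Sigma = (K_uu + K_uf K_fu / noise)^{-1}: O(n m^2 + m^3), linear in n
      Sigma = np.linalg.inv(K_uu + K_uf @ K_uf.T / noise_var)
      mu_star = K_su @ Sigma @ K_uf @ y / noise_var
      var_star = np.diag(K_su @ Sigma @ K_su.T)
      return mu_star, var_star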

42
Q

What is the resulting model from the SoR approximation?

A

A degenerate GP with covariance function k_SoR(x, x’) = k(x, u) K_{u,u}^{-1} k(u, x’).

43
Q

What does FITC stand for in Gaussian processes?

A

Fully independent training conditional

44
Q

What does the FITC approximation replace in the Gaussian process training conditional?

A

It replaces the exact training conditional p(f | u) = N(K_{f,u} K_{u,u}^{-1} u, K_{f,f} − Q_{f,f}) with

q_FITC(f | u) = N(K_{f,u} K_{u,u}^{-1} u, diag[K_{f,f} − Q_{f,f}]),

i.e., it keeps only the diagonal of the conditional covariance.

45
Q

What is the computational cost for inducing point methods SoR and FITC dominated by?

A

The cost of inverting the m×m matrix K_{u,u}

46
Q

What is the computational cost’s relationship to the number of inducing points and data points?

A

Cubic in the number of inducing points, linear in the number of data points

47
Q

What are some methods for picking inducing points?

A
  • Chosen randomly (e.g., a random subset of the data, or random points in the domain)
  • Chosen greedily according to some criterion (e.g., posterior variance)
  • Placed deterministically, e.g., equally spaced on a grid in the domain
48
Q

How can inducing points be optimized?

A

By treating the inducing point locations u as hyperparameters and maximizing the marginal likelihood of the data

49
Q

What must be ensured about the inducing points 𝒖?

A

They must be representative of the data and where predictions are made

50
Q

What is the relationship between Gaussian processes and Bayesian Linear Regression?

A

Gaussian processes = kernelized Bayesian Linear Regression

51
Q

What can be computed in closed form with Gaussian processes?

A

Marginals / conditionals

52
Q

How are hyperparameters optimized in Gaussian processes?

A

By maximizing the marginal likelihood

53
Q

What exists for fast approximations to exact Gaussian process inference?

A

Various fast approximations, e.g., kernel function approximations (RFFs, QFFs) and inducing point methods (SoR, FITC, VFE).

54
Q

Which chapters of ‘Gaussian Processes for ML’ by Rasmussen & Williams should be read?

A
  • Chapter 2: 2.1.1-2.3
  • Chapter 4: up to 4.2
55
Q

Which paper provides a unifying view of sparse approximate Gaussian process regression?

A

Quiñonero-Candela & Rasmussen: ‘A Unifying View of Sparse Approximate Gaussian Process Regression’, JMLR 2005

56
Q

Which paper discusses random features for large-scale kernel machines?

A

Rahimi & Recht: ‘Random Features for Large-Scale Kernel Machines’, NeurIPS 2007