Gaussian Processes Flashcards
How do Gaussian processes handle uncertainty?
We want to predict an entire distribution of possible outcomes, not just a single value. Think of predicting the price of something we have not yet observed: we want a prediction at every possible input, so we need a function for these predictions. Rather than committing to one function, we consider all plausible functions, each with a probability attached, which gives us a probability distribution over functions.
When we add more data, we update this probability distribution. A Gaussian process is determined by a mean function and a covariance function, so we can quantify the uncertainty of each prediction (in terms of probability).
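As a rough illustration (a minimal sketch assuming scikit-learn; the kernel choice and data points are made up), the predictive distribution at every new input comes out as a Gaussian with a mean and a standard deviation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# A few observed (input, price) pairs; everything in between is uncertain.
X_train = np.array([[1.0], [3.0], [5.0], [6.0]])
y_train = np.array([2.0, 1.5, 3.0, 2.5])

# The kernel is the covariance function; alpha models observation noise.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
gp.fit(X_train, y_train)

# At each new input we get a full Gaussian: a predictive mean and std.
X_new = np.linspace(0.0, 7.0, 50).reshape(-1, 1)
mean, std = gp.predict(X_new, return_std=True)
print(mean[:3], std[:3])  # std grows as we move away from the observed data
```

Note that fitting will tune the kernel hyperparameters by default; the values above are just starting points.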
Explain the concept of kernel ridge regression. In what scenarios is kernel ridge regression particularly useful, and how does it handle non-linear relationships in data?
Kernel ridge regression is an extension of ridge regression, in which we penalize (regularize) all the coefficients. To this we add the kernel trick, which lets us work in a higher-dimensional feature space and thereby handle non-linear relationships.
The idea behind the kernel trick: we map the data into a higher-dimensional space. For example, a set of 1-D x-values can each be given the new coordinate (x, x^2), so every datapoint now lives in 2 dimensions instead of 1. The trick is that a kernel function computes inner products in the higher-dimensional space directly, so the mapping never has to be constructed explicitly.
Common kernel functions are polynomial, sigmoid, and RBF.
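A tiny check of this idea in plain numpy (a sketch; the map phi(x) = (x, x^2) is the toy example above, and the matching kernel is k(x, z) = xz + (xz)^2):

```python
import numpy as np

def phi(x):
    # Explicit map from 1 dimension to 2: x -> (x, x^2)
    return np.array([x, x**2])

def k(x, z):
    # The kernel computes the same inner product without building the map
    return x * z + (x * z) ** 2

x, z = 2.0, 3.0
print(np.dot(phi(x), phi(z)))  # 42.0, via the explicit 2-D coordinates
print(k(x, z))                 # 42.0, computed directly on the 1-D inputs
```

Both give the same number, which is why KRR can behave as if it works in the higher-dimensional space without ever constructing it.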
KRR is useful in situations where the relationship between input and output is non-linear.
It is good for COMPLEX datasets, where linear models underfit.
The regularization mitigates the effect of outliers.
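A short usage sketch (assuming scikit-learn; the kernel, alpha, and gamma values are illustrative, not tuned):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(100, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(100)  # non-linear target

# alpha is the ridge penalty; the RBF kernel captures the non-linearity.
model = KernelRidge(kernel="rbf", alpha=0.1, gamma=0.5)
model.fit(X, y)
print(model.predict([[0.0], [1.5]]))  # should roughly track sin(x)
```

A plain linear regression on the same data would underfit, since no straight line can follow a sine curve.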
Discuss the Bayesian learning approach in the context of machine learning. Explain how Bayesian methods address overfitting, and differentiate between the prior and posterior distributions in Bayesian learning.
The approach:
It aims to find distributions over the parameters, rather than single point estimates. In contrast to traditional OLS, it treats the parameters as random variables, so it captures the uncertainty given the data we have (the training data).
Predictions:
Predictions are represented by distributions rather than single points, which gives a more informative picture of the uncertainty.
Overfitting:
The prior distribution guides the parameter estimates and keeps the model from fitting the noise.
Prior distribution:
The prior distribution encodes our initial beliefs or knowledge about the data; these are our base assumptions. Naturally, the choice of prior affects the model: it acts as a regularization term, preventing extreme parameter values.
Posterior:
Once we have observed the data, the prior is updated to give the posterior. Bayes' theorem is used to compute the posterior distribution, combining the prior with the likelihood of the data given the parameters.
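A minimal sketch of this update for Bayesian linear regression with a Gaussian prior and known noise variance (all numbers are illustrative; under these assumptions the posterior has a closed form):

```python
import numpy as np

X = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 2.5]])         # design matrix: bias column plus one input
y = np.array([1.0, 2.1, 2.9])     # observed outputs
noise_var = 0.1                   # assumed known observation noise
prior_var = 1.0                   # prior belief: w ~ N(0, prior_var * I)

# Bayes' theorem with a Gaussian prior and Gaussian likelihood:
# the posterior precision combines the data evidence and the prior.
A = X.T @ X / noise_var + np.eye(2) / prior_var
post_cov = np.linalg.inv(A)                  # remaining uncertainty about w
post_mean = post_cov @ X.T @ y / noise_var   # updated beliefs about w

print(post_mean)
print(post_cov)
```

With little data the posterior stays close to the prior; as observations accumulate, the likelihood dominates and the posterior tightens around what the data supports.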
Parameter uncertainty:
We do not only get single values for the parameters, but whole distributions, so we can see how uncertain each one is.
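A small self-contained sketch (the posterior mean and covariance below are made-up values standing in for the result of an update like the one above): sampling parameters from the posterior turns parameter uncertainty into a spread of predictions:

```python
import numpy as np

post_mean = np.array([0.8, 0.9])          # posterior mean of (bias, slope)
post_cov = np.array([[0.05, -0.01],
                     [-0.01, 0.02]])      # posterior covariance (illustrative)

rng = np.random.default_rng(1)
w = rng.multivariate_normal(post_mean, post_cov, size=2000)  # parameter draws

x_new = np.array([1.0, 2.0])              # (bias term, new input x = 2)
preds = w @ x_new
print(preds.mean(), preds.std())          # spread reflects parameter uncertainty
```

A wide posterior produces a wide spread of predictions; a confident posterior produces a tight one.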