L5_KRR Flashcards
Kernel methods can implicitly model arbitrarily complex hidden layers
Multilayer Perceptrons (MLP) can
learn linearly non-separable problems
Universal Approximation Theorem
With enough intermediate variables, a network with a single hidden layer can approximate any reasonable function of the input.
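A minimal sketch of this in practice (the width of 200, the tanh activation, the fixed random hidden weights, and the use of numpy are all my choices, not from the card): a single hidden layer plus a least-squares readout already approximates sin(x) closely.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 200                        # "enough intermediate variables"

x = np.linspace(-np.pi, np.pi, 100).reshape(-1, 1)
y = np.sin(x).ravel()

# Hidden layer: tanh of random affine projections of the input.
W = rng.normal(size=(1, n_hidden))
b = rng.normal(size=n_hidden)
H = np.tanh(x @ W + b)                # shape (100, n_hidden)

# Fit only the output weights with ordinary least squares.
w_out, *_ = np.linalg.lstsq(H, y, rcond=None)
print(np.max(np.abs(H @ w_out - y)))  # small approximation error
```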
Kernelizing linear methods
1. Map the data into a (high-dimensional) feature space
2. Look for linear relations in the feature space
Kernel Trick
Any algorithm for vectorial data that can be expressed only in terms of scalar products between vectors can be performed implicitly in the feature space associated with any kernel, by replacing each scalar product by a kernel evaluation.
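A quick numpy check of this idea (the kernel choice and the names below are illustrative): for the degree-2 polynomial kernel k(x, z) = (x·z)², the kernel value equals an ordinary dot product after the explicit feature map φ(x) = (x₁², √2·x₁x₂, x₂²), so an algorithm written in terms of dot products never needs to build φ at all.

```python
import numpy as np

def phi(v):
    """Explicit feature map for the degree-2 polynomial kernel (2D input)."""
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

def k(x, z):
    """Kernel evaluation: no feature map is ever computed."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
assert np.isclose(k(x, z), np.dot(phi(x), phi(z)))  # both equal 121.0
```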
What about the curse of dimensionality?
A big problem with high-dimensional feature spaces: when the dimensionality increases, the volume of the space increases so fast that the available data becomes sparse. The amount of data needed for a reliable result often grows exponentially with the dimensionality.
Representer Theorem
In a regularized learning problem, the optimal weight vector w in feature space is a linear combination of the training examples mapped into feature space: w = Σᵢ αᵢ φ(xᵢ)
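A minimal kernel ridge regression sketch of the theorem (the data, λ = 0.1, and the RBF width are my assumptions): the fitted function is exactly a weighted sum of kernel evaluations at the training points, with dual weights α = (K + λI)⁻¹ y.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))             # training inputs
y = np.sin(X[:, 0])                      # training targets
lam = 0.1                                # regularization strength

K = rbf(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

# Prediction at a new point: f(x) = sum_i alpha_i * k(x_i, x).
x_new = rng.normal(size=(1, 3))
f_new = rbf(x_new, X) @ alpha
```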
Kernels as Similarity Measures
Kernel methods are memory-based methods:
- store the entire training set
- define similarity of data points by kernel function
- new predictions require comparison with previously stored examples
When do humans perceive stimuli as similar?
Perceptual similarity of a new stimulus x decays exponentially with its distance from a prototype μ
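One common concrete form of such a decaying similarity is the Gaussian kernel (the card only states exponential decay; the Gaussian shape, μ, and σ below are my assumptions): similarity is 1 at the prototype and falls toward 0 with distance.

```python
import numpy as np

def similarity(x, mu, sigma=1.0):
    """Gaussian similarity: 1 at the prototype, -> 0 with distance."""
    return np.exp(-np.sum((x - mu) ** 2) / (2 * sigma ** 2))

mu = np.array([0.0, 0.0])
for d in [0.0, 1.0, 2.0, 3.0]:
    print(d, similarity(np.array([d, 0.0]), mu))
```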
Kernel Methods - Pros
+ Powerful Modeling Tool
(non-linear problems become linear in kernel space)
+ Omnipurpose Kernels
(Gaussian works well in many cases)
+ Kernel methods can handle symbolic objects (e.g., strings, trees, graphs)
+ When you have fewer data points than your data has dimensions, kernel methods can offer a dramatic speedup (see the sketch after this list)
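A sketch of that speedup (the sizes and λ below are illustrative): ridge regression's primal solution requires a d×d solve, while the equivalent dual form, the one the Representer Theorem licenses, requires only an n×n solve.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 2000, 1.0          # far fewer points (n) than dimensions (d)
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Primal ridge regression: requires solving a d x d system.
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Dual form: solve only an n x n system, then expand via X^T alpha.
alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
w_dual = X.T @ alpha

print(np.allclose(w_primal, w_dual))   # True: same weights, cheaper solve
```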
Kernel Methods - Cons
– Difficult to understand what’s happening in kernel space
– Model complexity increases with number of data points
→ If you have too much data, kernel methods can be slow