Kullback-Leibler (KL) Divergence Flashcards
Kullback-Leibler (KL) Divergence, also known as relative entropy
Kullback-Leibler (KL) Divergence, also known as relative entropy, is a measure of how one probability distribution diverges from a second, expected probability distribution. It is commonly used in machine learning, particularly in tasks involving probabilistic models.
- Definition
KL Divergence is a measure of the difference between two probability distributions. It quantifies how far one distribution is from the other, although, as noted below, it is not a true distance metric.
- Non-Symmetric
It is important to note that KL Divergence is not symmetric: the KL divergence of distribution Q from distribution P is, in general, not the same as the KL divergence of distribution P from distribution Q, i.e. D_KL(P || Q) ≠ D_KL(Q || P). A numeric example follows the next card.
- Non-Negativity
KL Divergence is always non-negative and equals zero if and only if the two distributions are identical.
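A quick numeric check of both properties, using two arbitrary example distributions. This sketch assumes SciPy is available; scipy.stats.entropy(p, q) computes D_KL(P || Q).

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes D_KL(P || Q)

p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])

d_pq = entropy(p, q, base=2)  # D_KL(P || Q) ~ 0.531 bits
d_qp = entropy(q, p, base=2)  # D_KL(Q || P) ~ 0.737 bits

print(d_pq, d_qp)            # different values: KL Divergence is not symmetric
print(d_pq >= 0, d_qp >= 0)  # both True: KL Divergence is never negative
```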
- Mathematical Formulation
If P and Q are discrete probability distributions defined on the same sample space, D_KL(P || Q) is the sum over all events x of P(x) times the logarithm of the ratio P(x)/Q(x). With a base-2 logarithm the result is measured in bits; with the natural logarithm it is measured in nats.
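Written out symbolically, with the usual convention that terms where P(x) = 0 contribute zero:

```latex
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_{x} P(x)\,\log_2 \frac{P(x)}{Q(x)}
```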
- Use Cases in Machine Learning
KL Divergence is used in many areas of machine learning, including the training of probabilistic models (for example, as the regularization term in variational autoencoders), model selection (comparing how well different models fit the data), feature selection (comparing the distribution of a feature against the target distribution), and many other tasks.
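As an illustration of the variational autoencoder use case, the sketch below computes the standard closed-form KL term between a diagonal-Gaussian encoder output and a standard-normal prior. The function name and the example numbers are illustrative, not taken from any particular library.

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions.

    This is the term commonly added to a VAE's reconstruction loss as a regularizer.
    Uses natural logarithms, so the result is in nats.
    """
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

# Hypothetical encoder outputs for a 3-dimensional latent vector.
mu = np.array([0.2, -0.1, 0.05])
logvar = np.array([-0.5, 0.1, 0.0])
print(gaussian_kl_to_standard_normal(mu, logvar))  # small non-negative value (~0.08 nats)
```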
- In Information Theory
KL Divergence is also a key concept in information theory. In this context, it measures the expected number of extra bits needed to encode samples drawn from distribution P when using a code optimized for distribution Q instead of one optimized for P.
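This coding interpretation can be checked numerically: the average code length achieved with a code built for Q (the cross-entropy) exceeds the best possible average code length for P (the entropy) by exactly D_KL(P || Q). The distributions below are arbitrary examples.

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25])   # true source distribution
q = np.array([0.25, 0.25, 0.5])   # distribution the code was designed for

entropy_p = -np.sum(p * np.log2(p))         # optimal average bits per symbol for P: 1.5
cross_entropy_pq = -np.sum(p * np.log2(q))  # average bits per symbol using a code for Q: 1.75
kl_pq = np.sum(p * np.log2(p / q))          # D_KL(P || Q): 0.25

print(cross_entropy_pq - entropy_p)  # 0.25 extra bits per symbol
print(kl_pq)                         # same value
```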
- Limitations
While KL Divergence is a powerful tool, it does have limitations. For instance, it becomes infinite (effectively undefined in practice) wherever there is a point at which P(x) is nonzero and Q(x) is zero, which can make practical computation tricky. It also assumes that P and Q are defined on the same probability space.
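A minimal sketch of the zero-probability problem and one common practical workaround, additive (epsilon) smoothing; the epsilon value is arbitrary and the distributions are made-up examples.

```python
import numpy as np
from scipy.stats import entropy

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.5, 0.0, 0.5])   # Q assigns zero probability to an event P considers possible

print(entropy(p, q, base=2))    # inf: the P(x) > 0, Q(x) = 0 term blows up

# One common workaround: add a small epsilon to Q and renormalize so it still sums to 1.
eps = 1e-9
q_smoothed = (q + eps) / np.sum(q + eps)
print(entropy(p, q_smoothed, base=2))  # large but finite
```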