Kullback-Leibler (KL) Divergence Flashcards
Kullback-Leibler (KL) Divergence, also known as relative entropy
Kullback-Leibler (KL) Divergence, also known as relative entropy, is a measure of how one probability distribution diverges from a second, expected probability distribution. It is commonly used in machine learning, particularly in tasks involving probabilistic models.
- Definition
KL Divergence is a measure of the difference between two probability distributions. It quantifies how far one distribution is from the other, although, as noted below, it is not a true distance metric.
- Non-Symmetric
It is important to note that KL Divergence is not symmetric: the KL divergence of distribution Q from distribution P is, in general, not the same as the KL divergence of distribution P from distribution Q, i.e. D_KL(P || Q) ≠ D_KL(Q || P). A numeric example follows the next card.
- Non-Negativity
KL Divergence is always non-negative and equals zero if and only if the two distributions are identical.
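A quick numeric check of both properties, using two arbitrary example distributions. This sketch assumes SciPy is available; scipy.stats.entropy(p, q) computes D_KL(P || Q).

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes D_KL(P || Q)

p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])

d_pq = entropy(p, q, base=2)  # D_KL(P || Q) ~ 0.531 bits
d_qp = entropy(q, p, base=2)  # D_KL(Q || P) ~ 0.737 bits

print(d_pq, d_qp)            # different values: KL Divergence is not symmetric
print(d_pq >= 0, d_qp >= 0)  # both True: KL Divergence is never negative
```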
- Mathematical Formulation
If P and Q are discrete probability distributions defined on the same sample space, D_KL(P || Q) is the sum over all events x of P(x) times the logarithm of the ratio P(x)/Q(x). With a base-2 logarithm the result is measured in bits; with the natural logarithm it is measured in nats.
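Written out symbolically, with the usual convention that terms where P(x) = 0 contribute zero:

```latex
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_{x} P(x)\,\log_2 \frac{P(x)}{Q(x)}
```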
- Use Cases in Machine Learning
KL Divergence is used in many areas of machine learning, including the training of probabilistic models (for example, as the regularization term in variational autoencoders), model selection (comparing how well different models fit the data), feature selection (comparing the distribution of a feature against the target distribution), and many other tasks.
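As an illustration of the variational autoencoder use case, the sketch below computes the standard closed-form KL term between a diagonal-Gaussian encoder output and a standard-normal prior. The function name and the example numbers are illustrative, not taken from any particular library.

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions.

    This is the term commonly added to a VAE's reconstruction loss as a regularizer.
    Uses natural logarithms, so the result is in nats.
    """
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

# Hypothetical encoder outputs for a 3-dimensional latent vector.
mu = np.array([0.2, -0.1, 0.05])
logvar = np.array([-0.5, 0.1, 0.0])
print(gaussian_kl_to_standard_normal(mu, logvar))  # small non-negative value (~0.08 nats)
```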
- In Information Theory
KL Divergence is also a key concept in information theory. In this context, it measures the expected number of extra bits needed to encode samples drawn from distribution P when using a code optimized for distribution Q instead of one optimized for P.
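This coding interpretation can be checked numerically: the average code length achieved with a code built for Q (the cross-entropy) exceeds the best possible average code length for P (the entropy) by exactly D_KL(P || Q). The distributions below are arbitrary examples.

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25])   # true source distribution
q = np.array([0.25, 0.25, 0.5])   # distribution the code was designed for

entropy_p = -np.sum(p * np.log2(p))         # optimal average bits per symbol for P: 1.5
cross_entropy_pq = -np.sum(p * np.log2(q))  # average bits per symbol using a code for Q: 1.75
kl_pq = np.sum(p * np.log2(p / q))          # D_KL(P || Q): 0.25

print(cross_entropy_pq - entropy_p)  # 0.25 extra bits per symbol
print(kl_pq)                         # same value
```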
- Limitations
While KL Divergence is a powerful tool, it does have limitations. For instance, it becomes infinite (effectively undefined in practice) wherever there is a point at which P(x) is nonzero and Q(x) is zero, which can make practical computation tricky. It also assumes that P and Q are defined on the same probability space.
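A minimal sketch of the zero-probability problem and one common practical workaround, additive (epsilon) smoothing; the epsilon value is arbitrary and the distributions are made-up examples.

```python
import numpy as np
from scipy.stats import entropy

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.5, 0.0, 0.5])   # Q assigns zero probability to an event P considers possible

print(entropy(p, q, base=2))    # inf: the P(x) > 0, Q(x) = 0 term blows up

# One common workaround: add a small epsilon to Q and renormalize so it still sums to 1.
eps = 1e-9
q_smoothed = (q + eps) / np.sum(q + eps)
print(entropy(p, q_smoothed, base=2))  # large but finite
```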