Chapter 22 Divergence Between Probability Distributions Flashcards
What is statistical distance and give an example of its use case?
P 224
Statistical distance is the general idea of calculating the difference between statistical objects, such as different probability distributions for a random variable. For example, we may want to measure how far a model's approximating distribution is from the true distribution of the data.
What’s the divergence between two probability distributions? Give an example of a use case.
P 216
Divergence is a score of how one probability distribution differs from another. For example, we may have a single random variable and two different probability distributions for the variable, such as a true distribution and an approximation of that distribution.
What does this mean: “Divergence is not a symmetric score”?
P 216
It means that calculating the divergence from P to Q generally gives a different score than calculating the divergence from Q to P.
Divergence scores are an important foundation for many different calculations in information theory and more generally in machine learning. Give 2 examples of their use in ML.
P 216
For example, they provide shortcuts for calculating scores such as mutual information (information gain) and cross-entropy, which is used as a loss function for classification models.
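As a rough illustration of the cross-entropy shortcut (a sketch, not taken from the book; it assumes NumPy and SciPy are available and uses made-up distributions), cross-entropy decomposes as H(P, Q) = H(P) + KL(P || Q):

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, base=2) -> H(P); entropy(p, q, base=2) -> KL(P || Q)

# Illustrative (made-up) distributions over the same three events.
p = np.array([0.10, 0.40, 0.50])  # "true" distribution
q = np.array([0.80, 0.15, 0.05])  # approximating distribution

# Direct cross-entropy in bits: H(P, Q) = -sum(P * log2(Q))
print(-np.sum(p * np.log2(q)))

# The same value via the KL shortcut: H(P, Q) = H(P) + KL(P || Q)
print(entropy(p, base=2) + entropy(p, q, base=2))
```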
What’s the intuition of KL-divergence?
P 217
The intuition for the KL-divergence score is that when the probability for an event from P is large but the probability for the same event in Q is small, there is a large divergence; when the probability from P is small and the probability from Q is large, there is also a large divergence, but not quite as large. The KL-divergence score is the sum (or integral) over all events of the variable.
The KL divergence summarizes the number of additional bits (i.e. calculated with the base-2 logarithm) required to represent an event from the random variable. The better our approximation, the less additional information is required.
In other words, the KL divergence is the average number of extra bits needed to encode the data, due to the fact that we used distribution Q to encode the data instead of the true distribution P.
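A minimal from-scratch sketch of the discrete KL-divergence in bits (the function name and example distributions are illustrative, not from the book), using KL(P || Q) = sum over events x of P(x) * log2(P(x) / Q(x)):

```python
from math import log2

def kl_divergence(p, q):
    # Sum over events: P(x) * log2(P(x) / Q(x)); terms with P(x) == 0 contribute nothing.
    return sum(px * log2(px / qx) for px, qx in zip(p, q) if px > 0)

# Illustrative (made-up) distributions over the same three events.
p = [0.10, 0.40, 0.50]  # "true" distribution
q = [0.80, 0.15, 0.05]  # approximation used for encoding

# Average number of extra bits needed to encode events drawn from P using a code built for Q.
print(kl_divergence(p, q))
```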
KL-Divergence can be used to measure the divergence between discrete or continuous probability distributions. True/False
P 217
True
What’s another name for KL-Divergence?
P 217
Relative entropy
When the KL-Divergence score is 0, it suggests that the distributions are ____, otherwise the score is ____ (Negative/Positive). Importantly, the KL divergence score is not symmetrical, which means: ____
P 217
Identical, Positive
KL(P||Q) != KL(Q||P)
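A quick check of the asymmetry, assuming SciPy is available (scipy.stats.entropy returns the KL divergence when a second distribution is passed); the distributions are illustrative:

```python
from scipy.stats import entropy  # entropy(p, q, base=2) -> KL(P || Q) in bits

p = [0.10, 0.40, 0.50]
q = [0.80, 0.15, 0.05]

# The two directions give different scores, so KL(P || Q) != KL(Q || P).
print(entropy(p, q, base=2))  # KL(P || Q)
print(entropy(q, p, base=2))  # KL(Q || P)
```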
The Jensen-Shannon divergence, or JS-Divergence for short, is another way to quantify the difference (or similarity) between two probability distributions. It uses the KL divergence to calculate a ____ score that is ____.
P 220
normalized, symmetrical (JS(P||Q) ≡ JS(Q||P))
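A sketch of how JS-Divergence is built from the KL divergence (assuming NumPy/SciPy; the function name js_divergence is an assumption, not the book's): it averages the KL divergence of each distribution from their mixture M = (P + Q) / 2, which makes the score symmetric.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, m, base=2) -> KL(P || M) in bits

def js_divergence(p, q):
    # JS(P || Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), where M is the mixture (P + Q) / 2.
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * entropy(p, m, base=2) + 0.5 * entropy(q, m, base=2)

p = [0.10, 0.40, 0.50]
q = [0.80, 0.15, 0.05]

# Swapping the arguments gives the same score: JS(P || Q) == JS(Q || P).
print(js_divergence(p, q))
print(js_divergence(q, p))
```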
Why is JS-Divergence more useful than KL-Divergence?
P 221
JS-Divergence is more useful as a measure because it provides a smoothed and normalized version of KL-Divergence, with scores between 0 (identical) and 1 (maximally different) when using the base-2 logarithm.
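One way to see the "smoothed" part (an illustrative contrast, not from the book; assumes SciPy): when Q assigns zero probability to an event that P considers possible, KL(P || Q) blows up to infinity, while the JS score stays finite and within [0, 1] for base-2 logarithms.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q, base=2) -> KL(P || Q) in bits

p = np.array([0.5, 0.5, 0.0])
q = np.array([1.0, 0.0, 0.0])  # assigns zero probability to an event that P considers possible

m = 0.5 * (p + q)
js = 0.5 * entropy(p, m, base=2) + 0.5 * entropy(q, m, base=2)

print(entropy(p, q, base=2))  # KL(P || Q) is infinite here, which is awkward to use as a score
print(js)                     # the JS score stays finite and between 0 and 1
```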
What’s the square root of JS-Divergence called?
P 221
The square root of the score gives a quantity referred to as the Jensen-Shannon distance, or JS distance for short.
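If SciPy is available, scipy.spatial.distance.jensenshannon computes this JS distance directly (the distributions below are illustrative); squaring it recovers the JS-Divergence score:

```python
from scipy.spatial.distance import jensenshannon  # returns the JS distance (square root of JS divergence)

p = [0.10, 0.40, 0.50]
q = [0.80, 0.15, 0.05]

dist = jensenshannon(p, q, base=2)
print(dist)       # Jensen-Shannon distance
print(dist ** 2)  # squaring the distance gives back the JS divergence score
```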