Chapter 22 Divergence Between Probability Distributions Flashcards

1
Q

What is statistical distance and give an example of its use case?

P 224

A

Statistical distance is the general idea of calculating the difference between statistical objects, such as two different probability distributions for a random variable. A common use case is quantifying how far an approximating distribution is from the true distribution of the data.

2
Q

What’s the divergence between two probability distributions? Give an example of a use case.

P 216

A

Divergence is a scoring of how one distribution differs from another. For example, we may have a single random variable and two different probability distributions for the variable, such as a true distribution and an approximation of that distribution.

3
Q

What does this mean: “Divergence is not a symmetric score”?

P 216

A

It means that the divergence calculated from distribution P to distribution Q will, in general, give a different score than the divergence calculated from Q to P.

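A minimal sketch showing the asymmetry numerically (the distributions P and Q below are hypothetical values chosen only for illustration):

# KL(P || Q) and KL(Q || P) give different scores for the same pair
# of discrete distributions.
from math import log2

p = [0.10, 0.40, 0.50]  # hypothetical distribution P
q = [0.80, 0.15, 0.05]  # hypothetical distribution Q

def kl_divergence(a, b):
    # KL(A || B) = sum over events of A(x) * log2(A(x) / B(x))
    return sum(a_i * log2(a_i / b_i) for a_i, b_i in zip(a, b))

print('KL(P || Q) =', kl_divergence(p, q))
print('KL(Q || P) =', kl_divergence(q, p))  # a different score
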
4
Q

Divergence scores are an important foundation for many different calculations in information theory and, more generally, in machine learning. Give two examples of their use in ML.

P 216

A

They provide shortcuts for calculating other scores, for example: (1) mutual information (also known as information gain), and (2) cross-entropy, which is used as a loss function for classification models.

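A minimal sketch of the cross-entropy shortcut (the distributions P and Q are hypothetical values chosen only for illustration): cross-entropy can be computed directly, or as the entropy of P plus the KL divergence from P to Q, and the two calculations agree.

# Cross-entropy H(P, Q) computed two ways: directly, and as
# H(P) + KL(P || Q). The distributions are hypothetical examples.
from math import log2

p = [0.10, 0.40, 0.50]
q = [0.80, 0.15, 0.05]

entropy_p = -sum(p_i * log2(p_i) for p_i in p)                    # H(P)
kl_pq = sum(p_i * log2(p_i / q_i) for p_i, q_i in zip(p, q))      # KL(P || Q)
cross_entropy = -sum(p_i * log2(q_i) for p_i, q_i in zip(p, q))   # H(P, Q)

print(cross_entropy)        # direct calculation
print(entropy_p + kl_pq)    # same value via the KL shortcut
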
5
Q

What’s the intuition of KL-divergence?

P 217

A

The intuition for the KL-divergence score is that when the probability of an event under P is large but the probability of the same event under Q is small (or vice versa), there is a large divergence. The KL-divergence score is the sum (or integral) of these per-event contributions over all events of the variable.

The KL divergence summarizes the number of additional bits (i.e., calculated with the base-2 logarithm) required to represent an event from the random variable. The better our approximation, the less additional information is required.

In other words, the KL divergence is the average number of extra bits needed to encode the data because we used distribution Q to encode the data instead of the true distribution P.

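The underlying definition for discrete distributions, using the base-2 logarithm so the result is in bits:

KL(P || Q) = sum over x of P(x) * log2( P(x) / Q(x) )

For continuous random variables, the sum becomes an integral over the probability density functions.
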
6
Q

KL-Divergence can be used to measure the divergence between discrete or continuous probability distributions. True/False

P 217

A

True

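A minimal sketch of the continuous case, using the well-known closed form for the KL divergence between two univariate Gaussian distributions (the parameter values here are hypothetical, chosen only for illustration):

# Closed-form KL divergence (in nats) between two univariate Gaussians:
# KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) )
from math import log

def kl_gaussians(mu1, sigma1, mu2, sigma2):
    return log(sigma2 / sigma1) + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2) - 0.5

print(kl_gaussians(0.0, 1.0, 1.0, 2.0))  # hypothetical parameters
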
7
Q

What’s another name for KL-Divergence?

P 217

A

Relative entropy

8
Q

When the KL-Divergence score is 0, it suggests that the distributions are ____; otherwise the score is ____ (Negative/Positive). Importantly, the KL divergence score is not symmetrical, which means: ____

P 217

A

Identical, Positive
KL(P||Q) != KL(Q||P)

9
Q

The Jensen-Shannon divergence, or JS-Divergence for short, is another way to quantify the difference (or similarity) between two probability distributions. It uses the KL divergence to calculate a ____ score that is ____.

P 220

A

normalized, symmetrical (JS(P||Q) ≡ JS(Q||P))

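A minimal sketch of the calculation (hypothetical P and Q, base-2 logarithm so the score falls between 0 and 1): the JS divergence averages the KL divergence of each distribution against their mixture M = (P + Q) / 2.

# JS divergence built from two KL calculations against the mixture M.
# Symmetric: js_divergence(p, q) == js_divergence(q, p).
from math import log2

def kl_divergence(a, b):
    return sum(a_i * log2(a_i / b_i) for a_i, b_i in zip(a, b))

def js_divergence(p, q):
    m = [0.5 * (p_i + q_i) for p_i, q_i in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = [0.10, 0.40, 0.50]  # hypothetical distributions
q = [0.80, 0.15, 0.05]
print(js_divergence(p, q))
print(js_divergence(q, p))  # same value
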
10
Q

Why is JS-Divergence more useful than KL-Divergence?

P 221

A

JS-Divergence is more useful as a measure because it provides a smoothed and normalized version of KL-Divergence, with scores between 0 (identical) and 1 (maximally different) when using the base-2 logarithm.

11
Q

What’s the square root of JS-Divergence called?

P 221

A

The square root of the score gives a quantity referred to as the Jensen-Shannon distance, or JS distance for short.

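One way to compute it, assuming SciPy is available: scipy.spatial.distance.jensenshannon returns the JS distance (not the divergence), and squaring it recovers the JS divergence. The distributions below are hypothetical values chosen only for illustration.

# SciPy returns the JS *distance*; with base=2 it lies between 0 and 1.
from scipy.spatial.distance import jensenshannon

p = [0.10, 0.40, 0.50]  # hypothetical distributions
q = [0.80, 0.15, 0.05]
d = jensenshannon(p, q, base=2)
print(d)        # JS distance
print(d ** 2)   # JS divergence
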