Chapter 21 Information Theory Flashcards
Information theory is a subfield of mathematics concerned with ____.
P 206
transmitting data across a noisy channel
Calculating information and entropy is a useful tool in machine learning and is used as the basis for techniques such as ____ , ____ , and, more generally, ____.
P 206
feature selection, building decision trees, fitting classification models
Information theory is concerned with representing data in a compact fashion (a task known as ____ or ____ ), as well as with transmitting and storing it in a way that is robust to errors (a task known as ____ or ____ ). P 207
data compression, source coding, error correction, channel coding
What is the intuition behind quantifying information? P 207
The intuition behind quantifying information is the idea of measuring how much surprise there is in an event, that is, how unlikely it is. Events that are rare (low probability) are more surprising and therefore carry more information than events that are common (high probability).
Low Probability Event: High Information (surprising).
High Probability Event: Low Information (unsurprising).
The basic intuition behind information theory is that learning that an unlikely event has occurred is more informative than learning that a likely event has occurred.
Rare events are more uncertain or more surprising and require more information to represent them than common events.
We can calculate the amount of information there is in an event using the probability of the event. This is called ____, ____, or simply the ____, and can be calculated for a discrete event I(x) as follows: ____
P 208
Shannon information, self-information, information
I(x) = − log(p(x))
The negative sign ensures that the result is always positive or zero. Information will be zero when the probability of an event is 1.0 (a certainty), i.e. there is no surprise. Using the base-2 logarithm:
I(x) = −log2(p(x))
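As a minimal sketch (the probabilities below are arbitrary examples, not values from the book), the formula maps straight into Python:

```python
from math import log2

def information(p):
    """Shannon information (self-information) of an event with probability p, in bits."""
    return -log2(p)

# Arbitrary example probabilities: a certainty, a fair coin flip, and a rare event
for p in [1.0, 0.5, 0.1]:
    print(p, information(p))
# p = 1.0 -> 0 bits (printed as -0.0): a certainty carries no surprise
# p = 0.5 -> 1.0 bit
# p = 0.1 -> ~3.32 bits
```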
The choice of the base-2 logarithm means that ____. This can be directly interpreted in the information processing sense as ____. The calculation of information is often written as h() to contrast it with entropy H():
h(x) = − log2(p(x))
the units of the information measure is in bits (binary digits).
The number of bits required to represent an event on a noisy communication channel
Other logarithms can be used instead of the base-2. For example, it is also common to use the ____ logarithm that uses base ____ in calculating the information, in which case the units are referred to as ____. P 209
Natural, e (Euler’s number), nats
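A tiny sketch comparing the two unit conventions (p = 0.5 is just an example value):

```python
from math import log, log2

p = 0.5  # arbitrary example probability

info_bits = -log2(p)  # base-2 logarithm  -> measured in bits
info_nats = -log(p)   # natural logarithm -> measured in nats

print(info_bits)           # 1.0 bit
print(info_nats)           # ~0.693 nats
print(info_nats / log(2))  # dividing by ln(2) converts nats to bits: 1.0
```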
In effect, calculating the information for a random variable is the same as calculating the information for the probability distribution of the events for the random variable. Calculating the information for a random variable is called ____, ____, or simply ____.
P 210
information entropy, Shannon entropy, entropy.
What is the intuition behind choosing “entropy” for information gained from a random variable?
P 210
The intuition for entropy is that it is the average number of bits required to represent or transmit an event drawn from the probability distribution for the random variable.
The Shannon entropy of a distribution is the expected amount of information in an event drawn from that distribution.
Entropy can be calculated for a random variable X with K discrete states as follows:
P 211
H(X) = − ∑_{i=1}^{K} p(k_i) × log(p(k_i))
Like information, the log() function uses base-2 and the units are bits. The natural logarithm can be used instead and the units will be nats.
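As a minimal sketch (the four-state distribution below is made up purely for illustration), the entropy sum can be computed directly:

```python
from math import log2

def entropy(probabilities):
    """Entropy H(X) in bits of a discrete distribution given as a list of probabilities."""
    return sum(-p * log2(p) for p in probabilities if p > 0)

# A made-up distribution over K = 4 discrete states
dist = [0.5, 0.25, 0.125, 0.125]
print(entropy(dist))  # 1.75 bits
```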
The lowest entropy is calculated for a random variable that ____. The largest entropy for a random variable will occur when ____.
P 211
has a single event with a probability of 1.0, a certainty
all events are equally likely
In the case where one event dominates, such as with a skewed probability distribution, there is less surprise and the distribution will have a ____ (lower/higher) entropy. In the case where no event dominates another, such as with an equal or approximately equal probability distribution, we would expect a ____ (smaller/larger) entropy.
P 212
Lower
larger or maximum
Skewed Probability Distribution (unsurprising): Low entropy.
Balanced Probability Distribution (surprising): High entropy.
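A short sketch contrasting these cases (the distributions are invented examples over K = 4 states):

```python
from math import log2

def entropy(probabilities):
    """Entropy in bits; zero-probability states contribute nothing."""
    return sum(-p * log2(p) for p in probabilities if p > 0)

# Invented 4-state distributions
certain  = [1.0, 0.0, 0.0, 0.0]      # a single event with probability 1.0
skewed   = [0.85, 0.05, 0.05, 0.05]  # one event dominates
balanced = [0.25, 0.25, 0.25, 0.25]  # all events equally likely

print(entropy(certain))   # 0.0 bits   (lowest possible entropy)
print(entropy(skewed))    # ~0.85 bits (low entropy, unsurprising)
print(entropy(balanced))  # 2.0 bits   (maximum entropy, log2(4))
```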
Calculating the entropy for a random variable provides the basis for other measures such as mutual information (information gain). True/False
P 213
True
It also provides the basis for calculating the difference between two probability distributions with cross-entropy and the KL-divergence.
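As a sketch of how those follow-on measures look in code (the two three-state distributions P and Q are made up, and the functions use the standard definitions rather than anything specific to this chapter):

```python
from math import log2

def entropy(p):
    return sum(-pi * log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # Average bits needed to encode events from P using a code built for Q
    return sum(-pi * log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # Extra bits paid for using Q in place of the true distribution P
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Made-up 3-state distributions
P = [0.10, 0.40, 0.50]
Q = [0.80, 0.15, 0.05]

print(cross_entropy(P, Q))               # ~3.29 bits
print(entropy(P) + kl_divergence(P, Q))  # same value: H(P, Q) = H(P) + KL(P || Q)
```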
Information provides a way to quantify the amount of ____ for an event measured in bits.
Entropy provides a measure of the ____ needed to represent an event drawn from a probability distribution for a random variable.
P 214
Surprise, Average amount of information
The more unlikely the event, the more surprising it is, and the more information it has.
The more probable something is, the more information we already have about it, so it is the unlikelier events that offer more information. For example, a person in their 20s being healthy is likely and therefore not very informative, but if that person had a heart attack, which is unlikely, it would give us a lot of information about the state of their body.