Chapter 21 Information Theory Flashcards

1
Q

Information theory is a subfield of mathematics concerned with ____.

P 206

A

transmitting data across a noisy channel

2
Q

Calculating information and entropy is a useful tool in machine learning and is used as the basis for techniques such as ____, ____, and, more generally, ____.

P 206

A

feature selection, building decision trees, fitting classification models

3
Q

Information theory is concerned with representing data in a compact fashion (a task known as ____ or ____), as well as with transmitting and storing it in a way that is robust to errors (a task known as ____ or ____). P 207

A

data compression, source coding, error correction, channel coding

4
Q

What is the intuition behind quantifying information?

P 207

A

The intuition behind quantifying information is the idea of measuring how much surprise there is in an event, that is, how unlikely it is. Events that are rare (low probability) are more surprising and therefore carry more information than events that are common (high probability).

- Low Probability Event: High Information (surprising).

- High Probability Event: Low Information (unsurprising).

The basic intuition behind information theory is that learning that an unlikely event has occurred is more informative than learning that a likely event has occurred.

Rare events are more uncertain or more surprising and require more information to represent them than common events.

5
Q

We can calculate the amount of information there is in an event using the probability of the event. This is called ____, ____, or simply the ____, and can be calculated for a discrete event I(x) as follows: ____

P 208

A

Shannon information, self-information, or simply the information.
I(x) = − log(p(x))
The negative sign ensures that the result is always positive or zero. The log() is typically the base-2 logarithm.

Information will be zero when the probability of an event is 1.0 (a certainty), i.e. there is no surprise.
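
Below is a minimal Python sketch of this calculation (using the base-2 logarithm; the probabilities are illustrative):

```python
from math import log2

def information(p):
    """Shannon information (self-information) of an event with probability p, in bits."""
    return -log2(p)

# Rare (low-probability) events carry more information than common ones.
print(information(0.5))  # fair coin flip: 1.0 bit
print(information(0.1))  # rarer event: ~3.32 bits
# information(1.0) is 0: a certain event carries no surprise.
```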

6
Q

I(x) = − log2(p(x))
The choice of the base-2 logarithm means that ____. This can be directly interpreted in the information processing sense as ____. The calculation of information is often written as h() to contrast it with entropy H():
h(x) = − log2(p(x))

A

the units of the information measure are bits (binary digits).
the number of bits required to represent the event on a noisy communication channel.
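
For example, a fair coin flip with p(x) = 0.5 has h(x) = − log2(0.5) = 1 bit of information, while rolling a specific number on a fair six-sided die (p(x) = 1/6) has h(x) = − log2(1/6) ≈ 2.585 bits.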

7
Q

Other logarithms can be used instead of the base-2. For example, it is also common to use the ____ logarithm that uses base ____ in calculating the information, in which case the units are referred to as ____. P 209

A

Natural, e (Euler’s number), nats
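
As a quick conversion, information in nats equals information in bits multiplied by ln(2) ≈ 0.693, so one bit ≈ 0.693 nats and one nat ≈ 1.443 bits; e.g. a fair coin flip carries 1 bit = ln(2) ≈ 0.693 nats.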

8
Q

In effect, calculating the information for a random variable is the same as calculating the information for the probability distribution of the events for the random variable. Calculating the information for a random variable is called ____, ____, or simply ____.

P 210

A

information entropy, Shannon entropy, entropy.

9
Q

What is the intuition behind choosing “entropy” for information gained from a random variable?

P 210

A

The intuition for entropy is that it is the average number of bits required to represent or transmit an event drawn from the probability distribution for the random variable.

The Shannon entropy of a distribution is the expected amount of information in an event drawn from that distribution.

10
Q

Entropy can be calculated for a random variable X with K discrete states as follows:

P 211

A

H(X) = − ∑_{i=1}^{K} p(k_i) × log(p(k_i))

Like information, the log() function uses base-2 and the units are bits. The natural logarithm can be used instead and the units will be nats.
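
A minimal Python sketch of this calculation, using a fair six-sided die as an illustrative distribution:

```python
from math import log2

def entropy(probabilities):
    """Shannon entropy of a discrete probability distribution, in bits."""
    return -sum(p * log2(p) for p in probabilities)

# A fair six-sided die: six equally likely outcomes.
print(entropy([1/6] * 6))  # log2(6) ~= 2.585 bits
```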

11
Q

The lowest entropy is calculated for a random variable that ____. The largest entropy for a random variable will be if ____.

P 211

A

has a single event with a probability of 1.0, a certainty

all events are equally likely
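
For example, a random variable with a single certain event has H = −1.0 × log(1.0) = 0, while a uniform distribution over K equally likely events reaches the maximum entropy of log2(K) bits (e.g. 1 bit for a fair coin, ~2.585 bits for a fair six-sided die).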

12
Q

In the case where one event dominates, such as with a skewed probability distribution, there is less surprise and the distribution will have a ____ (lower/higher) entropy. In the case where no event dominates another, such as with an equal or approximately equal probability distribution, we would expect a ____ (smaller/larger) entropy.

P 212

A

Lower
larger or maximum

- Skewed Probability Distribution (unsurprising): Low entropy.

- Balanced Probability Distribution (surprising): High entropy.
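
A small, self-contained Python sketch comparing the two cases (the distributions are illustrative):

```python
from math import log2

def entropy(probabilities):
    """Shannon entropy in bits; terms with zero probability contribute nothing."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

skewed = [0.9, 0.05, 0.05]    # one event dominates
balanced = [1/3, 1/3, 1/3]    # all events equally likely

print(entropy(skewed))    # ~0.569 bits: low entropy, little surprise
print(entropy(balanced))  # ~1.585 bits: maximum for 3 events, log2(3)
```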

13
Q

Calculating the entropy for a random variable provides the basis for other measures such as mutual information (information gain). True/False

P 213

A

True

It also provides the basis for calculating the difference between two probability distributions with cross-entropy and the KL-divergence.
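
A brief Python sketch of these related measures, using base-2 logarithms and two illustrative distributions P and Q:

```python
from math import log2

def kl_divergence(p, q):
    """KL divergence KL(P || Q) in bits: the extra bits needed to encode events from P using a code built for Q."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(P, Q) in bits, equal to H(P) + KL(P || Q)."""
    return -sum(pi * log2(qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.1, 0.4, 0.5]
Q = [0.8, 0.15, 0.05]

print(kl_divergence(P, Q))  # ~1.927 bits
print(cross_entropy(P, Q))  # ~3.288 bits
```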

14
Q

- Information provides a way to quantify the amount of ____ for an event measured in bits.
- Entropy provides a measure of the ____ needed to represent an event drawn from a probability distribution for a random variable.

P 214

A

Surprise, Average amount of information

The more unlikely the event, the more surprising it is, and the more information it has.
The more probable something is, the less information observing it gives us, so it is the unlikelier events that offer more information. For example, learning that a person in their 20s is healthy is not very informative, because that is the likely outcome; but learning that this person had a heart attack, which is unlikely, gives us much more information about the state of their body.
