Chapter 24 Information Gain and Mutual Information Flashcards
Information gain is calculated by ____.
Mutual information calculates the ____ and is the name given to information gain when applied to variable selection.
P 242
comparing the entropy of the dataset before and after a transformation.
statistical dependence between two variables.
How can entropy be used as a calculation of the purity of a dataset?
P 244
According to the entropy formula for binary classes: Entropy = −(p(0) × log2(p(0)) + p(1) × log2(p(1)))
Entropy reflects how balanced the distribution of classes is. An entropy of 0 bits indicates a dataset containing a single class; the maximum entropy (1 bit for two classes, more for additional classes) occurs when the classes are perfectly balanced, with values in between indicating intermediate levels of purity.
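As a minimal sketch of the formula above (the class proportions are invented for illustration, and a zero-probability term is treated as contributing 0 bits):

from math import log2

def binary_entropy(p0, p1):
    # Entropy = -(p(0) * log2(p(0)) + p(1) * log2(p(1)))
    # A term with probability 0 contributes 0 bits by convention.
    return -sum(p * log2(p) for p in (p0, p1) if p > 0)

print(binary_entropy(0.5, 0.5))  # 1.0 bit:  balanced classes, maximum entropy for 2 classes
print(binary_entropy(1.0, 0.0))  # 0.0 bits: a pure dataset containing only one class
print(binary_entropy(0.9, 0.1))  # ~0.47 bits: mostly one class, low surprise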
A smaller entropy suggests ____(less/more) purity or ____(more/less) surprise.
P 244
More, less
Information gain provides a way to use entropy to calculate how a change to the dataset impacts the purity of the dataset, e.g. the distribution of classes. True/False
P 244
True
What’s information gain?
P 244
Information gain is simply the expected reduction in entropy caused by partitioning the examples according to an attribute.
What’s the Information Gain formula for dataset S and variable a?
P 244
IG(S, a) = H(S) − H(S|a)
Where IG(S, a) is the information gain for the dataset S given the variable a, H(S) is the entropy of the dataset before any change, and H(S|a) is the conditional entropy of the dataset given the variable a.
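A minimal sketch of this formula in Python, where H(S|a) is computed as the weighted average entropy of the groups produced by splitting on a (the tiny dataset of (a, class) pairs is invented for illustration):

from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows):
    # rows: list of (a_value, class_label) pairs
    labels = [label for _, label in rows]
    h_s = entropy(labels)              # H(S): entropy before the split
    h_s_given_a = 0.0                  # H(S|a): weighted entropy after the split
    for value in set(a for a, _ in rows):
        group = [label for a, label in rows if a == value]
        h_s_given_a += (len(group) / len(rows)) * entropy(group)
    return h_s - h_s_given_a

data = [('x', 0), ('x', 0), ('x', 1), ('y', 1), ('y', 1), ('y', 1)]
print(information_gain(data))  # ~0.459 bits: splitting on a makes the groups purer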
Maximizing the entropy is equivalent to maximizing the information gain. True/False
P 247
False. Minimizing the conditional entropy H(S|a) is equivalent to maximizing the information gain.
How is information gain used in decision trees?
P 247
The information gain is calculated for each variable in the dataset, and the variable with the largest information gain is selected to split the dataset. Generally, a larger gain indicates a smaller conditional entropy, or less surprise, according to IG(S, a) = H(S) − H(S|a).
The process is then repeated on each created group, excluding the variable that was already chosen. This stops once a desired depth of the decision tree is reached or no more splits are possible.
What matters for splitting is the purity of each group resulting from the split: if the information gain is large, the expected (conditional) entropy for the variable is small, which means each resulting split is purer on average. That is what we want from a good classifier: to separate the classes well so that each split is more pure, as in the sketch below.
Worked Example P 246
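A minimal sketch of choosing a split this way, with two hypothetical feature columns and invented labels:

from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    h_s = entropy(labels)
    h_s_given_a = sum(
        (column.count(v) / len(column))
        * entropy([y for x, y in zip(column, labels) if x == v])
        for v in set(column)
    )
    return h_s - h_s_given_a

# Two hypothetical feature columns and a class label column.
outlook = ['sunny', 'sunny', 'rain', 'rain', 'overcast', 'overcast']
windy   = [True,    False,   True,   False,  True,       False]
labels  = ['no',    'no',    'yes',  'yes',  'yes',      'yes']

gains = {'outlook': information_gain(outlook, labels),
         'windy':   information_gain(windy, labels)}
best = max(gains, key=gains.get)
print(gains, '-> split on', best)  # 'outlook' has the larger gain, so it is chosen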
What’s mutual information?
P 248
Mutual information is calculated between two variables and measures the reduction in uncertainty for one variable given a known value of the other variable.
A quantity called mutual information measures the amount of information one can obtain from one random variable given another.
It measures the average reduction in uncertainty about x that results from learning the value of y; or vice versa, the average amount of information that x conveys about y.
How is mutual information calculated?
P 248
I(X; Y) = H(X) − H(X|Y)
Where I(X; Y) is the mutual information for X and Y, H(X) is the entropy for X, and H(X|Y) is the conditional entropy for X given Y. The result has the units of bits.
The entropy of a variable is a measure of its expected surprise, or uncertainty. Subtracting the conditional entropy from it quantifies how much of that uncertainty (surprise, entropy) is explained by the other variable; hence the definition: mutual information measures the reduction in uncertainty for one variable given a known value of the other variable.
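A minimal sketch of this calculation from a small joint distribution (the 2x2 table p(x, y) and its marginals are invented for illustration):

from math import log2

p_xy = {('x0', 'y0'): 0.4, ('x0', 'y1'): 0.1,
        ('x1', 'y0'): 0.1, ('x1', 'y1'): 0.4}
p_x = {'x0': 0.5, 'x1': 0.5}   # marginal p(x)
p_y = {'y0': 0.5, 'y1': 0.5}   # marginal p(y)

h_x = -sum(p * log2(p) for p in p_x.values())   # H(X)

# H(X|Y) = -sum over (x, y) of p(x, y) * log2( p(x, y) / p(y) )
h_x_given_y = -sum(p * log2(p / p_y[y]) for (x, y), p in p_xy.items())

print(h_x - h_x_given_y)  # ~0.278 bits of uncertainty about X removed by knowing Y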
Why is mutual information symmetrical?
P 248
Mutual information is a measure of dependence, or mutual dependence, between two random variables. As such, the measure is symmetrical, meaning that I(X; Y) = I(Y; X).
The mutual information can be calculated using KL-Divergence. True/False
P 248
True.
The mutual information can also be calculated as the KL divergence between the joint probability distribution and the product of the marginal probabilities for each variable.
This can be stated formally as follows: I(X; Y) = KL(p(X, Y) || p(X) × p(Y))
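A minimal sketch of the same quantity via the KL divergence, using the same invented 2x2 joint distribution as in the previous sketch; it yields the identical value:

from math import log2

p_xy = {('x0', 'y0'): 0.4, ('x0', 'y1'): 0.1,
        ('x1', 'y0'): 0.1, ('x1', 'y1'): 0.4}
p_x = {'x0': 0.5, 'x1': 0.5}
p_y = {'y0': 0.5, 'y1': 0.5}

# KL(p || q) = sum of p_i * log2(p_i / q_i), with q the product of the marginals
mi = sum(p * log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())
print(mi)  # ~0.278 bits, matching H(X) - H(X|Y)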
Mutual information is always greater than or equal to ____; the larger the value, the stronger the relationship between the two variables. If the calculated result is zero, then the variables are ____.
P 248
Zero, Independent
Mutual Information and Information Gain are the same thing. True/False
P 248
True.
Mutual Information and Information Gain are the same thing, although the context or usage of the measure often gives rise to the different names.
Mutual information is sometimes used as a synonym for information gain.