Toolbox

T1: Basics about Mutual Information

Reminder: This post contains 2535 words · 8 min read · by Xianbin

Basic Definition:

\[I(A: B) = H(A) - H(A\mid B) = H(B) - H(B \mid A)\] \[I(A:B\mid C) = H(A \mid C) - H(A\mid B C)\]

\(\textbf{Lemma 1}\). \(I(AB: C) = I(A:C) + I(B:C\mid A)\)

\(\textbf{Proof}\).

\[I(AB: C) = H(AB) - H(AB \mid C) = H(C) - H(C \mid AB) = \\ H(C)- H(C\mid A) + H(C\mid A) - H(C \mid AB) = \\ I(A:C) + I(C:B\mid A)\]

Since \(A,B\) are symmetric, we can also get

\[I(AB: C) = I(B:C) + I(A:C\mid B)\]

It is not easy to remember the above inequalities and definitions. Here is a easier way.

Use the following tips to memorize the above equalities!

\(I(A: B) = H(A) - H(A\mid B) = H(B) - H(B\mid A)\) means that the size of the information of A disclosed by B (or the information of B disclosed by A), i.e., the common part. Then, it is the information of \(A\) minus the information of common of what \(A\) uniquely has.

\[I(A:B \mid C)\]

Forget about the condition first.

\[I(A: B)= H(A) - H(A\mid B)\]

Thus, we plus the condition

\[I(A:B \mid C) = H(A\mid C) - H(A \mid B C)\]

As for \(I(AB:C)\), it means that the size of information disclosed by \(C\) from \(AB\). Then, it is equal to what \(C\) discloses \(A\) and then we know \(A\), based on \(A\), we have \(I(B: C\mid A)\) to know from \(C\).

Some Useful Inequalities

\(\textbf{Lemma}\)(condition on entropy). \(\textbf{H}(X \mid Y, Z ) \leq \textbf{H}(X \mid Y)\). If \(X \perp Z \mid Y\), the equality holds.

\(\textbf{Proof}\).

\(\textbf{H}(X\mid Y, Z) = \textbf{H}(X \mid Y) - \textbf{I}(X: Z \mid Y) \leq \textbf{H}(X \mid Y)\) The equality holds if and only if \(\textbf{I}(X: Z \mid Y) = 0\). Then, \(X \perp Z \mid Y\).

To memorize this inequality, use the tip “condition will reduce the entropy”.

\(\textbf{Lemma}\)(condition increases mutual information) If \(W \perp Z \mid Y\), then we have \(\textbf{I}(W:X\mid Y) \leq \textbf{I}(W:X\mid Y, Z)\).

\(\textbf{Proof}\).

\[\textbf{I}(W:X \mid Y) \leq \textbf{H}(W\mid Y) - \textbf{H}(W\mid X, Y) =\\ \textbf{H}(W \mid Y, Z) - \textbf{H}(W\mid X, Y, Z) =\\ \textbf{I}(W:X\mid Y, Z)\]

\(\textbf{Lemma}\)(condition decreases mutual information) If \(W \perp Z \mid X, Y\), then we have \(\textbf{I}(W:X\mid Y) \geq \textbf{I}(W:X\mid Y, Z)\).

\(\textbf{Claim 1}\). \(\textbf{I}(X:Z) = 0\), then, we have \(\textbf{I}(X:Y) \leq \textbf{I}(X: Y \mid Z)\).

\(\textbf{Claim 2}\). \(\textbf{I}(X:Z\mid Y) = 0\), then, we have \(\textbf{I}(X:Y) \geq \textbf{I}(X: Y \mid Z)\).

\(\textbf{Observation 3}\). Given \(A = C-B\), we have

\[\mathbf{I}(R: A \mid C) = \mathbf{I}(R: B \mid C)\]

\(\textbf{Lemma 4: }\). Given \(X\to Y\to Z\), we have

\[\mathbf{I}(Y:Z) \geq \mathbf{I}(Y:Z \mid X)\]

and

\[\mathbf{I}(X:Y) \geq \mathbf{I}(X:Y \mid Z)\]

Proof.

\[\mathbf{I}(X:Y, Z) = \mathbf{I}(X:Y) + \mathbf{I}(X:Z\mid Y) = \mathbf{I}(X:Z) + \mathbf{I}(X:Y\mid Z)\]

Use the condition \(\mathbf{I}(X: Z\mid Y) = 0\), we can prove this lemma easily.

strange but useful:

\[\mathbf{I}(C A: B\mid C) = \mathbf{I}(A:B\mid C)\]

Intuitively, it means that given \(C\), we can use that to indicate information.

The end