Grocery on Distributed Algorithms

T1: KL Divergence and Mutual Information

by Xianbin
The conditional mutual information $\textbf{I}(A: B \mid C)$ can be rewritten as a KL divergence:

\[\begin{aligned}\textbf{I}(A: B \mid C) &= \sum_c p(c) \sum_{a,b}p(a,b\mid c) \log \frac{p(a,b\mid c)}{p(a\mid c)p(b\mid c)} \\ &= \sum_c p(c) \sum_{a,b}\frac{p(a,b,c)}{p(c)}\log \frac{p(a,b,c)}{p(a\mid c)p(b\mid c)p(c)}\\ &=\sum_{a,b,c}p(a,b,c)\log \frac{p(a,b,c)}{p(a\mid c)p(b \mid c)p(c)} \\ &= D_{KL}(p(a,b,c) \mid\mid p(a\mid c)p(b\mid c) p(c))\end{aligned}\]
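As a quick numerical sanity check of this identity, here is a minimal Python sketch on a small random joint distribution; the array shapes and variable names are illustrative, not part of the post.

```python
# Minimal sanity check: I(A:B|C) computed from its definition equals
# D_KL( p(a,b,c) || p(a|c) p(b|c) p(c) ).  The joint p(a,b,c) is a random
# 3x4x5 array chosen purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((3, 4, 5))           # unnormalized p(a, b, c)
p /= p.sum()                        # normalize to a joint distribution

p_c = p.sum(axis=(0, 1))            # p(c)
p_ac = p.sum(axis=1)                # p(a, c)
p_bc = p.sum(axis=0)                # p(b, c)
p_a_given_c = p_ac / p_c            # p(a | c)
p_b_given_c = p_bc / p_c            # p(b | c)

# Direct definition: sum_{a,b,c} p(a,b,c) log p(a,b|c) / (p(a|c) p(b|c))
p_ab_given_c = p / p_c              # p(a, b | c)
mi_direct = np.sum(p * np.log(p_ab_given_c /
                              (p_a_given_c[:, None, :] * p_b_given_c[None, :, :])))

# KL form: D_KL( p(a,b,c) || p(a|c) p(b|c) p(c) )
q = p_a_given_c[:, None, :] * p_b_given_c[None, :, :] * p_c
kl_form = np.sum(p * np.log(p / q))

print(mi_direct, kl_form)           # the two values agree up to floating-point error
```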

Notice that, by the chain rule $p(a,b,c) = p(a\mid bc)\,p(b\mid c)\,p(c)$,

\[\frac{p(a,b,c)}{p(a\mid c)p(b \mid c)p(c)} = \frac{p(a\mid b c)}{p(a\mid c)} =\frac{ p(b \mid ac) }{p(b\mid c)}\]

So,

\[\begin{aligned}\textbf{I}(A: B \mid C) &=\sum_{a,b,c}p(a,b,c)\log \frac{p(a,b,c)}{p(a\mid c)p(b \mid c)p(c)} \\ &= \sum_{a,b,c}p(a,b,c)\log \frac{p(a\mid b c)}{p(a\mid c)} = \sum_{a,b,c}p(a,b,c)\log \frac{ p(b \mid ac) }{p(b\mid c)}\\ &= \sum_{a,b,c}p(bc)p(a\mid bc)\log \frac{p(a\mid b c)}{p(a\mid c)} =\mathbb{E}_{p(bc)}\,D_{KL}(p(a \mid bc) \mid\mid p(a \mid c))\\ &= \sum_{a,b,c}p(ac)p(b\mid ac)\log \frac{p(b\mid ac)}{p(b\mid c)} = \mathbb{E}_{p(ac)}\, D_{KL}(p(b\mid ac ) \mid\mid p(b \mid c))\end{aligned}\]
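A similar hedged sketch, again with an illustrative random joint distribution, checks the two expected-KL forms against each other; both values also match the KL form computed earlier.

```python
# Sanity check of the expected-KL forms:
#   E_{p(b,c)} D_KL( p(a|b,c) || p(a|c) )  and  E_{p(a,c)} D_KL( p(b|a,c) || p(b|c) ).
# Same illustrative random joint p(a,b,c) as in the previous sketch.
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((3, 4, 5))
p /= p.sum()                              # joint p(a, b, c)

p_c = p.sum(axis=(0, 1))                  # p(c)
p_ac = p.sum(axis=1)                      # p(a, c)
p_bc = p.sum(axis=0)                      # p(b, c)
p_a_given_c = p_ac / p_c                  # p(a | c)
p_b_given_c = p_bc / p_c                  # p(b | c)

# E_{p(b,c)} D_KL( p(a | b,c) || p(a | c) )
p_a_given_bc = p / p_bc                   # p(a | b, c)
kl_a = np.sum(p_a_given_bc * np.log(p_a_given_bc / p_a_given_c[:, None, :]), axis=0)
mi_from_a = np.sum(p_bc * kl_a)

# E_{p(a,c)} D_KL( p(b | a,c) || p(b | c) )
p_b_given_ac = p / p_ac[:, None, :]       # p(b | a, c)
kl_b = np.sum(p_b_given_ac * np.log(p_b_given_ac / p_b_given_c[None, :, :]), axis=1)
mi_from_b = np.sum(p_ac * kl_b)

print(mi_from_a, mi_from_b)               # both equal I(A:B|C) from the direct definition
```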