Toolbox

T1: KL Divergence and Mutual Information

Reminder: This post contains 937 words · 3 min read · by Xianbin
I(A:BC)=cp(c)a,bp(a,bc)logp(a,bc)p(ac)p(bc)\textbf{I}(A: B \mid C) = \sum_c p(c) \sum_{a,b}p(a,b\mid c) \log \frac{p(a,b\mid c)}{p(a\mid c)p(b\mid c)} I(A:BC)=cp(c)a,bp(a,bc)logp(a,bc)p(ac)p(bc)=cp(c)a,bp(a,b,c)p(c)logp(a,b,c)p(ac)p(bc)p(c)=a,b,cp(a,b,c)logp(a,b,c)p(ac)p(bc)p(c)=DKL(p(a,b,c)p(ac)p(bc)p(c))\textbf{I}(A: B \mid C) = \sum_c p(c) \sum_{a,b}p(a,b\mid c) \log \frac{p(a,b\mid c)}{p(a\mid c)p(b\mid c)} \\ = \sum_c p(c) \sum_{a,b}\frac{p(a,b,c)}{p(c)}\log \frac{p(a,b,c)}{p(a\mid c)p(b\mid c)p(c)}\\ =\sum_{a,b,c}p(a,b,c)\log \frac{p(a,b,c)}{p(a\mid c)p(b \mid c)p(c)} = D_{KL}(p(a,b,c) \mid \mid p(a\mid c)p(b\mid c) p(c))

Notice that

p(a,b,c)p(ac)p(bc)p(c)=p(abc)p(ac)=p(bac)p(bc)\frac{p(a,b,c)}{p(a\mid c)p(b \mid c)p(c)} = \frac{p(a\mid b c)}{p(a\mid c)} =\frac{ p(b \mid ac) }{p(b\mid c)}

So,

I(A:BC)=a,b,cp(a,b,c)logp(a,b,c)p(ac)p(bc)p(c)=a,b,cp(a,b,c)logp(abc)p(ac)=a,b,cp(a,b,c)logp(bac)p(bc)=a,b,cp(bc)p(abc)logp(abc)p(ac)=Ep(bc)DKL(p(abc)p(ac))=a,b,cp(ac)p(bac)logp(bac)p(bc)=Ep(ac)DKL(p(bac)p(bc)\textbf{I}(A: B \mid C) =\sum_{a,b,c}p(a,b,c)\log \frac{p(a,b,c)}{p(a\mid c)p(b \mid c)p(c)} \\ = \sum_{a,b,c}p(a,b,c)\log \frac{p(a\mid b c)}{p(a\mid c)} = \sum_{a,b,c}p(a,b,c)\log \frac{ p(b \mid ac) }{p(b\mid c)}\\= \sum_{a,b,c}p(bc)p(a\mid bc)\log \frac{p(a\mid b c)}{p(a\mid c)} =\mathbb{E}_{p(bc)}D_{KL}(p(a \mid bc) \mid \mid p(a \mid c))\\= \sum_{a,b,c}p(ac)p(b\mid ac)\log \frac{p(b\mid ac)}{p(b\mid c)} = \mathbb{E}_{p(ac)} D_{KL}(p(b\mid ac ) \mid \mid p(b \mid c)