Toolbox

T1: Entropy and Counting

by Xianbin


Theorem 1. Fix any parameter $zhu \leq 1/2$. For any $n$, we have $\sum_{i=1}^{zhu\cdot n} {n \choose i} \leq 2^{H_2(zhu)\, n}$.

Proof

To prove this theorem, we first need a basic fact about entropy.

Lemma 1. Consider a random variable $X$ with finite support $\mathcal{X}$. The Shannon entropy of $X$ is maximized when $X\sim \mu$, where $\mu$ is the uniform distribution on $\mathcal{X}$.

Proof. For $x\in \mathcal{X}$, let us consider two distributions $f(x)$ and $g(x)$ (here, a distribution means a probability mass function over the support). Let $g(x)$ be the uniform distribution, i.e., $g(x) = 1/\lvert \mathcal{X} \rvert$.

We need to compare the two distributions. A useful tool for this is the KL-divergence, also known as relative entropy. Let $X \sim f$; then

$$KL(f\Vert g) = \sum_{x\in \mathcal{X}}f(x) \log \frac{f(x)}{g(x)} = -H(X) + \log \lvert \mathcal{X} \rvert \geq 0,$$

where the second equality uses $g(x) = 1/\lvert \mathcal{X} \rvert$ and the last inequality is the non-negativity of the KL-divergence.

Therefore,

$$H(X) \leq \log \lvert \mathcal{X} \rvert$$

Also, we notice that if $X \sim g$, then $H(X) = \log \lvert \mathcal{X} \rvert$, i.e., the uniform distribution attains this maximum.

Therefore, the lemma is proved.
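To see the lemma in action, here is a small numerical sketch (Python code I added, not part of the original post; the helper names `entropy` and `kl_uniform` are my own) comparing a skewed distribution with the uniform one and checking the identity $KL(f\Vert g) = \log \lvert \mathcal{X} \rvert - H(f)$ used above.

```python
import math

def entropy(p):
    """Shannon entropy (base 2) of a probability mass function given as a list."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_uniform(f):
    """KL(f || g), where g is the uniform distribution on the same support."""
    n = len(f)
    return sum(x * math.log2(x * n) for x in f if x > 0)

support_size = 4
uniform = [1 / support_size] * support_size
skewed = [0.7, 0.1, 0.1, 0.1]

print(entropy(uniform))   # log2(4) = 2.0, the maximum possible entropy here
print(entropy(skewed))    # strictly below 2.0
# KL(f || uniform) equals log2|X| - H(f), and it is non-negative.
print(math.isclose(kl_uniform(skewed), math.log2(support_size) - entropy(skewed)))
```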

Now, we can prove Theorem 1.

By the above lemma, a natural idea is to bring in the uniform distribution to prove the inequality.

First, let us see what the LHS means.

Imagine we have an $n$-element ground set, and let $C$ be the collection of all subsets we can form by selecting $i$ of its elements, for some $1 \leq i \leq zhu \cdot n$.

Apparently, $\lvert C \rvert$ is exactly the LHS. Now, let us think of the problem in another way. What if we select a member uniformly at random from $C$?

Let $X$ be such a random variable with support $C$. Then

$$H(X) = \log \lvert C \rvert = \log \text{LHS}$$

It means that

$$\text{LHS} = 2^{H(X)}$$

Therefore, it is enough to prove that

$$H(X) \leq H_2(zhu)\, n$$

Notice that the cardinality of $X$ (as a random subset) can be large, up to $zhu \cdot n$. So, it is not obvious that the above is true.

We can use $X = (X_1, \ldots, X_n)$ to denote the random variable: if $i\in X$, then $X_i = 1$; otherwise, $X_i = 0$.

Notice that this re-encoding does not change the entropy, since the vector $(X_1, \ldots, X_n)$ is determined by $X$ with probability one (and vice versa).

By the subadditivity of entropy, we have

$$H(X) \leq \sum_{i=1}^n H(X_i)$$

Then, it is enough to prove that

$$H(X_i) \leq H_2(zhu)$$

Since $X_i$ is a Bernoulli random variable, $H(X_i) = H_2(\Pr[i\in X])$, and $H_2$ is increasing on $(0, 1/2)$ (see the Appendix). So it is enough to prove that $\Pr[i\in X] \leq zhu$, which is true: every member of $C$ has at most $zhu \cdot n$ elements, so by symmetry $\Pr[i\in X] = \mathbb{E}[\lvert X \rvert]/n \leq zhu$. This completes the proof.
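As a sanity check (not part of the original argument), here is a short Python sketch that compares both sides of Theorem 1 for a few values of $n$ and $zhu$; the helper names are my own.

```python
import math

def H2(x):
    """Binary entropy H_2(x) with base-2 logarithms."""
    return -(x * math.log2(x) + (1 - x) * math.log2(1 - x))

def theorem_holds(n, zhu):
    """Check sum_{i=1}^{floor(zhu*n)} C(n, i) <= 2^(H_2(zhu)*n)."""
    lhs = sum(math.comb(n, i) for i in range(1, math.floor(zhu * n) + 1))
    rhs = 2 ** (H2(zhu) * n)
    return lhs <= rhs

for n in (10, 50, 200):
    for zhu in (0.1, 0.25, 0.5):
        assert theorem_holds(n, zhu), (n, zhu)
print("The inequality holds in all tested cases.")
```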

Appendix

$H_2(x) = -(x\log x + (1-x) \log (1-x))$ is an increasing function if $0 < x < 1/2$.
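A quick way to convince yourself of this (a sketch I added, not from the post): the derivative is $H_2'(x) = \log\frac{1-x}{x}$, which is positive for $0 < x < 1/2$; the snippet below simply checks the monotonicity numerically on sampled points.

```python
import math

def H2(x):
    """Binary entropy H_2(x) = -(x log2 x + (1-x) log2(1-x))."""
    return -(x * math.log2(x) + (1 - x) * math.log2(1 - x))

# H_2'(x) = log2((1-x)/x) > 0 on (0, 1/2), so H_2 should be increasing there.
samples = [i / 1000 for i in range(1, 500)]
values = [H2(x) for x in samples]
assert all(a < b for a, b in zip(values, values[1:]))
print("H_2 is increasing on the sampled points of (0, 1/2)")
```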

Reference

[1] David Galvin. Three tutorial lectures on entropy and counting.