Reminder: This post contains 1921 words
· 6 min read
· by Xianbin
Entropy and Counting
Theorem 1. Fix any parameter \(zhu \leq 1/2\), for any \(n\), we have
\(\sum_{i=1}^{zhu\cdot n} {n \choose i} \leq 2^{H_2(zhu) n}\)
Proof
To prove this theorem, we need to deeply understand the basics of entropy.
Lemma 1. Consider a random variable \(X\) with finite support \(\mathcal{X}\), the Shannon entropy of \(X\) is maximized when \(X\sim \mu\), where \(\mu\) is a discrete uniform distribution.
Proof. For \(x\in \mathcal{X}\), let us consider two distributions \(f(x)\), \(g(x)\) (distribution is the set of all probability mass functions over the support). Let \(g(x)\) be the uniform distribution.
We need to compare the difference between two distributions. There is a useful tool, KL-divergence or relative entropy.
\[KL(f\Vert g) = \sum_{x\in \mathcal{X}}f(x) \log \frac{f(x)}{g(x)} = -H(X) + \log \lvert \mathcal{X} \vert \geq 0\]
Therefore,
\[H(X) \leq \log \lvert \mathcal{X} \rvert\]
Also, we notice that
\(H(g(x)) = \log \lvert \mathcal{X} \rvert\)
Therefore, it is proved.
Now, we can prove Theorem 1.
By the above lemma, we may think of using the uniform distribution to prove the inequality.
First, let us see what the LHS means.
Imagine we have a set \(C\), and we uniformly at random select \(i\leq zhu \cdot n\) elements.
Apparently, it is LHS. Now, let us think of the problem in another way. What if we select a member uniformly from \(C\)?
Let \(X\) be such a random variable with support \(C\)
\[H(X) = \log \lvert C\lvert = \log \text{LHS}\]
It means that
\[LHS = 2^{H(X)}\]
Therefore, it enough to prove that
\[H(X) \leq H_2(zhu) n\]
Notice that the cardinality of \(X\) can be large, e.g., \(zhu \cdot n\). So, it is not obvious that the above is true.
We can use \(X = (X_1, \ldots, X_n)\) to denote the random variable. If \(i\in X\), \(X_i = 1\); otherwise, \(X_i = 0\).
Notice that adding all ‘0’ or ‘1’, does not change the entropy (the probability is one.)
By the subadditivity of the entropy, we have
\[H(X) \leq \sum_{i=1}^n H(X_i)\]
Then, it is enough to prove that
\[H(X_i) \leq H_2(zhu)\]
It is enough to prove that \(p(i\in X) \leq zhu\), which is true.
Appendix
\(H_2(x) = -(x\log x + (1-x) \log (1-x))\) is an increasing function if \(0< x < 1/2\).
Reference
[1]. Three tutorial lectures on entropy and counting1. David Galvin