T1: Entropy and Counting
Theorem 1. Fix any parameter $\alpha \in (0, 1/2]$; for any $n \ge 1$, we have
$$\sum_{i=0}^{\lfloor \alpha n \rfloor} \binom{n}{i} \le 2^{n H(\alpha)},$$
where $H(\alpha) = -\alpha \log_2 \alpha - (1 - \alpha) \log_2 (1 - \alpha)$ is the binary entropy function.
Proof
To prove this theorem, we first need a basic fact about entropy.
Lemma 1. Consider a random variable $X$ with finite support $S$, $|S| = n$. The Shannon entropy $H(X) = -\sum_{x \in S} \Pr[X = x] \log_2 \Pr[X = x]$ is maximized when $X \sim u$, where $u$ is the discrete uniform distribution on $S$; in that case $H(X) = \log_2 n$.
Proof. Fix the support $S$ with $|S| = n$, and consider distributions $p, q \in \Delta(S)$, where $\Delta(S)$ is the set of all probability mass functions over the support $S$. Let $u \in \Delta(S)$ be the uniform distribution, $u(x) = 1/n$.
We need to compare two distributions, and there is a useful tool for this: the KL-divergence, or relative entropy,
$$D(p \,\|\, q) = \sum_{x \in S} p(x) \log_2 \frac{p(x)}{q(x)},$$
which satisfies $D(p \,\|\, q) \ge 0$ (Gibbs' inequality).
Therefore,
$$0 \le D(p \,\|\, u) = \sum_{x \in S} p(x) \log_2 \frac{p(x)}{1/n} = \log_2 n - H(p).$$
Also, we notice that
$$H(u) = -\sum_{x \in S} \frac{1}{n} \log_2 \frac{1}{n} = \log_2 n.$$
Therefore $H(p) \le \log_2 n = H(u)$ for every $p \in \Delta(S)$, and the lemma is proved. $\square$
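As a sanity check, the identity $D(p \,\|\, u) = \log_2 n - H(p)$ used above can be verified numerically. Below is a minimal sketch in Python; the helper names `entropy` and `kl` are our own ad hoc choices, not from any library:

```python
import math
import random

def entropy(p):
    """Shannon entropy (base 2) of a probability mass function given as a list."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl(p, q):
    """KL-divergence D(p || q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)

n = 8
uniform = [1 / n] * n

random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in range(n)]
    total = sum(w)
    p = [x / total for x in w]
    # D(p || u) = log2(n) - H(p), and it is nonnegative,
    # so H(p) <= H(u) = log2(n).
    assert abs(kl(p, uniform) - (math.log2(n) - entropy(p))) < 1e-9
    assert entropy(p) <= math.log2(n) + 1e-12
```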
Now, we can prove Theorem 1.
By the above lemma, it is natural to try using the uniform distribution to prove the inequality.
First, let us see what the LHS means.
Imagine we have the set $[n] = \{1, \ldots, n\}$, and we select at most $\lfloor \alpha n \rfloor$ elements.
Apparently, the number of possible selections is the LHS. Now, let us think of the problem in another way. What if we select a member uniformly from this collection of subsets?
Let $X$ be such a random variable, uniform on the support
$$\mathcal{C} = \{ A \subseteq [n] : |A| \le \alpha n \}.$$
Since $X$ is uniform, it means that
$$H(X) = \log_2 |\mathcal{C}|.$$
Therefore, it is enough to prove that
$$H(X) \le n H(\alpha).$$
Notice that the cardinality of $\mathcal{C}$ can be exponentially large, e.g., $|\mathcal{C}| \ge 2^{n-1}$ when $\alpha = 1/2$. So, it is not obvious that the above is true.
We can encode $X$ by indicator variables: let $X_i = 1$ if $i \in X$, and $X_i = 0$ otherwise. Notice that $(X_1, \ldots, X_n)$ determines $X$ and vice versa (each is recovered from the other with probability one), so the encoding does not change the entropy: $H(X) = H(X_1, \ldots, X_n)$.
By the subadditivity of entropy, we have
$$H(X) = H(X_1, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i).$$
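Subadditivity can be seen concretely on a tiny example: two perfectly correlated fair bits have joint entropy 1 bit, while the marginal entropies sum to 2 bits. A sketch in Python (the distribution below is a hypothetical illustration, not part of the proof):

```python
import math

def entropy(dist):
    """Shannon entropy (base 2) of a distribution given as {outcome: probability}."""
    return -sum(v * math.log2(v) for v in dist.values() if v > 0)

# Two perfectly correlated fair bits: the pair is (0, 0) or (1, 1), each w.p. 1/2.
joint = {(0, 0): 0.5, (1, 1): 0.5}

# Marginal distributions of each coordinate.
m1, m2 = {}, {}
for (a, b), v in joint.items():
    m1[a] = m1.get(a, 0.0) + v
    m2[b] = m2.get(b, 0.0) + v

# Subadditivity: H(X1, X2) <= H(X1) + H(X2).  Here the joint entropy is 1 bit
# while the marginals contribute 1 bit each, so the inequality is strict.
assert entropy(joint) <= entropy(m1) + entropy(m2)
assert entropy(joint) == 1.0 and entropy(m1) + entropy(m2) == 2.0
```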
Then, it is enough to prove that
$$\sum_{i=1}^{n} H(X_i) \le n H(\alpha).$$
Let $p_i = \Pr[X_i = 1]$, so $H(X_i) = H(p_i)$. Since $|X| \le \alpha n$ always, $\sum_{i=1}^{n} p_i = \mathbb{E}[|X|] \le \alpha n$, and by symmetry every $p_i$ is the same. Since $H$ is increasing on $(0, 1/2]$ (see the Appendix), it is enough to prove that $p_i \le \alpha \le 1/2$, which is true. $\square$
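The finished bound is easy to check numerically for small $n$. A minimal sketch in Python (the helper names `binary_entropy` and `lhs` are ours):

```python
import math

def binary_entropy(a):
    """Binary entropy H(a) in bits, with the convention H(0) = H(1) = 0."""
    if a in (0, 1):
        return 0.0
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)

def lhs(n, alpha):
    """Sum of binomial coefficients C(n, i) for i = 0, ..., floor(alpha * n)."""
    return sum(math.comb(n, i) for i in range(math.floor(alpha * n) + 1))

# Check sum_{i <= alpha*n} C(n, i) <= 2^(n * H(alpha)) for several alpha <= 1/2.
for n in range(1, 60):
    for alpha in (0.1, 0.25, 1 / 3, 0.5):
        assert lhs(n, alpha) <= 2 ** (n * binary_entropy(alpha)) + 1e-9
```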
Appendix
The binary entropy $H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$ is an increasing function of $p$ if $0 < p \le 1/2$: its derivative is $H'(p) = \log_2 \frac{1-p}{p} \ge 0$ on this range.
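This monotonicity claim can also be checked numerically, both on a grid and against the closed-form derivative. A sketch in Python (the helper name is ours):

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Monotonicity on (0, 1/2]: H is nondecreasing along a fine grid.
prev = 0.0
for k in range(1, 501):
    p = k / 1000.0  # p sweeps (0, 0.5]
    assert binary_entropy(p) >= prev
    prev = binary_entropy(p)

# The finite-difference slope matches H'(p) = log2((1 - p) / p), which is
# nonnegative exactly when p <= 1/2.
p, h = 0.3, 1e-6
slope = (binary_entropy(p + h) - binary_entropy(p - h)) / (2 * h)
assert abs(slope - math.log2((1 - p) / p)) < 1e-6
```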
Reference
[1] David Galvin. Three tutorial lectures on entropy and counting.