· 11 min read
· by Xianbin
This post gives some basic background on entropy.
Conditional Entropy
$$\begin{aligned} H(Y\mid X) &= \sum_{x\in\mathcal{X}} p(x)\,H(Y\mid X=x) \\ &= -\sum_{x\in\mathcal{X}} p(x)\sum_{y\in\mathcal{Y}} p(y\mid x)\log p(y\mid x) \\ &= -\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} p(x,y)\log p(y\mid x) \\ &= -\mathbb{E}[\log p(Y\mid X)] \end{aligned}$$
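To make the definition concrete, here is a minimal Python sketch (the joint table `p` is made up for illustration) that evaluates $H(Y\mid X)$ directly from a joint distribution:

```python
from math import log2

# Hypothetical joint distribution p(x, y); this table is made up.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def conditional_entropy(p):
    """H(Y|X) = -sum_{x,y} p(x,y) * log2 p(y|x)."""
    px = {}
    for (x, _), q in p.items():
        px[x] = px.get(x, 0.0) + q  # marginal p(x)
    return -sum(q * log2(q / px[x]) for (x, _), q in p.items() if q > 0)

print(conditional_entropy(p))  # about 0.722 bits for this table
```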
Joint Entropy
Let $X, Y$ be two random variables.
$$H(X,Y) = -\sum_x \sum_y p(x,y)\log p(x,y) = -\mathbb{E}[\log p(X,Y)]$$
To build intuition, we first introduce the following identity.
$$H(X,Y) = H(X) + H(Y) - I(X;Y)$$
It means that the joint entropy of $(X,Y)$ is all the information we can get from $X$ and $Y$ together, while $I(X;Y)$ is the information the two variables share with each other (the overlap counted twice when we add $H(X)$ and $H(Y)$).
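As a quick sanity check of this identity, the following sketch (again with a made-up joint table) computes both sides:

```python
from math import log2

p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}  # made-up joint p(x,y)

def H(dist):
    return -sum(q * log2(q) for q in dist.values() if q > 0)

px, py = {}, {}
for (x, y), q in p.items():
    px[x] = px.get(x, 0.0) + q
    py[y] = py.get(y, 0.0) + q

# I(X;Y) = sum p(x,y) log [ p(x,y) / (p(x) p(y)) ]
I = sum(q * log2(q / (px[x] * py[y])) for (x, y), q in p.items() if q > 0)
print(H(p))               # H(X,Y)
print(H(px) + H(py) - I)  # same value
```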
Chain rule for entropy
For two random variables $X, Y$, we have
$$H(X,Y) = H(X) + H(Y\mid X) = H(Y) + H(X\mid Y)$$
Generally, we have
$$H(X_1,\dots,X_n) = H(X_n\mid X_{n-1},\dots,X_1) + \cdots + H(X_1)$$
It is easy to prove the above equality.
For any $(x_1,\dots,x_n)$ such that $p(x_1,\dots,x_n) > 0$, we have
$$p(x_1,\dots,x_n) = p(x_n\mid x_{n-1},\dots,x_1)\cdots p(x_2\mid x_1)\,p(x_1)$$
Then,
$$\begin{aligned} H(X_1,\dots,X_n) &= -\mathbb{E}[\log p(X_1,\dots,X_n)] \\ &= -\mathbb{E}[\log p(X_n\mid X_{n-1},\dots,X_1)] - \cdots - \mathbb{E}[\log p(X_1)] \\ &= H(X_1) + \cdots + H(X_n\mid X_{n-1},\dots,X_1) \end{aligned}$$
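The two-variable case $H(X,Y) = H(X) + H(Y\mid X)$ is easy to verify numerically; the sketch below uses a made-up joint table:

```python
from math import log2

p = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.1, (1, 1): 0.4}  # made-up joint

def H(dist):
    return -sum(q * log2(q) for q in dist.values() if q > 0)

px = {}
for (x, _), q in p.items():
    px[x] = px.get(x, 0.0) + q  # marginal p(x)

h_y_given_x = -sum(q * log2(q / px[x]) for (x, _), q in p.items() if q > 0)
print(H(p), H(px) + h_y_given_x)  # the two sides of the chain rule agree
```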
Some helpful inequalities
Lemma 1. Given two random variables $X, Y$, we have
$$H(X) \ge H(X\mid Y).$$
Proof. We compute the difference directly:
$$\begin{aligned} H(X\mid Y) - H(X) &= \sum_y p(y)\sum_x p(x\mid y)\log\frac{1}{p(x\mid y)} - \sum_x p(x)\log\frac{1}{p(x)} \\ &= \sum_y p(y)\sum_x p(x\mid y)\log\frac{1}{p(x\mid y)} - \sum_x p(x)\log\frac{1}{p(x)}\sum_y p(y\mid x) \\ &= \sum_{x,y} p(x,y)\left(\log\frac{1}{p(x\mid y)} - \log\frac{1}{p(x)}\right) \\ &= \sum_{x,y} p(x,y)\log\frac{p(x)p(y)}{p(x,y)} \end{aligned}$$
Generalized Jensen's inequality:
$$f(\mathbb{E}[Z]) \le \mathbb{E}[f(Z)] \quad \text{if } f \text{ is convex}; \qquad f(\mathbb{E}[Z]) \ge \mathbb{E}[f(Z)] \quad \text{if } f \text{ is concave}.$$
Here $\log$ is a concave function. Let $Z = \frac{p(x)p(y)}{p(x,y)}$, viewed as a random variable over pairs $(x,y) \sim p(x,y)$. Then we obtain
$$\sum_z p(z)\log z \le \log\sum_z z\,p(z).$$
Then,
$$\sum_{x,y} p(x,y)\log\frac{p(x)p(y)}{p(x,y)} \le \log\left(\sum_{x,y} p(x,y)\,\frac{p(x)p(y)}{p(x,y)}\right) = \log\left(\sum_{x,y} p(x)p(y)\right) = \log 1 = 0.$$
This proves the lemma.
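A brute-force check of Lemma 1, assuming nothing beyond the definitions: the loop below draws random joint distributions and asserts $H(X\mid Y) \le H(X)$ on each.

```python
import random
from math import log2

def H(dist):
    return -sum(q * log2(q) for q in dist.values() if q > 0)

for _ in range(1000):
    # Random joint distribution over a 3x3 alphabet.
    w = [random.random() for _ in range(9)]
    s = sum(w)
    p = {(x, y): w[3 * x + y] / s for x in range(3) for y in range(3)}
    px, py = {}, {}
    for (x, y), q in p.items():
        px[x] = px.get(x, 0.0) + q
        py[y] = py.get(y, 0.0) + q
    h_x_given_y = -sum(q * log2(q / py[y]) for (x, y), q in p.items() if q > 0)
    assert h_x_given_y <= H(px) + 1e-9  # Lemma 1 holds on every trial
```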
Some useful inequalities
1. $H(A) \le H(A\mid C) + H(C)$.
2. $H(A\mid B) = p(B=0)\cdot H(A\mid B=0) + p(B=1)\cdot H(A\mid B=1)$ for $B \in \{0,1\}$.
For 2, let us see the details.
$$\begin{aligned} H(A\mid B) &= -\sum_{a,b} p(a,b)\log\frac{p(a,b)}{p(b)} \\ &= -\sum_a p(a, B=0)\log\frac{p(a, B=0)}{p(B=0)} - \sum_a p(a, B=1)\log\frac{p(a, B=1)}{p(B=1)} \\ &= -p(B=0)\sum_a p(a\mid B=0)\log p(a\mid B=0) - p(B=1)\sum_a p(a\mid B=1)\log p(a\mid B=1) \\ &= p(B=0)\cdot H(A\mid B=0) + p(B=1)\cdot H(A\mid B=1) \end{aligned}$$
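The same decomposition can be checked numerically; the joint table `p` below is made up:

```python
from math import log2

p = {(0, 0): 0.3, (1, 0): 0.2, (0, 1): 0.1, (1, 1): 0.4}  # made-up p(a, b)

def H(dist):
    return -sum(q * log2(q) for q in dist.values() if q > 0)

pb = {b: p[(0, b)] + p[(1, b)] for b in (0, 1)}  # marginal p(b)

# Left-hand side: H(A|B) computed from the joint.
lhs = -sum(q * log2(q / pb[b]) for (a, b), q in p.items() if q > 0)

# Right-hand side: mixture of the per-value entropies H(A|B=b).
rhs = sum(pb[b] * H({a: p[(a, b)] / pb[b] for a in (0, 1)}) for b in (0, 1))
print(lhs, rhs)  # identical
```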
Lemma H. $H(X_1,\dots,X_n) = H(X_1) + H(X_2\mid X_1) + \cdots = \sum_{i=1}^n H(X_i\mid X_{<i})$,
where $X_{<i}$ denotes $X_{i-1},\dots,X_1$.
Lemma I.
$$I(X_1,X_2,\dots,X_n;Y) = H(X_1,\dots,X_n) - H(X_1,\dots,X_n\mid Y) = I(X_1;Y) + I(X_2;Y\mid X_1) + I(X_3;Y\mid X_2,X_1) + \cdots = \sum_{i=1}^n I(X_i;Y\mid X_{<i})$$
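Here is a small numeric check of Lemma I for $n = 2$, writing every mutual information in terms of entropies; the weights defining the joint are arbitrary:

```python
from math import log2
from itertools import product

# Made-up joint p(x1, x2, y) over bits; any normalized table would do.
triples = list(product((0, 1), repeat=3))
weights = [1, 2, 3, 4, 4, 3, 2, 1]
total = sum(weights)
p = {t: w / total for t, w in zip(triples, weights)}

def H(keep):
    """Entropy of the marginal over the coordinates listed in keep."""
    m = {}
    for t, q in p.items():
        k = tuple(t[i] for i in keep)
        m[k] = m.get(k, 0.0) + q
    return -sum(q * log2(q) for q in m.values() if q > 0)

# I(X1,X2;Y) = H(X1,X2) + H(Y) - H(X1,X2,Y)
lhs = H([0, 1]) + H([2]) - H([0, 1, 2])
# I(X1;Y) + I(X2;Y|X1), both expanded into entropies
i1 = H([0]) + H([2]) - H([0, 2])
i2 = (H([0, 1]) - H([0])) - (H([0, 1, 2]) - H([0, 2]))
print(lhs, i1 + i2)  # equal, as Lemma I states
```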
Important Inequalities
For entropy,
$$H(A\mid B) \le H(A).$$
For mutual information, dropping a conditioning variable can either increase or decrease the value: $I(A;B\mid C)$ may be larger or smaller than $I(A;B)$, depending on the joint distribution.
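A standard example: take $X, Y$ independent fair bits and $Z = X \oplus Y$. Then $I(X;Y) = 0$ but $I(X;Y\mid Z) = 1$, so conditioning can strictly increase mutual information. The sketch below verifies this:

```python
from math import log2
from itertools import product

# X, Y independent fair bits, Z = X xor Y.
p = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

def H(keep):
    m = {}
    for t, q in p.items():
        k = tuple(t[i] for i in keep)
        m[k] = m.get(k, 0.0) + q
    return -sum(q * log2(q) for q in m.values() if q > 0)

i_xy = H([0]) + H([1]) - H([0, 1])                          # I(X;Y) = 0
i_xy_z = (H([0, 2]) - H([2])) - (H([0, 1, 2]) - H([1, 2]))  # I(X;Y|Z) = 1
print(i_xy, i_xy_z)  # conditioning on Z increases the mutual information
```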
Other Tips
$$H(A\mid B,C) = H(A\mid B)$$
if $A \perp C \mid B$ (equivalently, $C \perp A \mid B$).
There are many similar facts. The following argument shows the key idea.
First,
$$H(A\mid B) = H(A) \quad \text{if } A \perp B.$$
Now condition everything on $B$ and apply the same fact: when $A \perp C \mid B$, the variable $C$ carries no information about $A$ beyond what $B$ already gives, so
$$H(A\mid B, C) = H(A\mid B).$$
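To see this numerically, the sketch below builds a joint where $A \perp C \mid B$ holds by construction and compares $H(A\mid B,C)$ with $H(A\mid B)$:

```python
from math import log2
from itertools import product

# Joint with A independent of C given B, by construction:
# p(a, b, c) = p(b) * p(a|b) * p(c|b); all conditionals below are made up.
pb = {0: 0.5, 1: 0.5}
pa_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}
pc_b = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.2, 1: 0.8}}
p = {(a, b, c): pb[b] * pa_b[b][a] * pc_b[b][c]
     for a, b, c in product((0, 1), repeat=3)}

def H(keep):
    m = {}
    for t, q in p.items():
        k = tuple(t[i] for i in keep)
        m[k] = m.get(k, 0.0) + q
    return -sum(q * log2(q) for q in m.values() if q > 0)

# H(A|B,C) = H(A,B,C) - H(B,C)  versus  H(A|B) = H(A,B) - H(B)
print(H([0, 1, 2]) - H([1, 2]), H([0, 1]) - H([1]))  # equal
```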