by Xianbin
In the post The Indexing Problem: Different Proofs (1), we saw how to use information theory to prove a lower bound for INDEXING. This post gives another method.
VC-Dimension
VC-dimension is a useful tool in machine learning.
Definition (intersection): Let $H$ be a family of sets and let $S$ be a set. We define the intersection $H_S$ as
$$H_S = \{h \cap S : h \in H\}.$$
Definition (shattered): We say that a set $S$ is shattered by $H$ if $H_S$ contains all subsets of $S$, i.e., $|H_S| = 2^{|S|}$.
Now, we can define the VC dimension:
$$\text{VC-dim}(H) = \max\{|S| : S \text{ is shattered by } H\}.$$
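To make the definition concrete, here is a small brute-force sketch in Python (my own illustration, not from [1]; the helpers subsets, is_shattered, and vc_dim are hypothetical names) that checks shattering and computes the VC dimension of a finite set family over a finite ground set.

```python
from itertools import combinations

def subsets(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def is_shattered(S, H):
    """S is shattered by H iff H_S = {h ∩ S : h ∈ H} contains all 2^|S| subsets of S."""
    H_S = {frozenset(h & S) for h in H}
    return all(R in H_S for R in subsets(S))

def vc_dim(ground, H):
    """Largest |S| over subsets S of the ground set that are shattered by H (brute force)."""
    return max(len(S) for S in subsets(ground) if is_shattered(set(S), H))

# Example: the family of prefixes {1,...,k} over the ground set {1,...,5}.
ground = set(range(1, 6))
H = [set(range(1, k + 1)) for k in range(6)]    # {}, {1}, {1,2}, ..., {1,...,5}
print(vc_dim(ground, H))                        # 1: singletons are shattered, pairs are not
```

For a pair $\{a, b\}$ with $a < b$, no prefix contains $b$ without $a$, so the labeling $\{b\}$ is never realized; this is why the brute force reports VC dimension 1 for the prefix family.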
Example
Consider points in the plane and classifiers given by a straight line (half-planes). For three points in general position, a line can realize all $2^3 = 8$ labelings. With four points this can fail: for four points in convex position, no line separates the two diagonal points from the other two. Hence a set of four points in convex position is not shattered (and in fact no set of four points is), so the VC dimension of half-planes in the plane is 3.
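A quick numerical way to check these claims is to test linear separability as a feasibility linear program. The sketch below is my own illustration (it assumes numpy and scipy are available; the helper separable is not from [1]): it confirms that every labeling of a non-degenerate triangle is realizable by a line, while the diagonal labeling of a unit square is not.

```python
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

def separable(pos, neg):
    """Is there a line w·x + b = 0 with w·p + b >= 1 for p in pos and w·q + b <= -1 for q in neg?
    Tested as a feasibility linear program in the variables (w1, w2, b)."""
    A, rhs = [], []
    for (px, py) in pos:            # -(w1*px + w2*py + b) <= -1
        A.append([-px, -py, -1.0])
        rhs.append(-1.0)
    for (qx, qy) in neg:            #   w1*qx + w2*qy + b  <= -1
        A.append([qx, qy, 1.0])
        rhs.append(-1.0)
    res = linprog(c=[0, 0, 0], A_ub=np.array(A), b_ub=np.array(rhs),
                  bounds=[(None, None)] * 3)
    return res.status == 0          # 0 = feasible (separable), 2 = infeasible

triangle = [(0, 0), (1, 0), (0, 1)]
all_labelings = [set(P) for r in range(4) for P in combinations(triangle, r)]
print(all(separable([p for p in triangle if p in P],
                    [p for p in triangle if p not in P]) for P in all_labelings))  # True

square = [(0, 0), (1, 1), (1, 0), (0, 1)]
print(separable(square[:2], square[2:]))   # False: the diagonal labeling is unrealizable
```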
Yao’s Minimax
Theorem (Yao's minimax). For every function $f : X \times Y \to \{0,1\}$ and every $0 < \epsilon < 1$, we have
$$R^{A\to B,\mathrm{pub}}_\epsilon(f) = \max_\mu D^{A\to B,\mu}_\epsilon(f),$$
where $R^{A\to B,\mathrm{pub}}_\epsilon(f)$ is the randomized public-coin one-round (Alice-to-Bob) communication complexity of $f$ with error probability $\epsilon$, and $D^{A\to B,\mu}_\epsilon(f)$ is the minimum cost over all deterministic one-round protocols $P$ for which $\Pr_\mu[P(x,y) \neq f(x,y)] \leq \epsilon$.
The Second Proof: by VC dimension
First, we will prove the following theorem.
Theorem 2. For every function $f : X \times Y \to \{0,1\}$ and every $0 < \epsilon \leq 1/8$, we have
$$R^{A\to B,\mathrm{pub}}_\epsilon(f) = \Omega(\text{VC-dim}(f_X)),$$
where $f_X = \{f_x : x \in X\}$ and $f_x : Y \to \{0,1\}$ is defined by $f_x(y) = f(x,y)$ (each $f_x$ is identified with the subset $\{y \in Y : f_x(y) = 1\}$ of $Y$, so that shattering by $f_X$ makes sense).
Proof. First, the main idea. By Yao's minimax, it suffices to exhibit a (product) distribution $\mu$ and to show that $D^{A\to B,\mu}_\epsilon(f) = \Omega(d)$, where $d = \text{VC-dim}(f_X)$. So let $P$ denote a single deterministic one-round protocol computing $f$ with cost at most $d/15$ bits; we will show that, under $\mu$, such a protocol errs with probability greater than $1/8 \geq \epsilon$, which proves the theorem. (This overall structure can be confusing on a first reading of [1].)
Let $d = \text{VC-dim}(f_X)$. By the definition of VC dimension, there is a set $S \subseteq Y$ of size $d$ that is shattered by $f_X$. That is, for every subset $R \subseteq S$ there exists $x_R \in X$ such that for every $y \in S$ we have $f_{x_R}(y) = 1$ iff $y \in R$; in other words, by choosing $x_R$, the function $f$ can realize every subset of $S$. Let $\mu$ denote the uniform distribution over the set of pairs $\{(x_R, y) : R \subseteq S,\ y \in S\}$.
Recall that $P$ is a deterministic one-round protocol for $f$ with cost at most $d/15$ bits. Identify each subset $R \subseteq S$ with its indicator vector $z \in \{0,1\}^d$, so Alice's input $x_R$ corresponds to $z$. On input $x_R$, Alice sends a single message of at most $d/15$ bits, and Bob's answers on the $d$ possible inputs $y \in S$ are determined by that message alone; write $P(z) \in \{0,1\}^d$ for this vector of answers. The expected error of protocol $P$ under $\mu$ is then
$$\frac{1}{d \cdot 2^d} \sum_{z \in \{0,1\}^d} \mathrm{dist}(z, P(z)),$$
where $\mathrm{dist}(\cdot,\cdot)$ denotes the Hamming distance between vectors.
Question: we do not know the protocol $P$, so how can we bound this distance?

Observe that $P(z)$ is determined by Alice's message, which takes at most $2^{d/15}$ values; hence $P(z)$ always lies in a fixed set $T \subset \{0,1\}^d$ with $|T| \leq 2^{d/15}$. Consequently $\mathrm{dist}(z, P(z)) \geq \mathrm{dist}(z, T) := \min_{t \in T} \mathrm{dist}(z, t)$, and it suffices to lower bound $\mathbb{E}_z[\mathrm{dist}(z, T)]$.
Lemma. For every $d \geq 15$ and every subset $T \subset \{0,1\}^d$ with $|T| \leq 2^{d/15}$, we have
$$\mathbb{E}[\mathrm{dist}(z, T)] > d/8,$$
where $\mathrm{dist}(z,T) := \min_{t \in T} \mathrm{dist}(z,t)$ and the expectation is over a uniform $z \in \{0,1\}^d$.
Proof. The key is to define, for each $t \in T$, the Hamming ball
$$N_t = \{z \in \{0,1\}^d : \mathrm{dist}(z,t) \leq d/4\}.$$
We know that
$$|N_t| = \sum_{i=0}^{\lfloor d/4\rfloor} \binom{d}{i}.$$
By the relationship between entropy and counting (see the post Entropy and Counting), we obtain
$$|N_t| = \sum_{i=0}^{\lfloor d/4\rfloor} \binom{d}{i} \leq 2^{H(1/4)\,d} \leq 2^{0.82\,d},$$
where $H$ denotes the binary entropy function and $H(1/4) \approx 0.811$.
Then, summing over all $t \in T$,
$$\sum_{t \in T} |N_t| \leq 2^{d/15} \cdot 2^{0.82\,d} < 2^{d} \cdot 2^{-d/15} \leq 2^{d-1},$$
where the last inequality uses $d \geq 15$.
Therefore, more than $2^{d-1}$ strings $z \in \{0,1\}^d$ lie outside every ball $N_t$, and each such $z$ satisfies $\mathrm{dist}(z,T) > d/4$. Hence
$$\mathbb{E}[\mathrm{dist}(z,T)] = \frac{1}{2^d}\sum_{z \in \{0,1\}^d} \mathrm{dist}(z,T) > \frac{1}{2^d}\left(2^{d-1}\cdot \frac{d}{4} + 0\right) = \frac{d}{8}. \qquad \square$$
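As a sanity check (my own sketch, not taken from [1]), the following Python snippet verifies the binomial-sum bound numerically for a few values of $d$ and computes $\mathbb{E}[\mathrm{dist}(z,T)]$ exactly for a small $d$ and one random set $T$ of size $2^{\lfloor d/15\rfloor}$. Of course this checks only a random $T$, not the worst case handled by the lemma.

```python
import random
from math import comb

# Check the entropy/counting bound: sum_{i <= d/4} C(d, i) <= 2^{0.82 d}.
for d in (16, 20, 24, 28):
    ball = sum(comb(d, i) for i in range(d // 4 + 1))
    assert ball <= 2 ** (0.82 * d), (d, ball)

# Exact E[dist(z, T)] over all z in {0,1}^d for one random T with |T| = 2^(d//15).
d = 20
T = [random.getrandbits(d) for _ in range(2 ** (d // 15))]
total = sum(min(bin(z ^ t).count("1") for t in T) for z in range(2 ** d))
print(f"E[dist(z, T)] = {total / 2 ** d:.2f}, d/8 = {d / 8}")   # far above d/8 = 2.5
```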
Applying the lemma to the set $T$ of possible reply vectors, the expected error of $P$ is greater than $(d/8)/d = 1/8 \geq \epsilon$. This contradicts the assumption that $P$ errs with probability at most $\epsilon$, so any deterministic one-round protocol with error at most $\epsilon \leq 1/8$ under $\mu$ must cost more than $d/15$ bits. Hence $D^{A\to B,\mu}_\epsilon(f) = \Omega(d)$, which proves the theorem.
Now, let us apply Theorem 2 to show the lower bound for INDEXING. Here IND is the indexing function $\mathrm{IND} : \{0,1\}^n \times \{1,\dots,n\} \to \{0,1\}$ with $\mathrm{IND}(x, i) = x_i$: Alice holds the vector $x$ and Bob holds the index $i$. It is enough to show that the set $S = \{1,\dots,n\}$ is shattered by the family $\mathrm{IND}_X = \{\mathrm{IND}_{y(R)} : R \subseteq S\}$, where $y(R) \in \{0,1\}^n$ is the indicator vector of $R$ and $\mathrm{IND}_{y(R)}(i) = \mathrm{IND}(y(R), i)$.

Concretely, $y(R) = (y_1(R), y_2(R), \dots, y_n(R))$ with $y_i(R) = 1$ iff $i \in R$, so $\mathrm{IND}_{y(R)}(i)$ simply reads the $i$-th bit of the vector $y(R)$, i.e., it checks whether $i \in R$.

Thus, for every subset $R \subseteq S$ there exists $y(R)$ such that $\mathrm{IND}_{y(R)}(i) = 1$ if and only if $i \in R$. By the definition of shattering, $S$ is shattered by $\mathrm{IND}_X$, so $\text{VC-dim}(\mathrm{IND}_X) = n$ and Theorem 2 yields $R^{A\to B,\mathrm{pub}}_\epsilon(\mathrm{IND}) = \Omega(n)$.
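Here is a tiny Python sketch (my own illustration, not from [1]) that spells out the shattering argument for a small $n$: for every subset $R \subseteq \{1,\dots,n\}$, the indicator vector $y(R)$ makes IND output exactly the labeling "is $i \in R$?".

```python
from itertools import combinations

def IND(x, i):
    """The indexing function: the i-th bit (1-indexed) of the vector x."""
    return x[i - 1]

n = 4
S = range(1, n + 1)
shattered = True
for r in range(n + 1):
    for R in combinations(S, r):
        yR = tuple(1 if i in R else 0 for i in S)              # indicator vector y(R)
        shattered = shattered and all(IND(yR, i) == (i in R) for i in S)
print(shattered)   # True: every labeling of {1,...,n} is realized, so VC-dim(IND_X) = n
```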
Finally
The lower bound for INDEXING can also be proved without VC-dimension techniques. One of the contributions of [1] is revealing the relationship between the VC dimension and one-round communication lower bounds.
Reference
[1] Kremer, I., Nisan, N., and Ron, D., 1999. On randomized one-round communication complexity. Computational Complexity, 8, pp. 21–49.