by Xianbin
In the post The Indexing Problem: Different Proofs (1), we saw how to use information theory to prove the lower bound for INDEXING. This post gives another method, based on VC-dimension.
VC-Dimension
VC-dimension is a useful tool in machine learning.
Definition (intersection): Let \(H\) be a set family and \(S\) a set. We define the \(\textit{intersection}\) \(H_S\) as:
\[H_S=\{ h\cap S : h\in H\}\]
Definition (shattered): We say that a set \(S\) is shattered by \(H\) if \(H_S\) contains all subsets of \(S\), i.e., \(\lvert H_S \rvert = 2^{\lvert S \rvert}\).
Now, we can define the VC-dimension:
\[\text{VC-dim}(H) = \max\{ \lvert S \rvert : S \text{ is shattered by } H \}\]
Example
Consider three points in general position in the plane. Using a straight line (a half-plane classifier), we can realize all \(2^3 = 8\) labelings of the three points. With four points, however, some labeling may be impossible: for instance, we cannot separate two diagonally opposite corners of a square from the other two. Therefore, the maximum number of points that can be shattered is 3, and the VC-dimension of half-planes in the plane is three.
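To make the definitions above concrete, here is a minimal brute-force sketch (the helper names powerset, is_shattered, and vc_dim are mine, not from [1]) that computes the VC-dimension of a finite set family straight from the definitions. Since half-planes are an infinite family, the example uses intervals on a line instead, whose VC-dimension is 2:

```python
from itertools import combinations

def powerset(s):
    # All subsets of s, as frozensets.
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def is_shattered(S, H):
    # S is shattered by H iff H_S = {h ∩ S : h ∈ H} contains all 2^|S| subsets of S.
    H_S = {frozenset(h & S) for h in H}
    return all(sub in H_S for sub in powerset(S))

def vc_dim(ground, H):
    # VC-dim(H) = max{|S| : S ⊆ ground is shattered by H}, by exhaustive search.
    return max(len(S) for S in powerset(ground) if is_shattered(set(S), H))

# Intervals on {0,...,4} can realize any labeling of 2 points but not of 3
# (a "hole" pattern like {1,3} without 2 is impossible), so VC-dim = 2.
ground = set(range(5))
intervals = [set(range(a, b)) for a in range(6) for b in range(a, 6)]
print(vc_dim(ground, intervals))  # 2
```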
Yao’s Minimax
\(\textbf{Theorem}\) (Yao's Theorem). For every function \(f: X \times Y \to \{0, 1\}\) and \(0<\epsilon < 1\), we have the following:
\[R^{A\to B, \mathrm{pub}}_\epsilon(f) = \max_\mu\{D_\epsilon^{A\to B, \mu}(f)\}\]
where \(R^{A\to B, \mathrm{pub}}_\epsilon(f)\) is the randomized public-coin one-round communication complexity of \(f\) with error probability \(\epsilon\), and \(D_\epsilon^{A\to B, \mu}(f)\) is the minimum cost over all deterministic one-round protocols \(P\) for which \(\Pr_\mu[P(x,y) \neq f(x,y)] \leq \epsilon\).
The Second Proof: by VC dimension
First, we will prove the following theorem.
\(\textbf{Theorem 2}\). For every function \(f: X \times Y \to \{0, 1\}\) and \(0<\epsilon \leq 1/8\), we have
\[R^{A\to B, \mathrm{pub}}_\epsilon(f)\geq \Omega(\text{VC-dim}(f_X))\]
where \(f_X = \{f_x : x\in X\}\) and \(f_x: Y \to \{0,1\}\) is defined by \(f_x(y) = f(x,y)\).
\(\textbf{Proof}\). First, let us see the main idea of the proof. By Yao's minimax, it is enough to exhibit a distribution \(\mu\) for which \(D_{\epsilon}^{A\to B, \mu}(f) = \Omega(d)\), where \(d = \text{VC-dim}(f_X)\). So let \(P\) denote any deterministic one-round protocol computing \(f\) with cost at most \(d/15\) bits. We will prove that such a protocol must err on \(\mu\) with probability at least a constant, which proves the theorem. (This outline may help readers who find the presentation in [1] confusing.)
By the definition of VC-dimension, there is a set \(S \subseteq Y\) of size \(d\) that is shattered by \(f_X\). That is, for every subset \(R\subseteq S\) there exists \(x_R \in X\) such that for any \(y\in S\), we have \(f_{x_R}(y) = 1\) iff \(y\in R\); in other words, \(f\) can realize every subset of \(S\) via some \(x_R\). Let \(\mu\) denote the uniform distribution over the set of pairs \(\{(x_R,y)\mid R\subseteq S,\ y\in S \}\), which has \(d\cdot 2^d\) elements.
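As a quick sanity check on the construction (a toy sketch; representing \(x_R\) by the subset \(R\) itself is an assumption for illustration, valid because \(f_{x_R}(y)=1\) iff \(y\in R\)), we can enumerate the support of \(\mu\) and confirm it has \(d\cdot 2^d\) pairs:

```python
from itertools import combinations

def mu_support(S):
    # One pair (x_R, y) per R ⊆ S and y ∈ S; x_R is represented by R itself.
    S = list(S)
    return [(frozenset(R), y)
            for r in range(len(S) + 1)
            for R in combinations(S, r)
            for y in S]

support = mu_support({1, 2, 3})   # d = 3
print(len(support), 3 * 2**3)     # 24 24: each pair has mass 1/(d * 2^d)
```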
Recall that \(P\) is a deterministic protocol for computing \(f(x,y)\) in one round with cost at most \(d/15\) bits.
Identify each subset \(R\subseteq S\) with its indicator vector \(z\in\{0,1\}^d\) over the \(d\) elements of \(S\), and write \(P(z)\in\{0,1\}^d\) for the vector of Bob's outputs on the inputs \((x_R, y)\), \(y\in S\), after receiving Alice's message for \(x_R\). The expected error of protocol \(P\) under \(\mu\) is then
\[\frac{1}{d\cdot 2^d} \sum_{z\in \{0,1\}^d}\text{dist}(z, P(z))\]
where \(\text{dist}(\cdot,\cdot)\) is the Hamming distance between vectors.
Question: we do not know the protocol \(P\), so how can we bound this distance? The key observation is that \(P(z)\) depends only on Alice's message, which consists of at most \(d/15\) bits. Hence the set \(T = \{P(z) : z\in\{0,1\}^d\} \subseteq \{0,1\}^d\) has size at most \(2^{d/15}\), and \(\text{dist}(z, P(z)) \geq \text{dist}(z, T) := \min_{t\in T}\text{dist}(z,t)\). So it suffices to lower-bound \(\mathbb{E}_z[\text{dist}(z, T)]\) for every such small set \(T\).
\(\textbf{Lemma}\). For every \(d\geq 15\) and any subset \(T\subseteq \{0,1\}^d\) with \(\lvert T \rvert \leq 2^{d/15}\), we have:
\(\mathbb{E}_z[\text{dist}(z, T)] > d/8\), where the expectation is taken over a uniformly random \(z\in \{0,1\}^d\).
\(\textbf{Proof}\). The key is to define, for each \(t\in T\), the Hamming ball
\(N_t = \{z \mid \text{dist}(z,t) \leq d/4\}\)
We know that
\(\lvert N_t \rvert = \sum_{i=0}^{\lfloor d/4 \rfloor} {d \choose i}\)
By the relationship between entropy and counting (see the post Entropy and Counting), where \(H\) denotes the binary entropy function, we obtain
\[\lvert N_t \rvert = \sum_{i=0}^{\lfloor d/4 \rfloor} {d \choose i} \leq 2^{H(1/4) d}\leq 2^{0.82d}\]
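This bound is easy to check numerically (a quick sketch; \(H(1/4)\approx 0.8113\), which is why \(2^{H(1/4)d}\leq 2^{0.82d}\)):

```python
from math import comb, log2

def check_entropy_bound(d):
    # Compare log2 of the Hamming-ball size with H(1/4) * d.
    ball = sum(comb(d, i) for i in range(d // 4 + 1))
    h14 = -(0.25 * log2(0.25) + 0.75 * log2(0.75))  # binary entropy at 1/4
    return log2(ball), h14 * d

print(check_entropy_bound(40))  # ≈ (30.19, 32.45): the ball stays below the bound
```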
Then, since \(\lvert T \rvert \leq 2^{d/15}\) and \(d\geq 15\), we get that
\[\sum_{t\in T} \lvert N_t \rvert \leq 2^{d/15}\cdot 2^{0.82d} < 2^{d}\cdot 2^{-d/15} \leq 2^{d-1}\]
Therefore, at least \(2^{d-1}\) strings \(z\in\{0,1\}^d\) lie outside \(\bigcup_{t\in T} N_t\), and each of them satisfies \(\text{dist}(z,T) > d/4\). Hence
\[\mathbb{E}_z[\text{dist}(z,T)] \geq \frac{1}{2^d}\sum_{z\notin \bigcup_t N_t} \text{dist}(z,T) > \frac{2^{d-1}}{2^d}\cdot \frac{d}{4} = \frac{d}{8}.\]
This proves the lemma.
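Here is a brute-force check of the lemma on a small instance (a sketch: with \(d = 16\), \(2^{d/15} \approx 2.1\), so the lemma allows \(\lvert T\rvert = 2\); it guarantees the bound for every such \(T\), and we test one arbitrary choice):

```python
from itertools import product

def expected_dist_to_set(d, T):
    # E over uniform z in {0,1}^d of dist(z, T) = min_{t in T} Hamming(z, t).
    total = 0
    for z in product((0, 1), repeat=d):
        total += min(sum(a != b for a, b in zip(z, t)) for t in T)
    return total / 2**d

d = 16
T = [(0,) * d, (1,) * d]  # an arbitrary 2-element set of candidate "protocol outputs"
print(expected_dist_to_set(d, T), d / 8)  # ≈ 6.43 vs 2.0: well above d/8
```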
Combining the lemma with \(\text{dist}(z,P(z)) \geq \text{dist}(z,T)\), the expected error of \(P\) is greater than \((d/8)/d = 1/8\) while its cost is at most \(d/15\). This proves the theorem.
Now, let us apply Theorem 2 to show the lower bound for INDEXING. Recall the indexing function \(\text{IND}: \{0,1\}^n \times \{1,\ldots, n\} \to \{0,1\}\) with \(\text{IND}(x, i) = x_i\), where Alice holds the string \(x\) and Bob holds the index \(i\). It is enough to show that the set \(S = \{1,2,\ldots,n\}\) is shattered by \(\text{IND}_X = \{\text{IND}_{y(R)} \mid R \subseteq S \}\), where \(y(R) = (y_1(R), y_2(R), \ldots, y_n(R))\) is the indicator vector of \(R\): \(y_i(R) = 1\) if and only if \(i\in R\). Then \(\text{IND}_{y(R)}(i)\) reads the \(i\)-th bit of the vector \(y(R)\), i.e., it reports whether \(i\in R\).
Now we can see that for any subset \(R\subseteq S\), there exists \(y(R)\) such that \(\text{IND}_{y(R)}(i) = 1\) if and only if \(i\in R\). By the definition of shattering, \(S\) is shattered by \(\text{IND}_X\), so \(\text{VC-dim}(\text{IND}_X) = n\) and Theorem 2 gives \(R^{A\to B, \mathrm{pub}}_\epsilon(\text{IND}) = \Omega(n)\).
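Reusing the brute-force vc_dim sketch from the Example section (again a toy check, not part of the proof): each \(\text{IND}_{y(R)}\) corresponds to the set \(R = \{i : y_i(R) = 1\}\), so \(\text{IND}_X\) is simply the family of all subsets of \(S\), which shatters the whole ground set:

```python
# Reusing powerset and vc_dim from the earlier sketch.
n = 4
S = set(range(1, n + 1))
IND_X = powerset(S)        # one set per string y(R) in {0,1}^n
print(vc_dim(S, IND_X))    # 4 = n: the whole set S is shattered
```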
Finally
The lower bound for INDEXING can be proved without VC-dimension techniques, as the previous post showed. One of the contributions of the paper [1] is revealing the relationship between VC-dimension and one-round communication lower bounds.
Reference
[1] I. Kremer, N. Nisan, and D. Ron. On randomized one-round communication complexity. Computational Complexity, 8:21–49, 1999.