Toolbox

C1: The Indexing Problem: Different Proofs (2)

by Xianbin

In the post The Indexing Problem: Different Proofs (1), we saw how to use information theory to prove the lower bound for INDEXING. This post gives another proof.

Tools

VC-Dimension

VC-dimension (Vapnik–Chervonenkis dimension) is a useful tool in machine learning for measuring the expressive power of a class of sets or functions.

Definition (intersection): Let $H$ be a set family and $S$ a set. We define the \textit{intersection} $H_S$ as:

$H_S = \{ h \cap S : h \in H \}$

Definition (shattered): We say that a set $S$ is shattered by $H$ if $H_S$ contains all subsets of $S$, i.e., $\lvert H_S \rvert = 2^{\lvert S \rvert}$.

Now, we can define the VC dimension:

$\text{VC-dim}(H) = \max\{ \lvert S \rvert : S \text{ is shattered by } H \}$

Example

Consider points in the plane and let $H$ be the family of half-planes (i.e., the regions on one side of a straight line). For three points in general position, a straight line can realize all $2^3 = 8$ labelings, so a set of three points is shattered. With four points, some labeling is always unrealizable (e.g., separating two diagonally opposite points from the other two). Hence the largest shattered set has size three, and the VC-dimension of half-planes in the plane is three.
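To make the definitions concrete, here is a minimal Python sketch (my own illustration, not from [1]; the function names `intersections` and `is_shattered` are mine) that checks shattering by brute force, directly following the definitions of $H_S$ and VC-dimension above:

```python
from itertools import combinations

def intersections(H, S):
    """Compute H_S = { h ∩ S : h in H } as a set of frozensets."""
    return {frozenset(h & S) for h in H}

def is_shattered(H, S):
    """S is shattered by H iff H_S contains all 2^|S| subsets of S."""
    subsets = {frozenset(c) for r in range(len(S) + 1)
               for c in combinations(S, r)}
    return intersections(H, S) == subsets

# Example: threshold sets h_k = {1, ..., k} on the domain {1, 2, 3}.
H = [set(range(1, k + 1)) for k in range(4)]   # {}, {1}, {1,2}, {1,2,3}
print(is_shattered(H, {2}))       # True: both labelings of {2} are realized
print(is_shattered(H, {1, 2}))    # False: the labeling {2} alone is missing
```

Note the asymmetry in the definition: certifying $\text{VC-dim}(H) \geq k$ needs only one shattered set of size $k$, while certifying $\text{VC-dim}(H) < k$ requires ruling out every set of size $k$.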

Yao’s Minimax

\textbf{Theorem} (Yao's minimax). For every function $f: X \times Y \to \{0, 1\}$ and $0 < \epsilon < 1$, we have the following:

$R^{A\to B, \text{pub}}_\epsilon(f) = \max_\mu\{D_\epsilon^{A\to B, \mu}(f)\}$, where $R^{A\to B, \text{pub}}_\epsilon(f)$ is the randomized public-coin one-round (Alice-to-Bob) communication complexity of $f$ with error probability $\epsilon$, and $D_\epsilon^{A\to B, \mu}(f)$ is the minimum cost over all deterministic one-round protocols $P$ for which $\Pr_\mu[P(x,y) \neq f(x,y)] \leq \epsilon$.

The Second Proof: by VC dimension

First, we will prove the following theorem.

\textbf{Theorem 2}. For every function $f: X \times Y \to \{0, 1\}$ and $0 < \epsilon \leq 1/8$, we have

$R^{A\to B, \text{pub}}_\epsilon(f) = \Omega(\text{VC-dim}(f_X))$

where $f_X = \{ f_x : x \in X \}$ and $f_x: Y \to \{0,1\}$ is the function $f_x(y) = f(x, y)$.

\textbf{Proof}. First, let us see the main idea of the proof. Let $d = \text{VC-dim}(f_X)$. By Yao's minimax, it is enough to exhibit a (product) distribution $\mu$ and show that $D_{\epsilon}^{A\to B, \mu}(f) = \Omega(d)$. So let $P$ denote any single deterministic one-round protocol computing $f$ with cost at most $d/15$ bits. We will prove that such a protocol must err on $\mu$ with probability at least some constant, which proves the theorem. (This structure is easy to get confused about when reading [1].)

By the definition of VC-dimension, there is a set $S \subseteq Y$ of size $d$ that is shattered by $f_X$. That is, for every subset $R \subseteq S$ there exists $x_R \in X$ such that for any $y \in S$, we have $f_{x_R}(y) = 1$ iff $y \in R$; in other words, using the inputs $x_R$, the function $f$ can realize every subset of $S$. Let $\mu$ denote the uniform distribution over the set of pairs $\{(x_R, y) \mid R \subseteq S, y \in S\}$.
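As a quick illustration (a sketch of mine, with $S$ and the inputs $x_R$ represented symbolically), the support of $\mu$ has exactly $d \cdot 2^d$ pairs, which is where the normalization $\frac{1}{d \cdot 2^d}$ below comes from:

```python
from itertools import product

d = 3
S = [f"y{j}" for j in range(d)]               # a shattered set of size d
R_all = list(product((0, 1), repeat=d))       # subsets R of S as indicator vectors
support = [(R, y) for R in R_all for y in S]  # pairs (x_R, y); mu is uniform on these
print(len(support))                           # d * 2^d = 24
```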

Recall that $P$ is a deterministic protocol computing $f(x,y)$ in one round with cost at most $d/15$ bits.

Identify each $x_R$ with the indicator vector $z \in \{0,1\}^d$ of $R \subseteq S$, and let $P(z) \in \{0,1\}^d$ denote the vector of Bob's answers $(P(x_R, y))_{y \in S}$. Since $y$ is uniform over $S$, the expected error of protocol $P$ is $\frac{1}{d \cdot 2^d} \sum_{z \in \{0,1\}^d} \text{dist}(z, P(z))$, where $\text{dist}(\cdot,\cdot)$ is the Hamming distance between vectors.
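To see how this formula arises, here is a toy example (entirely hypothetical, not from [1]): a one-bit protocol in which Alice sends the majority bit of $z$ and Bob outputs that bit as his guess for every coordinate. Its error under $\mu$ is exactly the normalized average Hamming distance:

```python
from itertools import product

d = 6

def alice_message(z):
    """Toy 1-bit message: the majority bit of Alice's vector z."""
    return int(sum(z) > d / 2)

def bob_answer(m, i):
    """Bob must guess z_i from the message alone; here he outputs m."""
    return m

def P(z):
    """The answer vector induced by the protocol on input z."""
    m = alice_message(z)
    return tuple(bob_answer(m, i) for i in range(d))

# Expected error under mu: (1 / (d * 2^d)) * sum_z dist(z, P(z)).
err = sum(sum(zi != pi for zi, pi in zip(z, P(z)))
          for z in product((0, 1), repeat=d)) / (d * 2 ** d)
print(err)  # 0.34375 for this toy protocol
```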

Question: we do not know the protocol $P$, so how can we bound this quantity?

The key point is that $P(z)$ is determined by Alice's message, which contains at most $d/15$ bits. Hence the set of possible answer vectors $T = \{P(z) : z \in \{0,1\}^d\} \subset \{0,1\}^d$ has size at most $2^{d/15}$, and $\text{dist}(z, P(z)) \geq \text{dist}(z, T) := \min_{t \in T} \text{dist}(z, t)$. So let us bound $\mathbb{E}[\text{dist}(z, T)]$.

\textbf{Lemma}. For every $d \geq 15$ and any subset $T \subset \{0,1\}^d$ with $\lvert T \rvert \leq 2^{d/15}$, we have $\mathbb{E}[\text{dist}(z, T)] > d/8$, where the expectation is taken over uniform $z \in \{0,1\}^d$.

\textbf{Proof}. The key is to define, for each $t \in T$, the Hamming ball $N_t = \{z \mid \text{dist}(z, t) \leq d/4\}$.

We know $\lvert N_t \rvert = \sum_{i=0}^{\lfloor d/4 \rfloor} \binom{d}{i}$.

By the relationship between entropy and counting (see the post Entropy and Counting), we obtain that

$\lvert N_t \rvert = \sum_{i=0}^{\lfloor d/4 \rfloor} \binom{d}{i} \leq 2^{H(1/4)\, d} \leq 2^{0.82d}$, where $H$ is the binary entropy function and $H(1/4) \approx 0.811$.

Then, we get that

$\sum_{t \in T} \lvert N_t \rvert \leq 2^{d/15} \cdot 2^{0.82d} < 2^{d} \cdot 2^{-d/15} \leq 2^{d-1}$, where the last inequality uses $d \geq 15$. In other words, the balls $N_t$ together cover at most half of $\{0,1\}^d$.

Therefore,

$\mathbb{E}[\text{dist}(z, T)] = \frac{1}{2^d} \sum_{z \in \{0,1\}^d} \text{dist}(z, T) \geq \frac{1}{2^d} \cdot 2^{d-1} \cdot \frac{d}{4} = \frac{d}{8},$ since at least $2^{d-1}$ vectors $z$ lie outside all the balls $N_t$, and each such $z$ has $\text{dist}(z, T) > d/4$ (so the inequality is in fact strict). $\square$

Hence any deterministic one-round protocol of cost at most $d/15$ errs with probability greater than $\frac{1}{d} \cdot \frac{d}{8} = 1/8$ under $\mu$, so achieving error $\epsilon \leq 1/8$ requires $\Omega(d)$ bits. This proves the theorem. $\square$
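As a numeric sanity check (my own sketch, assuming nothing beyond the statements above), the following verifies the Hamming-ball bound and brute-forces $\mathbb{E}[\text{dist}(z, T)]$ for $d = 15$ with a random $T$ of the maximum allowed size $2^{\lfloor d/15 \rfloor} = 2$:

```python
import random
from itertools import product
from math import comb, log2

# Ball-size bound: log2 |N_t| <= 0.82 * d.
for d in (16, 32, 64, 128):
    ball = sum(comb(d, i) for i in range(d // 4 + 1))
    print(d, round(log2(ball), 2), "<=", round(0.82 * d, 2))

# Brute-force E[dist(z, T)] for d = 15 and |T| = 2.
d = 15
T = [tuple(random.randint(0, 1) for _ in range(d)) for _ in range(2)]

def dist(z, t):
    return sum(a != b for a, b in zip(z, t))

avg = sum(min(dist(z, t) for t in T)
          for z in product((0, 1), repeat=d)) / 2 ** d
print(avg, "> d/8 =", d / 8)  # prints a value well above 1.875
```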

Now, let us apply Theorem 2 to show the lower bound for INDEXING. Let $\text{IND}: \{1,\ldots,n\} \times \{0,1\}^n \to \{0,1\}$ be the indexing function, and let $S = \{1, 2, \ldots, n\}$. It is enough to show that $S$ is shattered by $\text{IND}_Y = \{\text{IND}_{y(R)} \mid R \subseteq S\}$, where $y(R)$ is the indicator vector of $R$ and $\text{IND}_{y(R)}(i) = 1$ if and only if $i \in R$. (Here the holder of the string plays Alice's role in Theorem 2, so the relevant function class is indexed by the $\{0,1\}^n$ side.)

To see this, write $y(R) = (y_1(R), y_2(R), \ldots, y_n(R))$ with $y_i(R) = 1$ iff $i \in R$. Then $\text{IND}_{y(R)}(i)$ simply reads the $i$-th bit of the vector $y(R)$, i.e., it checks whether $i \in R$.

Now, for any subset $R \subseteq S$, there exists $y(R)$ such that $\text{IND}_{y(R)}(i) = 1$ if and only if $i \in R$. By the definition of shattering, $S$ is shattered by $\text{IND}_Y$, so $\text{VC-dim}(\text{IND}_Y) \geq n$ and Theorem 2 yields an $\Omega(n)$ lower bound for one-round INDEXING.
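For small $n$, this shattering claim can be confirmed by brute force (a sketch of mine, mirroring the shattering check from the Tools section):

```python
from itertools import combinations, product

n = 4
S = set(range(1, n + 1))

# Each string y in {0,1}^n induces IND_y : i -> y_i, i.e., the set {i : y_i = 1}.
H_S = {frozenset(i for i in S if y[i - 1]) for y in product((0, 1), repeat=n)}
all_subsets = {frozenset(c) for r in range(n + 1) for c in combinations(S, r)}

print(H_S == all_subsets)  # True: S is shattered, so VC-dim(IND_Y) >= n
```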

Finally

To prove the lower bound for INDEXING, the VC-dimension machinery can in fact be removed from the proof. One contribution of the paper [1] is disclosing the relationship between VC-dimension and one-round communication lower bounds.

Reference

[1] Kremer, I., Nisan, N., and Ron, D. (1999). On randomized one-round communication complexity. Computational Complexity, 8, pp. 21–49.