T1: The Corruption Method in Communication Complexity (1)

January 09, 2025

Reminder: This post contains 4186 words · 12 min read · by Xianbin

Corruption Methods
DISJ (product distribution )
- Revisit what we proved
Reference

This post revisits previous posts and talks about the usage of the corruption method in communication complexity.

Corruption Methods

Example 1

1	1	1	0
1	1	1	1
1	0	1	0

Example 2

0	0	0
0	0	0
1	0	1
1	1	0

Corruption means that in a rectangle, most of elements are 1 (with noise 0) or 0 (with noise 1). We say such rectangles are almost monochromatic rectangles.

DISJ (product distribution )

Now, we see how to embed the idea of corruption bound to prove the lower bound.

Let \(M(f)\) be the matrix in which rows are possible \(x\) and columns are possible \(y\), \(M(f) = M(x,y)\).

\(\textbf{Theorem 1}\) If \(f\) is a function that every partition of \(M(f)\) into monochromatic rectangles requires \(r\) rectangles, then the communication complexity of \(f\) is at least \(\log r\).

The proof is simple. The number of leaves in the decision tree is at least the number of monochromatic rectangles and \(\log (\# leaves)\) is the lower bound.

Now, we consider the communication complexity of DISJ under product distribution. The distribution \(\mu\) is as follows: we select \(\sqrt{n}\) elements uniformly at random from \([n]\) to be \(X\) and \(Y\) is created in the same way. We consider the protocol with at most \(\epsilon\) error. We say that \(D^\mu_{\epsilon}(f)\) is the cost of the best deterministic protocol that gives correct answers to at least \(1-\epsilon\) fraction of the input defined by \(\mu\).

Corresponding to \(D^{\mu}_\epsilon(f)\), we define \(\epsilon\)-monochromatic rectangles \(R= S\times T\) in terms of DISJ as follows: \(P[\text{DISJ}(x,y) = 0 \mid (x,y)\in R] \leq \epsilon\).

The main idea is to prove that \(\lvert S \rvert \lvert T \rvert \leq 2^{-c'\sqrt{n}} \lvert X \rvert \lvert Y \rvert\) where \(\lvert X \rvert = \lvert Y \rvert = {n \choose \sqrt{n}}\). Then, there are at least \(2^{c'\sqrt{n}}\) \(\epsilon\)-monochromatic rectangles \(R\). So, the lower bound is \(\Omega(\sqrt{n})\).

Let \(S^*= \{x: x\in S, \lvert x \cap y \rvert \leq 2\epsilon \lvert T \rvert\}\). By the averaging argument and \(S\times T\) is an \(\epsilon\)-rectangle, \(\lvert S^* \rvert \geq \lvert S \rvert /2\).

\(\textbf{Lemma 1}\). If \(\lvert S \rvert \geq \lvert X \rvert 2^{-c\sqrt{n}}\), then there exist \(x_1,\ldots, x_k \in S\) such that \(k= \sqrt{n}/3\) and for every \(j\leq k\), we have

\[\lvert x_j \bigcup_{i<j}x_i \rvert < \sqrt{n}/2\]

The above lemma is obvious. Just imaging the process of adding \(x_j\), there always will many choices.

\(\textbf{Lemma 2}\). Given any \(x_1, \ldots,x_k \in S^*\), at most \(\lvert T \rvert /2\) of the element of \(T\) intersect more than \(4\epsilon k\) of the \(x_i\).

\(\textbf{Proof}\). If there are more than \(\lvert T \rvert /2\) of elements of \(T\) intersect more than \(4\epsilon k\) of the \(x_i\) in \(S^*\), then it will violate the definition of \(S^*\).

By Lemma 2, we can get an estimation of the upper bound of \(\lvert T \rvert\). First, we consider the upper bound of \(\lvert T \rvert /2\). There are at most \(k\choose 4\epsilon k\) ways of selecting \(4\epsilon\) of the \(x_i\) which a given \(y\in T\) is allowed to intersect. The size of remaining \(x_i\) is at least \(k(1-4\epsilon)\sqrt{n}/2 \geq n/9\). For those remaining \(x_i\), we cannot let \(y\) intersect with them (otherwise, the intersection rate will be larger than \(4\epsilon\)). Then, we choose from \(n-n/9=8n/9\) elements.

So,

\[\lvert T \rvert /2 \leq {k \choose 4\epsilon k} {8n/9 \choose \sqrt{n}} \leq 2^{-c\sqrt{n}}{n \choose \sqrt{n}}\]

Let us prove the above inequality.

The Stirling approximation: when \(n\to \infin\), \(n! \to \frac{n^n \sqrt{2\pi n}}{e^n}\).

Then, we have

\[{8n/9 \choose \sqrt{n}} \to e^{\frac{8\sqrt{n}}{9}}\] \[{n \choose \sqrt{n}} \to e^{\sqrt{n}}\] \[{k \choose 4\epsilon k} \to e^{\epsilon k} = e^{\epsilon' \sqrt{n}}\]

Then, it is proved.

Now, we will give an upper bound of \(\lvert R \rvert\). If the condition in Lemma 1 fails, we can obtain the desired upper bound. Otherwise, \(\lvert T \rvert \leq 2^{-c'\sqrt{n}}{n \choose \sqrt{n}}\)

Then, we have

\[\lvert R \rvert \leq \lvert X \rvert \lvert Y \rvert 2^{-c'\sqrt{n}}.\]

There are at least \(2^{c'\sqrt{n}}\) rectangles, so it is proved.

(Notice that the size of the set \(S^*\) is not \(k\)!)

Revisit what we proved

\(\textbf{Lemma 3}\). There exits a constant \(c\) such that the following holds. If \(R = S \times T\) is an almost \(\epsilon\)-monochromatic 1-rectangle, i.e.,
\[\mathbb{P}_{X,Y \sim \mu} [\text{DISJ}(X,Y) = 0 \mid (X,Y)\in R] \leq \epsilon\]
then, either \(\lvert S \rvert\) or \(\lvert T\rvert\) is at most \(2^{-c\sqrt{n}}{n \choose \sqrt{n}}\).

In the next post, we show how to prove the theorem of corruption bound.

Reference

[1]. Laszlo Babai, Peter Frankl, and Janos Simon. Complexity classes in communication complexity theory. In Proceedings of the 27th Annual Symposium on Foundations of Computer Science, FOCS’86, pages 337–347, 1986.

[2]. Xianbin’s post 1.