
C1: The Indexing Problem: Different Proofs (1)


Problem Definition (INDEXING)

Alice has a string \(S \in \{0,1\}^n\) while Bob has an index \(i \in [n]\). The goal is for Bob to output \(S_i\), the \(i\)-th element of \(S\) (indexing from 1), after receiving a single one-way message from Alice; Bob is not allowed to send \(i\) to Alice (otherwise the problem is trivial). We know that the lower bound on the one-way communication complexity of INDEXING is \(\Omega(n)\).
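For contrast with this lower bound, here is a minimal Python sketch (my own illustration; the function names are hypothetical) of the trivial one-way protocol in which Alice simply sends her whole string, costing exactly \(n\) bits:

```python
# Trivial one-way INDEXING protocol: Alice sends all n bits, Bob reads position i.
# Hypothetical helper names, for illustration only.

def alice_message(S: str) -> str:
    """Alice's single one-way message: the entire string (n bits)."""
    return S

def bob_output(message: str, i: int) -> str:
    """Bob outputs the i-th bit (1-indexed) without ever revealing i to Alice."""
    return message[i - 1]

S = "0110100111"
assert bob_output(alice_message(S), 4) == S[3]  # the 4th bit of S
```

The lower bound below shows that, up to a constant factor, no bounded-error one-way protocol can beat this.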

The First Proof

Tools

Imagine the following scenario: you observe \(Y\) and want to estimate a correlated variable \(X\). What can you do, and what is the probability of error?

Intuitively, if \(H(X \mid Y)\) is very small, then after observing \(Y\) we almost know \(X\), so the error probability should also be small.

Fano’s inequality quantifies this observation.

\(\textbf{Fano's Inequality}\). Given a Markov chain \(X_0 \to Y \to X_1\), let \(p_e = P(X_1 \neq X_0)\). Then, we have

\[H_2(p_e) + p_e\log (\lvert \text{supp}(X_0)\rvert-1) \geq H(X_0\mid X_1) \geq H(X_0 \mid Y)\]

where \(H_2(p_e)\) is the binary entropy, i.e., \(H_2(p_e) = p_e \log \frac{1}{p_e} + (1-p_e) \log \frac{1}{1-p_e}\).
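As a sanity check, here is a small Python sketch (my own toy setup, not from the post's source) that evaluates both sides of Fano's inequality for a binary symmetric channel, where the inequality in fact holds with equality: \(X_0\) is a uniform bit, \(Y\) flips it with probability \(0.1\), and the estimator outputs \(X_1 = Y\).

```python
import math

def H2(p: float) -> float:
    """Binary entropy in bits; H2(0) = H2(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

# Markov chain X0 -> Y -> X1: X0 is a uniform bit, Y flips it w.p. eps,
# and the estimator simply outputs X1 = Y.
eps = 0.1
p_e = eps                  # P(X1 != X0)
H_X0_given_Y = H2(eps)     # H(X0 | Y) for a binary symmetric channel

# Fano: H2(p_e) + p_e * log(|supp(X0)| - 1) >= H(X0 | Y); the log term is 0 here.
lhs = H2(p_e) + p_e * math.log2(2 - 1)
assert lhs >= H_X0_given_Y - 1e-12
print(f"H2(p_e) = {lhs:.4f} >= H(X0|Y) = {H_X0_given_Y:.4f}")
```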

The Proof using Information Theory

For simplicity, let \(S\) be uniformly distributed over \(\{0,1\}^n\).

Let \(\Pi\) be the one-way transcript, consisting of \(\ell\) bits. This gives a Markov chain \(S \to \Pi \to S'\), where \(S\) is Alice's string and \(S'\) is the vector of Bob's estimates of the bits of \(S\).

The information revealed by \(\Pi\) is

\[I(\Pi : S) = H(S) - H(S \mid \Pi) = n - H(S \mid \Pi),\]

and by the chain rule,

\[H(S \mid \Pi) = \sum_{i=1}^n H(S_i \mid S_{<i}, \Pi).\]
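To make these identities concrete, the following brute-force Python check (a toy example of my own: \(n = 3\) with a hypothetical one-bit transcript \(\Pi = S_1\)) confirms \(I(\Pi : S) = n - H(S \mid \Pi)\) and the chain rule by direct enumeration.

```python
import itertools
import math
from collections import defaultdict

def H(dist):
    """Shannon entropy (bits) of a dict mapping outcome -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def H_cond(joint_ab, marg_b):
    """H(A | B) = H(A, B) - H(B)."""
    return H(joint_ab) - H(marg_b)

n = 3
# S uniform over {0,1}^3; toy one-way transcript Pi = the first bit of S.
joint = {}  # (S, Pi) -> probability
for S in itertools.product("01", repeat=n):
    joint[(S, S[0])] = 1 / 2**n

marg_Pi = defaultdict(float)
for (S, Pi), p in joint.items():
    marg_Pi[Pi] += p

H_S = n  # S is uniform on {0,1}^n
H_S_given_Pi = H_cond(joint, marg_Pi)
I_Pi_S = H_S - H_S_given_Pi

# Chain rule: H(S | Pi) = sum_i H(S_i | S_<i, Pi),
# using H(S_i | S_<i, Pi) = H(S_<=i, Pi) - H(S_<i, Pi).
chain = 0.0
for i in range(n):
    prefix = defaultdict(float)  # joint of (S_1..S_i, Pi)
    prev = defaultdict(float)    # joint of (S_1..S_{i-1}, Pi)
    for (S, Pi), p in joint.items():
        prefix[(S[:i + 1], Pi)] += p
        prev[(S[:i], Pi)] += p
    chain += H_cond(prefix, prev)

print(f"I(Pi;S) = {I_Pi_S:.3f}, H(S|Pi) = {H_S_given_Pi:.3f}, chain sum = {chain:.3f}")
assert abs(H_S_given_Pi - chain) < 1e-9  # chain rule verified
```

Here the one-bit transcript reveals exactly one bit of information, so \(I(\Pi : S) = 1\) and \(H(S \mid \Pi) = 2\), matching the chain-rule sum.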

How to use Fano’s inequality?

From \(S \to \Pi \to S'\), we know that \(S\) and \(S'\) are conditionally independent given \(\Pi\). In particular, for each coordinate \(i\),

\[S_i \perp S_i' \mid \Pi.\]

Hence, we have a Markov chain \(S_i \to \Pi \to S_i'\).

By Fano’s inequality (with \(\lvert \text{supp}(S_i) \rvert = 2\)), we have

\[H_2(p_e) + p_e \log (2 - 1) \geq H(S_i \mid \Pi)\]

Since \(\log(2 - 1) = 0\), this reduces to \(H_2(p_e) \geq H(S_i \mid \Pi)\).

Now suppose the protocol errs with probability \(p_e \leq 1/3\). Since \(H_2\) is increasing on \([0, 1/2]\), we have \(H_2(p_e) \leq H_2(1/3) < 1\). Then, we have

\[H(S_i \mid \Pi) \leq H_2(1/3) < 1.\]
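A quick numerical check of this constant (reusing the binary entropy defined earlier):

```python
import math

def H2(p: float) -> float:
    """Binary entropy in bits."""
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

print(f"H2(1/3) = {H2(1/3):.4f}")  # ~0.9183, strictly below 1
```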

Therefore, since conditioning does not increase entropy,

\[H(S \mid \Pi) = \sum_{i=1}^n H(S_i \mid S_{<i}, \Pi) \leq \sum_{i=1}^n H(S_i \mid \Pi) \leq cn,\]

where \(c = H_2(1/3) < 1\).

So, we have

\[I(\Pi : S) = n - H(S \mid \Pi) \geq (1 - c)n = \Omega(n)\]

Also, since \(\Pi\) consists of \(\ell\) bits, subadditivity of entropy gives

\[I(\Pi : S) \leq H(\Pi) \leq \sum_{i=1}^{\ell} H(\Pi[i]) \leq \ell\]

Then, we obtain that

\[\ell \geq (1 - c)n = \Omega(n)\]
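To see the implied constant, the chain of bounds gives \(\ell \geq (1 - H_2(1/3))\,n \approx 0.082\,n\); the short sketch below (mine, just evaluating the formula) prints this bound for a few values of \(n\).

```python
import math

def H2(p: float) -> float:
    """Binary entropy in bits."""
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

# ell >= I(Pi;S) = n - H(S|Pi) >= (1 - H2(1/3)) * n for any 1/3-error protocol.
c = H2(1 / 3)
for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: any one-way protocol needs >= {(1 - c) * n:.1f} bits")
```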

Reference

[1] David Woodruff. CS 15-859: Algorithms for Big Data, Fall 2020, Lecture 11 (11/19/20).