
C1: The Indexing Problem: Different Proofs (1)

by Xianbin

Problem Definition (INDEXING)

Alice has an $n$-bit string $S$ while Bob has an index $i$. The goal is for Bob to output $S_i$, the $i$-th bit of $S$ (indices start from 1), without Bob sending his index to Alice (otherwise the problem is trivial). The one-way communication complexity of INDEXING is known to be $\Omega(n)$.
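For intuition, here is a minimal Python sketch of the trivial matching upper bound, in which Alice sends her entire string and Bob reads off the bit he needs; the function names are illustrative, not from any reference.

```python
def alice_message(s: str) -> str:
    # Trivial one-way protocol: Alice sends all n bits of her string.
    return s

def bob_output(message: str, i: int) -> str:
    # Bob outputs the i-th bit (1-indexed) of Alice's message.
    return message[i - 1]

s = "10110"
assert bob_output(alice_message(s), 3) == "1"  # S_3 = 1
```

The lower bound proved below shows that this $n$-bit protocol is optimal up to a constant factor.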

The First Proof

Tools

Imagine the following scenario: you know $Y$, and you want to estimate a correlated random variable $X$. What can you do? And what is the probability of error?

Intuitively, if $H(X \mid Y)$ is very small, i.e., after learning $Y$ we almost know $X$, then the error probability is small.

Fano's inequality quantifies this observation.

**Fano's Inequality.** Given a Markov chain $X_0 \to Y \to X_1$, let $p_e = P(X_1 \neq X_0)$. Then we have

$$H_2(p_e) + p_e \log \big(\lvert \mathrm{supp}(X_0) \rvert - 1\big) \geq H(X_0 \mid X_1) \geq H(X_0 \mid Y),$$

where $H_2(p_e)$ is the binary entropy, i.e., $H_2(p_e) = p_e \log \frac{1}{p_e} + (1-p_e) \log \frac{1}{1-p_e}$.
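As a sanity check, here is a small Python sketch of my own, assuming a binary symmetric channel: $X_0$ is a uniform bit, $Y$ flips it with probability $p$, and the estimator is $X_1 = Y$, so that $p_e = p$ and $H(X_0 \mid Y) = H_2(p)$.

```python
import math

def h2(p: float) -> float:
    """Binary entropy H_2(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

# X_0 is a uniform bit, Y = X_0 flipped with probability p, estimator X_1 = Y.
# Then p_e = p and H(X_0 | Y) = H_2(p), so Fano's inequality holds (here with
# equality, since |supp(X_0)| = 2 makes the log term vanish).
for p in (0.05, 0.1, 1 / 3):
    lhs = h2(p) + p * math.log2(2 - 1)  # Fano's left-hand side; log(2-1) = 0
    rhs = h2(p)                         # H(X_0 | Y) for this channel
    assert lhs >= rhs - 1e-12
    print(f"p_e = {p:.3f}: H2(p_e) = {lhs:.4f} >= H(X0 | Y) = {rhs:.4f}")
```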

The Proof using Information Theory

For simplicity, let us consider $S$ drawn uniformly at random from $\{0,1\}^n$.

Let $\Pi$ be the one-way transcript, consisting of $\ell$ bits. Here is a Markov chain: $S \to \Pi \to S'$, where $S$ is Alice's string and $S'$ is the vector of Bob's estimated bits (one estimate $S_i'$ for each index $i$).

The information revealed by $\Pi$ is

$$I(\Pi ; S) = H(S) - H(S \mid \Pi) = n - H(S \mid \Pi),$$

since $H(S) = n$ for uniform $S$, and by the chain rule,

$$H(S \mid \Pi) = \sum_{i=1}^n H(S_i \mid S_{<i}, \Pi).$$
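The chain rule step can be checked numerically. Below is a small Python sketch of my own, assuming a toy setup: $n = 3$, $S$ uniform on $\{0,1\}^3$, and a hypothetical one-bit transcript $\Pi = S_1$.

```python
import itertools
import math
from collections import Counter

# Toy check of the chain rule H(S | Pi) = sum_i H(S_i | S_{<i}, Pi).
# Illustrative setup: n = 3, S uniform on {0,1}^3, transcript Pi = S_1.
n = 3
outcomes = [(s, s[0]) for s in itertools.product((0, 1), repeat=n)]  # (S, Pi)
N = len(outcomes)  # uniform distribution: each outcome has probability 1/N

def cond_entropy(f, g):
    # H(f(w) | g(w)) in bits, for w uniform over `outcomes`.
    joint = Counter((f(w), g(w)) for w in outcomes)
    marginal = Counter(g(w) for w in outcomes)
    return -sum((c / N) * math.log2(c / marginal[b]) for (a, b), c in joint.items())

lhs = cond_entropy(lambda w: w[0], lambda w: w[1])   # H(S | Pi)
rhs = sum(cond_entropy(lambda w, i=i: w[0][i],       # H(S_i | S_{<i}, Pi)
                       lambda w, i=i: (w[0][:i], w[1]))
          for i in range(n))
print(lhs, rhs)  # both equal 2.0
```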

How to use Fano’s inequality?

From $S \to \Pi \to S'$, we know that $S$ and $S'$ are conditionally independent given $\Pi$, i.e., $S \perp S' \mid \Pi$. In particular,

$$S_i \perp S_i' \mid \Pi.$$

Then, we have a Markov chain $S_i \to \Pi \to S_i'$.

By Fano’s inequality, we have

$$H_2(p_e) + p_e \log(2 - 1) \geq H(S_i \mid \Pi),$$

where $p_e = P(S_i' \neq S_i)$ and we used $\lvert \mathrm{supp}(S_i) \rvert = 2$.

That is, $H_2(p_e) \geq H(S_i \mid \Pi)$, since $\log(2-1) = \log 1 = 0$.

Suppose the protocol errs with probability $p_e \leq 1/3$ on each index. Since $H_2$ is increasing on $[0, 1/2]$, we have $H_2(p_e) \leq H_2(1/3) = \log_2 3 - 2/3 \approx 0.918 < 1$. Then, we have

$$H(S_i \mid \Pi) < 1.$$

Therefore, $\sum_{i=1}^n H(S_i \mid S_{<i}, \Pi) \leq \sum_{i=1}^n H(S_i \mid \Pi) \leq cn$, where $c = H_2(1/3) < 1$; the first inequality holds because conditioning cannot increase entropy.

Then, we have $H(S \mid \Pi) \leq cn$.

So, we have

$$I(\Pi ; S) = n - H(S \mid \Pi) \geq (1 - c)n = \Omega(n).$$

Also, we know that

$$I(\Pi ; S) \leq H(\Pi) \leq \sum_{i=1}^{\ell} H(\Pi[i]) \leq \ell.$$

Then, we obtain that

$$\ell = \Omega(n).$$
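Concretely, chaining the two bounds on $I(\Pi ; S)$ gives an explicit constant:

$$\ell \geq H(\Pi) \geq I(\Pi ; S) = n - H(S \mid \Pi) \geq \big(1 - H_2(1/3)\big)\, n \approx 0.082\, n.$$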
