
C1: The Augmented Indexing Problem

By Xianbin

The augmented indexing problem is a variant of the indexing problem, which has applications in proving lower bounds. This post is based on [1].

Definition

Alice has a string \(S \in \{0,1\}^n\), while Bob has an index \(j\), the prefix \(S_{<j}\), and a check bit \(c\in \{0,1\}\). The goal is for Bob to output \(S_j\) (up to the check bit, as formalized below). Namely, Bob needs to learn the \(j\)-th element of \(S\) (indices start from 1) without giving Alice the index (otherwise, the problem is trivial).

Compared to the indexing problem, Bob now also knows the elements before the \(j\)-th one, i.e., the prefix \(S_{<j}\).

We use AInd to denote the augmented indexing problem. The goal is to output \(\text{AInd}(S,j,c):=S_j\oplus c\).
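To make the task concrete, here is a minimal sketch of the \(\text{AInd}\) function in Python; the code and identifiers are illustrative, not taken from [1].

```python
# A minimal sketch of the AInd task; identifiers are illustrative, not from [1].
from typing import List

def aind(S: List[int], j: int, c: int) -> int:
    """Return AInd(S, j, c) = S_j XOR c (indices start from 1)."""
    return S[j - 1] ^ c

# Bob holds the index j, the prefix S_{<j}, and the check bit c;
# Alice holds the whole string S.
S = [1, 0, 1, 1, 0]
j, c = 4, 1
prefix = S[:j - 1]       # S_{<j}, which Bob gets for free
print(aind(S, j, c))     # S_4 XOR 1 = 1 XOR 1 = 0
```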

Tools

Information Cost

Here, we need to consider how much information Bob learns about Alice's input and how much Alice learns about Bob's input. This quantity is called the internal information cost:

\[IC(\Pi) = I(\Pi: A \mid B) + I(\Pi:B\mid A),\]

where \(A\) denotes Alice's input and \(B\) denotes Bob's input.

For inputs drawn from a distribution \(\mu\) over \(\{0, 1\}^n \times [n] \times \{0, 1\}\), we use \(IC^A(\Pi)\) to denote the information cost of Alice, i.e., \(I(\Pi: S \mid S_{<J}, J, C, R)\), and \(IC^B(\Pi)\) to denote the information cost of Bob, i.e., \(I(\Pi: J, C\mid S, R)\), where \(R\) is the public coin.
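As a toy illustration (my own, not a protocol from [1]), consider the trivial protocol in which Alice sends her entire string, so the transcript is \(\Pi = S\): then \(IC^A(\Pi)\) is large while \(IC^B(\Pi) = 0\). The sketch below estimates both quantities by enumerating a small uniform input distribution; the protocol is deterministic, so the public coin \(R\) is omitted, and the helper computes conditional mutual information directly from the joint distribution.

```python
# Toy numerical illustration of IC^A and IC^B; the protocol and helper below
# are illustrative and not taken from [1].
import itertools
from collections import defaultdict
from math import log2

def cond_mutual_info(samples):
    """I(X; Y | Z) in bits, from a list of equally likely (x, y, z) outcomes."""
    pxyz, pxz, pyz, pz = (defaultdict(float) for _ in range(4))
    p = 1.0 / len(samples)
    for x, y, z in samples:
        pxyz[(x, y, z)] += p
        pxz[(x, z)] += p
        pyz[(y, z)] += p
        pz[z] += p
    return sum(q * log2(q * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
               for (x, y, z), q in pxyz.items())

n = 3
outcomes = []
# Uniform distribution over (S, J, C); the "protocol": Alice sends all of S.
for S in itertools.product((0, 1), repeat=n):
    for J in range(1, n + 1):
        for C in (0, 1):
            transcript = S                      # Pi = S
            outcomes.append((transcript, S, (S[:J - 1], J, C)))

# IC^A = I(Pi ; S | S_{<J}, J, C): Alice reveals her whole suffix,
# so this is E[n - J + 1] = 2 bits for n = 3.
print(cond_mutual_info(outcomes))
# IC^B = I(Pi ; (J, C) | S): the transcript ignores Bob's input, so this is 0.
print(cond_mutual_info([(t, (jc[1], jc[2]), s) for t, s, jc in outcomes]))
```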

Averaging Argument

\(\textbf{Lemma 1}\). Consider a family of functions \(f_1, \ldots, f_L: D\to \mathbb{R}^+\) and parameters \(c_1,\ldots,c_L\), where \(L>0\) is an integer and \(D\) is a finite domain. Let \(Z\) be a random variable over \(D\). If \(\mathbb{E}[f_i(Z)] \leq c_i\) for every \(i\in [L]\), then there exists \(z\in D\) such that

\[\forall i\in[L], \quad f_i (z) \leq L c_i.\]

What does this lemma mean? It means that if every function in a family has a small average value over a domain, then there exists a single element of the domain at which all of the functions are small at once (up to a factor of \(L\)).

It is not hard to prove this tiny lemma. Let \(g(z):=\sum_i f_i(z)/c_i\); then \(\mathbb{E}[g(Z)]\leq L\), so there must be a \(z\) such that \(g(z) \leq L\). Since every term in the sum is non-negative, \(f_i(z)/c_i \leq g(z) \leq L\) for each \(i\), which gives the claim.

Quite simple but useful. We will soon see why we need this lemma.
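As a quick sanity check (a random instance of my own, not from [1]), the following sketch draws random non-negative functions on a small domain, takes \(c_i = \mathbb{E}[f_i(Z)]\) for \(Z\) uniform on \(D\), and verifies that some point \(z\) satisfies all \(L\) inequalities at once:

```python
# Numerical check of Lemma 1 on a random instance; purely illustrative.
import random

L, D = 5, 20                     # number of functions, size of the domain
f = [[random.random() for _ in range(D)] for _ in range(L)]   # f_i : D -> R^+

# Take Z uniform over D, so E[f_i(Z)] is just the average of f_i; use it as c_i.
c = [sum(fi) / D for fi in f]

# Points z with f_i(z) <= L * c_i for every i, as the lemma promises.
good = [z for z in range(D) if all(f[i][z] <= L * c[i] for i in range(L))]
print(good)                      # always non-empty
```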

Theorem 1

\(\textbf{Theorem 1}\). If \(\Pi\) is a randomized protocol for \(\text{AInd}\) with two-sided error at most \(1/\log^2 n\), then either \(IC^A(\Pi) = \Omega(n)\) or \(IC^B(\Pi) = \Omega(1)\).

This means that in any such protocol, either Alice reveals \(\Omega(n)\) bits of information about her input, or Bob reveals \(\Omega(1)\) bits of information about his input.

To be continued…

Reference

[1]. Chakrabarti, A., Cormode, G., Kondapally, R. and McGregor, A., 2013. Information cost tradeoffs for augmented index and streaming language recognition. SIAM Journal on Computing, 42(1), pp.61-83.