Reminder: This post contains 2106 words
· 7 min read
· by Xianbin
The augmented indexing problem is a variant of the indexing problem, with applications to proving lower bounds. This post is based on [1].
Definition
Alice has a string \(S\) while Bob has an index \(j\), the prefix \(S_{<j}\), and a check bit \(c\in \{0,1\}\). The goal is to let Bob output \(S_j\). Namely, Bob needs to learn the \(j\)-th element of \(S\) (indices start from 1) without giving Alice the index (otherwise, the problem is trivial).
Compared to the indexing problem, Bob now additionally knows the prefix of \(S\) before the \(j\)-th element, i.e., \(S_{<j}\).
We use AInd to denote the augmented indexing problem. The goal is to output \(\text{AInd}(S, j, c) := S_j \oplus c\).
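As a minimal sketch, here is the AInd function itself (just the target value, not a communication protocol; the function name and 1-based indexing convention follow the definition above):

```python
def aind(S, j, c):
    """Return AInd(S, j, c) = S_j XOR c, where j is a 1-based index."""
    return S[j - 1] ^ c

# Bob also holds the prefix S_{<j}; it does not change the target value,
# only what a communication protocol may exploit.
S = [1, 0, 1, 1, 0]
print(aind(S, 3, 0))  # S_3 = 1
print(aind(S, 3, 1))  # S_3 XOR 1 = 0
```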
Here, we need to measure how much information Bob learns about Alice's input and how much Alice learns about Bob's. This quantity is called the internal information cost:
\[IC(\Pi) = I(\Pi : A \mid B) + I(\Pi : B \mid A),\]
where \(A\) and \(B\) denote Alice's and Bob's inputs, respectively.
Over a distribution \(\mu\) on \(\{0, 1\}^n \times [n] \times \{0, 1\}\), we use \(IC^A(\Pi)\) to denote Alice's cost, i.e., \(I(\Pi : S \mid S_{<J}, J, C, R)\), and \(IC^B(\Pi)\) to denote Bob's cost, i.e., \(I(\Pi : J, C \mid S, R)\), where \(R\) is the public coin.
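Conditional mutual information terms like those above can be computed by brute force on small joint distributions. The sketch below (function name is my own) does this for a toy one-bit protocol in which the transcript is simply Alice's bit, so the transcript reveals 1 bit about Alice's input and 0 bits about Bob's:

```python
from collections import defaultdict
from math import log2

def cond_mutual_info(joint):
    """I(A; B | C) in bits, from a joint pmf given as {(a, b, c): prob}."""
    p_c, p_ac, p_bc = defaultdict(float), defaultdict(float), defaultdict(float)
    for (a, b, c), p in joint.items():
        p_c[c] += p
        p_ac[(a, c)] += p
        p_bc[(b, c)] += p
    # I(A;B|C) = sum p(a,b,c) log2[ p(a,b,c) p(c) / (p(a,c) p(b,c)) ]
    return sum(p * log2(p * p_c[c] / (p_ac[(a, c)] * p_bc[(b, c)]))
               for (a, b, c), p in joint.items() if p > 0)

# Toy protocol: Alice's bit A and Bob's bit B are independent uniform
# bits, and the transcript Pi equals A.
# I(Pi; A | B): keys are (pi, a, b) with pi = a.
print(cond_mutual_info({(a, a, b): 0.25 for a in (0, 1) for b in (0, 1)}))  # -> 1.0
# I(Pi; B | A): keys are (pi, b, a) with pi = a.
print(cond_mutual_info({(a, b, a): 0.25 for a in (0, 1) for b in (0, 1)}))  # -> 0.0
```

So for this toy protocol the internal information cost is \(1 + 0 = 1\) bit.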
Average Argument
\(\textbf{Lemma 1}\). Consider a family of functions \(f_1, \ldots, f_L: D\to \mathbb{R}^+\) and parameters \(c_1,\ldots,c_L > 0\), where \(L>0\) is an integer and \(D\) is a finite domain. Let \(Z\) be a random variable over \(D\). If
\[\forall i\in [L],\ \mathbb{E}[f_i(Z)] \leq c_i,\]
then there exists \(z\in D\) such that
\[\forall i\in[L],\ f_i(z) \leq L c_i.\]
What does this lemma mean? It means that if a set of functions on a domain all have a small average value, then there exists a single element of the domain at which all of them are simultaneously small.
It is not hard to prove this tiny lemma. Let \(g(z):=\sum_i f_i(z)/c_i\). By linearity of expectation, \(\mathbb{E}[g(Z)]\leq L\), so there must be a \(z\) with \(g(z) \leq L\). Since every term of \(g\) is nonnegative, this gives \(f_i(z)/c_i \leq L\), i.e., \(f_i(z) \leq L c_i\), for every \(i\).
Quite simple but useful. We will see why we will need this lemma soon.
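A quick numeric sanity check of the averaging argument (the setup here is my own: random functions on a small domain, \(Z\) uniform, and \(c_i\) set to the exact averages so the hypothesis holds):

```python
import random

random.seed(0)
L, D = 5, 50
# f[i][z] plays the role of f_i(z); values are arbitrary positive reals.
f = [[random.random() for _ in range(D)] for _ in range(L)]
# c_i = E[f_i(Z)] for Z uniform on D, so E[f_i(Z)] <= c_i trivially holds.
c = [sum(fi) / D for fi in f]

def g(z):
    """The combined function from the proof: g(z) = sum_i f_i(z)/c_i."""
    return sum(f[i][z] / c[i] for i in range(L))

# E[g(Z)] = L, so the minimizer z* satisfies g(z*) <= L, and since every
# term is nonnegative, f_i(z*) <= L * c_i for all i simultaneously.
z_star = min(range(D), key=g)
assert g(z_star) <= L
assert all(f[i][z_star] <= L * c[i] for i in range(L))
print("witness z =", z_star, "satisfies all", L, "bounds")
```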
Theorem 1
\(\textbf{Theorem 1}\). If \(\Pi\) is a randomized protocol for \(\text{AInd}\) with two-sided error at most \(1/\log^2 n\), then either \(IC^A(\Pi) = \Omega(n)\) or \(IC^B(\Pi) = \Omega(1)\).
In other words, either Alice reveals \(\Omega(n)\) bits of information about her input, or Bob reveals \(\Omega(1)\) bits about his.
To be continued…
Reference
[1]. Chakrabarti, A., Cormode, G., Kondapally, R. and McGregor, A., 2013. Information cost tradeoffs for augmented index and streaming language recognition. SIAM Journal on Computing, 42(1), pp.61-83.