Toolbox

C1: Multi-party Communication Model Applications (1): Lower bounds for distributed sketching of maximal matchings and maximal independent sets

Reminder: This post contains 5656 words · 17 min read · by Xianbin

This post is based on [1]. We will see how to prove lower bonds in multi-party communication model using information theory.

We say that a graph \(G=(V, E)\) is an \((s,t)\)-Ruzsa-Szemerédi graph if and only if the set \(E\) can be partitioned into \(t\) induced matchings (there is no edges between endpoints of non-matching edges) each of which has size \(s\).

\(\textbf{Lemma} [2]\) (RS graph). For infinitely many integer \(N\), there are \((s,t)\)-RS graph on \(N\) nodes where \(r = N/e^{\Theta(\sqrt{\log N})}\) and \(t = N/3\).

RS graphs are used a lot in proof of lower bounds. It has an interesting property: sparse locally (disjoint edges), dense globally.

Communication Model

We consider distributed sketching model, or broadcast congested clique (one-round) model. Consider a undirected graph \(G = (V, E)\), each node has a player. Each player of node \(u\) knows

The referee has no input.

Players can simultaneously send a message to the referee and the referee outputs the solution.

So, the round complexity depends on the length of the message sent by players!

Smart readers might find that this is exactly “one-round” lower bound!

A Hard Distribution for Maximal Matching

Roughly speak, the hard instance is a graph \(G = G_1 \cup \cdots \cup G_{k}\) where \(k = N/3, \lvert V(G_i) \rvert = N\). To be a little more specific, we have the following process:

For the second one, we can use the same nodes in \(G_0\) when making the union of all \(k\) \(G_i\).

Therefore, we have two kinds of vertices

More details about \(V_{pub}\). From \(G_0\), we randomly select an induced matching \(M_i, i\in[N/3]\). Let \(j^*\) denote the index. Then we have \(2r\) incident vertices (because the size of matching is \(r\)). We let the rest of \(N-2r\) be the public vertices.

The goal to the above construction is to guarantee the following property. We use \(D_{MM}\) to denote the distribution of graphs by the above process.

\(\textbf{Lemma}\). With probability at least \(1-2^{-kr/10}\), every maximal matching of \(G\sim G_{MM}\) has at least \(\Theta(n/e^{\Theta(\sqrt{\log n})})\) edges whose both endpoints are private nodes.

Let \(N\) denote the nodes in \(G_i\). \(k = N/3\) and \(n = N-2r+2kr\) where \(r = N/e^{\Theta(\sqrt{\log N})}\).

\(\textbf{There are N-2r public vertices}\).

In a word, the construct graph G is like this: \(\color{red}\text{Use } N-2r \text{ nodes as public nodes}\). Then, for each \(G_i\) use \(2r\) as private nodes \((i\geq 1)\). So, the total number of nodes is \(N-2r + 2r\cdot k\).

The Instance

The Lower bound

To prove the lower bound, we can make any algorithm smarter or give algorithms some information. It is totally OK. In the following, we add some extra information (power) to the distributed sketching model.

Note that, the public players have more power than private players.

Preparation of the Proof

Next, it comes to my favorite part.

The most important thing in probability theory and communication theory is to define events, variables correctly and efficiently.

Fix a deterministic protocol \(\pi\) for finding a maximal matching on the graph \(G\) sampled from \(D_{MM}\) with error probability at most 0.01.

We use \(\Pi(P): = \pi(p_1,\ldots, \pi_{N-2r})\) to denote the the collective messages of all \(N-2r\) public players. For each \(G_i\), we use \(\pi(R_i) = \pi(r_{i,1}),\ldots, \pi(r_{i,N})\) to denote the collective messages of each private players. We use \(\Pi(R) := \Pi(R_1),\ldots, \Pi(R_k)\) to denote the messages of all private players, and \(\Pi := \Pi(P), \Pi(R)\) denotes all the messages.

We use \(\Sigma, J\) to denote the variables representing the value of \(\delta, j^*\) in \(D_{MM}\). Recall that we randomly drop edges in \(G_0\) during construction of \(G_i\). Let \(M_{i,j} \in {0,1}^{M_j}\) indicate the states of edges in \(M_j\). For example, \(M_{i,j} = (1,0,1,0,\ldots, 1)\) means that the first edge in \(M_j\) is not removed, the second edge is removed, and so on.

In summary, we have

Overview

Let \(b\) be the number of bits each player communicates to the referee in the worst case.

\(\textbf{Lemma 1}\). \(\textbf{I}(M_{1,J}, M_{2,J},\ldots, M_{k,J} : \Pi \mid \Sigma, J) \geq kr/6\)

\(\textbf{Lemma 2}\). \(\textbf{I}(M_{1,J}, M_{2,J},\ldots, M_{k,J} : \Pi \mid \Sigma, J) \leq \textbf{H}(\Pi(P)) + \sum_{i=1}^k(M_{i,J}: \Pi(R_i) \mid \Sigma, J)\).

\(\textbf{Lemma 3}\). \(\textbf{I}(M_{i,J}: \Pi(R_i) \mid \Sigma, J) \leq \textbf{H}(\Pi(R_i))/k\)

By Lemma 1, Lemma 2 and Lemma 3,

\[kr/6 \leq \textbf{I}(M_{1,J}, M_{2,J},\ldots, M_{k,J} : \Pi \mid \Sigma, J) \leq \textbf{H}(\Pi(P)) + \sum_{i=1}^k \cdot \textbf{H}(\Pi(R_i))/k\\ \leq \lvert P \rvert b+ Nb \leq 2Nb\]

So, \(b\geq \Theta(r)\).

As \(N \geq \Theta(\sqrt{n})\), we get that

\[b \geq \Omega \left(\frac{\sqrt{n}}{e^{\Theta(\sqrt{\log n})}}\right)\]

Exercises:

Prove Lemma 1, Lemma 2 and Lemma 3.

Hints:

Intuitively, Lemma 1 means that to reveal the most of matching that is between private nodes, any transcript needs to know which edge survives the random sampling. The size of matching is \(r\), so basically, it needs to \(\Theta(r)\) edges for each \(G_i\). Then, the size is \(\Theta(kr)\).

Summary and Extension

Actually, the instances of many matching problems are disjoint edges. In this post, if we remove all public nodes, they are disjoint edges.

Reference

[1]. Assadi, S., Kol, G. and Oshman, R., 2020, July. Lower bounds for distributed sketching of maximal matchings and maximal independent sets. In Proceedings of the 39th Symposium on Principles of Distributed Computing (pp. 79-88).

[2]. Ruzsa, I. and Szemerédi, E., 1978. Triple systems with no six points carrying three triangles. Combinatorics (Proceedings of Fifth Hungarian Colloquium, Keszthely, 1976), Vol. II, 18: 939–945, Mathematical Society Colloquium János Bolyai, 18.