Toolbox

C1: Distributed Balls-and-Bins

Reminder: This post contains 4098 words · 12 min read · by Xianbin

Balls and bins models are quite useful in a lot of areas. In this post, let us see distributed balls and bins model. This post is based on [1]. (By the way, I really admire one of the authors, i.e., Lenzen, for his work on distributed algorithms).

Model

The system consists of n\textbf{n} bins and n\textbf{n} balls, and we assume this system is fault-free. In each round, the behaviors of balls and bins are as follows.

You can think of the communication graph is a clique!

Besides, all balls and bins have access to unlimited source of unbiased random bits.

Problem 1

Parallel Balls-into-Bins. We put each ball into a bin. The goals are to minimize the total number of rounds until the task is finished, the maximum number of balls in a bin, and the total number of messages.

Algorithm 1

The above algorithm is quite simple. Also, we should notice that the cost is low.

Theorem 2\textbf{Theorem 2}. Algorithm 1 can solve problem 1 within logn\log ^* n rounds with message complexity O(n)O(n).

The Lower Bound

Theorem 1\textbf{Theorem 1}. Algorithm 1 is tight.

That is to say, using O(n)O(n) messages and the maximum load of each bin is LL w.h.p, the lower bound of time complexity for problem 1 is at least lognlogL\log ^*n-\log^* L.

Actually, the above statement is not precise. The scope of algorithms is only on symmetric algorithms in which balls and bins identify each other by uniform, independent and random port numbering.


Thinking before looking at the paper

First, let us try to solve this problem.

Since we only consider symmetric algorithm, then the behavior of the algorithms should be symmetric, i.e., there are a lot of indistinguishable copies of subgraphs.

Also, we know that the maximum load is L .

The question is:

How to use the above fact to prove a lower bound?

The subgraph should like some trees. If the time complexity tt is large, it is hard to control the size of maximum load. In round tt, the depth of the tree that we construct to mimic the progress algorithms is at most ctc^t where cc is some parameter. The number of equivalent trees can be n/dt+1n/d^{t+1} where dd is some parameter. The worst case is that the load is n/dt+1n/d^{t+1}.

Since n/dt+1Ln/d^{t+1}\leq L, we obtain that tlogdnLt \geq \log_{d} \frac{n}{L}.

The conclusion is still not as the same as Theorem 1.

I guess to make the target lower bound, we need to use similar construction in the Algorithm 1. Let X(i)X(i) be the size of a subgraph in the ii round. We should obtain something like X(i+1)=cX(i)X(i+1) = c^{X(i)}. Then, we can show the lower bound.

That is to say, the constructed tree in the (i+1)(i+1)-th round is obtained by a tree in ii-th round contacting other cX(i)c^{X(i)} trees.

There are too many things to prove and define. Now, let us check the answer.


The answer in the paper

It is so hard to understand this paper because of the writing. I believe the key idea is simple.

Definition(Symmetric balls-into-bins)\textbf{Definition}(\text{Symmetric balls-into-bins)}.

An instance of the balls-into-bins problem is symmetric, if balls address bins by port numbering uniformly, independently and randomly. We name algorithms that can be implemented under this assumption to be symmetric.

Key Observation\textbf{Key Observation}. At any point of the execution of a symmetric algorithm, if a ball contacts a bin that it has not contacted yet, the contacted bin is drawn uniformly at random from all bins it has not contacted yet.

This holds when conditioning on arbitrary other events, if these do not constrain the port numbers of bins the ball has not contacted so far.

Definition(Balls-into-bins Graph)\textbf{Definition}(\text{Balls-into-bins Graph}). The bipartite balls-into-bins graph GA(t)G_A(t) associated with an execution of the symmetric algorithm AA that has run for tt rounds as the following.

To be continued ...

Reference

[1]. Lenzen, Christoph, and Roger Wattenhofer. “Tight bounds for parallel randomized load balancing.” Proceedings of the forty-third annual ACM symposium on Theory of computing. 2011.