Toolbox

C1: Parallel Balls-into-Bins

This post contains 4221 words · 13 min read · by Xianbin

This post is based on [1]. I have to say, it is a pleasure to read these classic papers: everything is clear. Why don't modern researchers write in the same style?

The Random Graph Model

First, we rephrase the balls-into-bins problem as a random graph orientation problem. The key idea is that a lower bound for the balls-into-bins problem is equivalent to showing that a specific subgraph appears in a random graph with high probability.

The transformation

The basic process

We associate each bin with a vertex of a graph on $n$ vertices. We use an edge $(u,v)$ to represent a ball, where the vertices $u,v$ represent the two bins chosen by that ball. With each edge, we associate an orientation; orienting $(u,v)$ toward $u$ means the ball is placed in bin $u$. The goal is to minimize the maximum in-degree over all vertices of the graph. There can be multiple edges, which corresponds to two balls selecting the same pair of bins. In the case of $n$ balls and $n$ bins, the corresponding graph is chosen uniformly from the set of all multigraphs with $n$ vertices and $n$ edges, which we call the random graph $G_{n,n}$.
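The model is easy to simulate. The sketch below (hypothetical helper names, not from [1]) samples $G_{n,n}$ and compares a blind orientation against a greedy orientation that sends each ball to its currently lighter bin; the maximum in-degree is exactly the maximum bin load:

```python
import random
from collections import Counter

def sample_g_nn(n, seed=0):
    """Sample a random multigraph with n vertices (bins) and n edges (balls):
    each ball independently picks two bins uniformly at random."""
    rng = random.Random(seed)
    return [(rng.randrange(n), rng.randrange(n)) for _ in range(n)]

def max_in_degree(edges, orient):
    """Orient each edge with the given rule; return the maximum in-degree,
    i.e. the maximum bin load."""
    load = Counter()
    for u, v in edges:
        load[orient(u, v, load)] += 1
    return max(load.values())

n = 10_000
edges = sample_g_nn(n)
# One-choice baseline: always orient toward the first endpoint.
one_choice = max_in_degree(edges, lambda u, v, load: u)
# Greedy (two-choice) orientation: send the ball to the currently lighter bin.
two_choice = max_in_degree(edges, lambda u, v, load: u if load[u] <= load[v] else v)
print(one_choice, two_choice)
```

The gap between the two numbers illustrates why the sequential two-choice process is powerful, and why the parallel lower bound below is interesting: with limited communication rounds, an algorithm cannot fully exploit the "lighter bin" information.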

Communication in the graph

In each round, every ball and bin learns about a larger portion of the graph around it. Intuitively, after each round, a ball knows a larger neighborhood in the graph.

Techniques in the proof of lower bounds: Because we are proving a lower bound, we may assume that the messages sent by the bins contain all available information at any time. For example, consider an $r$-round protocol for the balls-into-bins problem, where balls make their final choice in the $r$-th round. In this case, we can assume that a ball knows everything about the balls in its $(r-1)$-neighborhood, and no more, before it must commit to a bin in the $r$-th round.

Assumption 1

Since a ball is an edge, we can construct a subgraph that splits into two equivalent parts, each of depth $\ell$. For simplicity, call them the left part and the right part. Since the two parts are equivalent, the ball (edge) between them cannot distinguish its two endpoints, so it can only select each bin with probability 1/2.

A Simple Case

It is enough to show the idea for $r=d=2$ (each ball sends/receives $d$ messages and the number of rounds is $r$). That is to say, we consider algorithms that execute at most two rounds, where each ball uses at most two messages.

\textbf{Theorem 1}. Any nonadaptive, symmetric load distribution algorithm for the balls-into-bins problem with $n$ balls, under Assumption 1 and with $d = r = 2$, has load at least $\Omega(\sqrt{\log n/\log \log n})$ with at least constant probability.

\textbf{Proof of Theorem 1}. To prove Theorem 1, we first show that there exists a symmetric tree with branching degree $\Theta(\sqrt{\log n/\log \log n})$.

This type of symmetric tree is used to prove the lower bound, so it is not an ordinary $X$-ary tree. Recall that we use an edge to represent a ball, and a ball is confused if its edge connects two equivalent parts (see the description of Assumption 1). Now, let us think a little further: if many confused balls (edges) are incident on one bin (vertex), then with constant probability more than half of them will select that bin, which overloads it.
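The overload claim is just a fact about the binomial distribution: if $D$ confused balls each pick the shared bin independently with probability 1/2, the bin's load is $\mathrm{Bin}(D, 1/2)$, which is at least $D/2$ with constant probability. A quick empirical check (hypothetical simulation, not from the paper):

```python
import random

def shared_bin_load(D, rng):
    """Each of D confused balls picks the shared bin with probability 1/2."""
    return sum(rng.random() < 0.5 for _ in range(D))

rng = random.Random(1)
D, trials = 20, 10_000
hits = sum(shared_bin_load(D, rng) >= D / 2 for _ in range(trials))
print(hits / trials)  # roughly P[Bin(20, 1/2) >= 10] ≈ 0.59
```

So a single vertex of high degree in the confused subgraph forces load $\Omega(D)$ with probability bounded away from zero.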

Eureka!

When we consider an algorithm using $r$ rounds, we know the maximum depth of the tree constructed. For simplicity, we assume that all internal nodes have the same degree $D$.

To make this precise, we define the $(D, r)$-tree as follows.

\textbf{Definition}. A $(D,r)$-tree is a rooted, balanced tree of depth $r$, such that the root has $D$ children and each other internal node has $D-1$ children. We say that a $(D,r)$-tree is isolated in a graph $G$ if it lies in a connected component of $G$ containing no edge of multiplicity more than one.
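A quick sanity check on the node count (a sketch, assuming the reading of the definition in which the root has $D$ children and every other internal node has $D-1$ children, which matches the $D^2+1$ count used in the proof of Lemma 1):

```python
def dr_tree_size(D, r):
    """Number of vertices in a (D, r)-tree: the root has D children and
    every other internal node has D - 1 children."""
    total, level = 1, D          # the root, then D nodes at depth 1
    for depth in range(1, r + 1):
        total += level
        level *= D - 1           # each node at this depth has D - 1 children
    return total

# For r = 2: 1 + D + D*(D-1) = D^2 + 1 vertices.
print(dr_tree_size(5, 2))  # 26 = 5^2 + 1
```

For $r=2$ the tree has $D^2+1$ vertices, which is why the proof below works with vectors of $D^2+1$ nodes.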

By basic probability theory, we have

\textbf{Lemma 1}. With at least constant probability, a random graph $G_{n,n}$ contains an isolated $(D,r)$-tree with $r=2$ and $D = \Theta(\sqrt{\log n/\log \log n})$.

\textbf{Proof}. Let $\textbf{v} = (v_0,v_1,\ldots,v_{D^2})$ be a vector of $D^2+1$ nodes. Let $X_{\textbf{v}}$ be the indicator random variable that is 1 if $v_0$ is the root of an isolated $(D,2)$-tree, $v_1,\ldots,v_D$ are the nodes at depth 1, $v_{D+1},\ldots, v_{2D-1}$ are the children of $v_1$, and the remaining nodes follow the same pattern.

Let $X = \sum_{\textbf{v}} X_{\textbf{v}}$. By the second moment inequality, we have

$$P[X=0] \leq 1-\frac{E[X]^2}{E[X^2]}$$
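For completeness, this form of the second moment inequality follows from the Cauchy–Schwarz inequality applied to $X \cdot \mathbf{1}_{X>0}$:

$$E[X] = E[X \cdot \mathbf{1}_{X>0}] \leq \sqrt{E[X^2]\,P[X>0]} \quad\Longrightarrow\quad P[X>0] \geq \frac{E[X]^2}{E[X^2]}.$$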
Some words on $E[X]$:

We first select the $D^2$ nodes making up the tree; among these, there are $\binom{n}{1,T,T-1,\ldots,T-1}$ possibilities. Then we can obtain the result.

After some computation, we get

$$E[X^2] = E[X] + E[X]^2(1+o(1))$$

and

$$E[X] = \frac{n\,2^{D^2}}{e^{2D^2+2}((D-1)!)^{D+1}D}(1+o(1))$$
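Combining the two equalities with the second moment bound makes the final step explicit (a sketch of the computation the post omits):

$$P[X=0] \leq 1 - \frac{E[X]^2}{E[X] + E[X]^2(1+o(1))} = 1 - \frac{1}{1/E[X] + 1 + o(1)}.$$

For $D = \Theta(\sqrt{\log n/\log \log n})$ with a small enough constant factor, the expression for $E[X]$ grows with $n$, so $1/E[X] \to 0$ and $P[X=0]$ is bounded away from 1; that is, an isolated $(D,2)$-tree exists with at least constant probability.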

Setting $P[X=0]$ to be at most some constant and using the above two equalities, we prove Lemma 1. Theorem 1 then follows from Lemma 1.

Reference

  1. Adler, Micah, et al. “Parallel randomized load balancing.” Proceedings of the twenty-seventh annual ACM symposium on Theory of computing. 1995.