C1: Distributed Computation in Node-Capacitated Networks

June 27, 2024

Reminder: This post contains 3515 words · 11 min read · by Xianbin

Definitions
Basic Operations
Minimum Spanning Tree
- Quick Review of Bruvka’s MST Algorithm
- The Distributed Algorithm
The Connection to the $k$ -machine model
Reference

This post is based on [1].

Definitions

$\textbf{NCC model}$ . Let $G=(V, E)$ be an undirected graph. For the set of $V$ of $n$ nodes, each node has a unique ID of $O(\log n)$ bits and knows $n$ and all IDs. Without loss of generality, the IDs are form the $\{0,1,n-1\}$ . The communication is synchronous and in each round the cost of local computation is ignored. Each node can send distinct messages of $O(\log n)$ to $O(\log n)$ other nodes (The communication graph is a clique, i.e., there is a link between any pair of nodes). If more messages are sent to a node, it can pick up any subset of $O(\log n)$ messages. Additional messages are dropped. Initially, each node locally knows the IDs of its neighbors (also weights of incident edges in weighted graphs), but has no further knowledge of the graph.

Basic Operations

1. Aggregation Problem.

Given a distributive aggregate function $f$ and a set of aggregation groups $\mathcal{A} = \{A_1,\ldots,A_N\}$ where $A_i$ is a subset of $V$ , with targets $t_1,\ldots,t_N \in V$ where each node has exactly input value $s^i_u$ for each $u\in A_i$ . The goal is to aggregate the input values so that eventually target nodes $t_i$ knows $f(s^i_u \mid u\in A_i)$ for all $i$ .

$\textbf{Theorem 1}$ . There exists an Aggregation Algorithm that solves any Aggregation Problem in time $(L/n + (\ell_1 + \ell_2)/n + \log n)$ where $L = \sum_{i=1}^N \lvert A_i \rvert$ and $\ell_1$ the congestion size of $A_i$ and $\ell_2$ is the congestion size of $t_i$ .

I will not disclose the details of the algorithm. The main idea is using broadcast. The intuitive explanation is as follows. Each node will be used for each round. As the bandwidth is almost linear, $n$ nodes can help solve $n$ aggregation, which leads to the first component $L/n$ . The $\ell_1, \ell_2$ is arisen from local congestions.

2. Multicast Tree Setup Problem

Given a set of multicast groups $\mathcal{A} = \{A_1,\ldots,A_N\}$ , $A_i \subseteq V$ , with sources $\textcolor{red}{\{s_1,\ldots, S_N\in V\}}$ . such that each node is a source of at most one multicast group. The goal is to set up a multicast tree $T_i$ in the butterfly (a graph showing the process of broadcast, see [1]) for each $i\in[N]$ with root $h(i)$ ( which is a node uniformly and independently selected among the nodes in the bottommost level of the butterfly) and a unique and randomly chosen leaf $l(i,u)$ in the topmost level for each node $u\in A_i$ .

This problem is defined weird in [1] because the sources are not used. The purpose of introducing multicast tree setup problem is for the following problem.

Multicast Problem

It is sort of inverse of the aggregation problem. After we setup a multicast tree, our goal is to let each source $s_i$ send a message to all nodes $u\in A_i$ . Let $C$ be the congestion of the multicast trees (maximum number of trees that share the same butterfly node).

Then, we have

$\textbf{Theorem 2}$ . There is a multicast algorithm that solves any Multicast Problem in $O(C + \ell_1/\log n + \log n)$ w.h.p.

Minimum Spanning Tree

For most of distributed MST algorithm, they are adapted from Bruvka’s algorithm (because it is an inherent distributed algorithm).

Quick Review of Bruvka’s MST Algorithm

First, each node is a component(tree). Each component finds the lightest edge to outside. Repeat until we have a spanning tree.

The Distributed Algorithm

Now, let us see how to implement the above algorithm into NCC model. The major task is to find the lightest edge for each component $C$ . Each node (source) in $C$ computes the minimum value from all neighbors via the mutlicast and aggregation method. Then, we let one node (e.g., the leader node) know this value. It is not hard to see that algorithm will be finished within polylog n rounds.

The Connection to the $k$ -machine model

$\textbf{Lemma}$ . Any NCC algorithm using $T$ rounds, will have at most $\tilde O(nT/k^2)$ rounds in the $k$ -machine model.

Reference

[1] Augustine, John, et al. “Distributed computation in node-capacitated networks.” The 31st ACM Symposium on Parallelism in Algorithms and Architectures. 2019.