Multicast is clearly an efficient technique for applications with one-to-many communication patterns, but the deployment of in-network multicast has been slow. Therefore, there have been a number of proposals for implementing multicast at the application level, as an overlay over the physical network. This is not as efficient as true in-network multicast (because the same message may be sent over the same link multiple times), but is much more flexible and easier to deploy.
“Scalable Application Layer Multicast” is one such proposal for application-layer multicast via an overlay network; their proposed protocol is called NICE. Their focus is on applications that require low-latency delivery of relatively low-bandwidth data streams to a large set of recipients, although they argue that their techniques are also applicable to high-volume streams with some minor changes.
In NICE, nodes are arranged into layers; a single node can appear in more than one layer. Every node belongs to the lowest layer, L0. The nodes in a layer are arranged (automatically) into clusters of nodes that are “near” to one another (in terms of network distance/latency). Each cluster has a leader, which is the node in the “center” of the cluster (NICE tries to make the leader the node that has the smallest maximal latency to the other nodes in the cluster). Layer L1 consists of all the cluster leaders from L0; the clustering algorithm is applied to L1 in turn, yielding another set of cluster leaders which form L2, and so forth. The height of the tree is determined by the number of nodes and a constant k: each cluster contains between k and 3k-1 nodes.
To multicast a data message, a node forwards the message to every cluster peer in every layer in which the node belongs, except that a node never forwards a message back to the message’s previous hop.
To join the multicast group, a node begins by contacting a designated node called the Rendezvous Point (RP). The RP is typically the root of the NICE tree. The joining node walks down the tree from the root, choosing the child node that is closest to it (lowest latency).
Cluster leaders periodically check whether the cluster size constraint (k <= size <= 3k-1) has been violated; if so, they initiate a cluster merge or split, as appropriate. Splitting a cluster into two clusters is done by trying to minimize the maximum of the radii of the resulting clusters.
All the nodes in a cluster periodically sends heartbeats to each of its cluster peers. This is used to detect node failures, and to update pair-wise latency information for nodes. If a cluster leader fails or deliberately leaves the NICE group, a new leader is chosen by the same heuristic (minimize maximum latency from new center to any cluster peer). A new leader may also be chosen if the pair-wise latencies in the cluster drift sufficiently far to make selecting a new leader justified. Also, each member of every layer i periodically probes its latency to the nodes in layer i+1 — if the node is closer to another i+1 layer node, it moves to the corresponding cluster in layer i.
I didn’t see that the authors provided any grounds for choosing an appropriate k value (essentially the tree fan-in), which seems like it would be an important parameter.