Tag Archives: mesh network

“ExOr: Opportunistic Multi-Hop Routing for Wireless Networks”

Wired networks are traditionally viewed as a unicast medium: packets are sent from A to B. Because the underlying medium can have a degree of sharing (e.g. multiple hosts connected to the same Ethernet hub), steps must be taken to avoid interference between concurrent senders (e.g. CSMA).

In a wireless network, the medium is shared: any hosts in range of a radio transmission can receive it. If the goal is to use a traditional unicast routing protocol over a wireless link, this makes interference between senders a more challenging problem (e.g. as discussed in the MACAW paper). However, rather than viewing broadcast as an inconvenience we need to work around, the broadcast nature of wireless can also be leveraged to design new protocols that would be infeasible for a wired network.

A nice example of a broadcast-oriented protocol for wireless is “ExOr: Opportunistic Multi-Hop Routing for Wireless Networks“. In a traditional routing design, a sender chooses the next hop that should receive a packet, and then unicasts the packet to that destination. In ExOr, a sender broadcasts a batch of packets to a group of nodes simultaneously. The set of batch recipients coordinate to try to ensure that each packet in the batch is forwarded onward. The recipient of a packet is only chosen after the packet is sent, which allows ExOr to “opportunistically” take advantage of links that have high loss rates. ExOr assumes that node reception probabilities are mostly independent and mostly decrease with distance — both of which are probably pretty reasonable assumptions.


The source collects a batch of packets destined for the same host, and then chooses a forwarding list for the batch. The forwarding list is a list of nodes, sorted by the expected cost of delivering packets from that node to the eventual destination node. The cost metric is similar to ETX (expected number of transmissions required to send the packet to the destination via unicast, including retransmissions); unlike ETX, it only considers the forward delivery probability. While it would be possible to including all possible recipient nodes in the forwarding list, this would increase the coordination cost among the forwarders, so ExOr only includes “likely” recipients in the list (estimated 10%+ chance of receiving a broadcast packet).

Each packet contains a batch map, which holds the sender’s estimate of the highest priority (according to the cost metric) node to have received each packet in the batch. When a node receives a packet, it uses the packet’s batch map to update its local batch map. This means that batch map information propagates through the nodes, carrying information about packet reception from high priority nodes to lower priority nodes.

After the source broadcasts a batch, each member of the forwarding list broadcasts, ordered by descending priority (ETX value). Each node broadcasts the packets it received, along with its updated batch map. Nodes coordinate to schedule their transmissions to try to avoid interference (e.g. by estimating when each node’s expected transmission time is, according to ETX value and batch map contents). The protocol continues cycling through the nodes in priority order. At the end of each cycle, the ultimate destination broadcasts its batch map 10 times; at the beginning of each cycle, the source resends packets that weren’t received by any node (by observing batch map contents). The ExOr scheme stops when 90% of the packets in a batch have been transferred, and uses a traditional routing policy to deliver the remainder of the batch.


Overall, ExOr is a really neat idea. That said, it is at best a special-purpose solution, because of the high latency it incurs for batching and multiple transmission rounds. Similarly, the need for a split TCP proxy is pretty ugly. A complete solution to wireless routing would perhaps adaptively switch between latency-oriented and bandwidth-oriented routing techniques, depending on the nature of the traffic.

It seems arbitrary to me that ExOr works on a batch-by-batch basis—that almost seems like establishing a new TCP connection for each window’s worth of data. The amount of useful work done in each transmission round decreases, until the protocol imposes an arbitrary cutoff at 90% and switches to a traditional routing protocol. Instead, wouldn’t it be more sensible for ExOr to allow new packets to be inserted into the batch as the destination confirms the delivery of packets? This would essential emulate the “conservation of packets” principle from the Van Jacobson paper.


Filed under Paper Summaries

“Interactive Wi-Fi Connectivity for Moving Vehicles”

Interactive Wi-Fi Connectivity for Moving Vehicles” proposes ViFi, a system for achieving connectivity via WiFi from moving vehicles.

A moving vehicle is in range of different WiFi base stations over time. Due to host mobility, the best base station to use will change, so the client must do “handoff” to move between the base stations as appropriate. The authors describe traditional handoff policies as “hard handoff,” because a client is only ever connected to one base station at a time. The authors present experimental and analytical results to suggest that any handoff policy based on connecting to a single base station will be outperformed by policy which opportunistically utilizes multiple base stations that are in-range at once. Utilizing multiple base stations at once is effective for two reasons:

  1. A client is often in range of multiple base stations, at least in the urban environments they study.
  2. Packet loss is roughly independent of senders and receivers. In the upstream direction, utilizing multiple BSs is effective because losses are roughly independent of base stations: a packet that is not received by one BS is likely to be received by at least one another BS. In the downstream direction, BS diversity is effective because downstream losses tend to be “bursty” and path-dependent, but receiver independent: while one BS will often fail to deliver a burst of packets, other BSs are likely to be able to send packets to the receiver.


Therefore, the authors propose a scheme to allow WiFi clients to opportunistically utilize multiple in-range base stations. They assume that base stations have some ability to communicate, but do not have a high-bandwidth link. Hence, they focus on a scheme that allows the base stations to coordinate to enable base station diversity and avoids duplicate or missed packet deliveries, but requires minimal additional load on the inter-base station link and doesn’t hurt user-perceived latency. Their basic approach is to combine opportunistic reception (clients broadcast to all BSs in range) with probabilistic relaying (base stations compute relay probabilities independently and probabilistically).

In ViFi, each vehicle designates one nearby base station as the anchor, and the rest as auxiliaries. The anchor and auxiliary list is periodically broadcast by the vehicle via a beacon, which allows base stations to learn the identify and addresses of peer BSs.

Packet Transmission

The basic ViFi transmission algorithm is simple. To send a packet from the vehicle upstream, the receiver broadcasts to all in-range receivers. If the anchor receives the packet, it responds with an ACK broadcast. Any auxiliaries that overhear the packet broadcast wait for a short window; if they don’t see an ACK, they probabilistically relay the packet to the anchor via the inter-BS backplane. When the anchor receives relayed packets that it hasn’t already ACK’ed, it ACKs them. Finally, the receiver retransmits packets if it doesn’t receive an ACK within a certain period of time.

The transmission algorithm for downstream packets (base station => vehicle) works similarly (and almost symmetrically). The anchor broadcasts the packet to the vehicle. If the vehicle hears it, it broadcasts an ACK. If any auxiliaries overhear the broadcast but don’t see the ACK after a short window of time, they probabilistically broadcast the packet themselves. The anchor retransmits the packet if it doesn’t hear an ACK during the retransmission interval.

The magic is in the probabilistic retransmission technique, which must balance between retransmitting too often (wasting bandwidth) and not often enough (risking packet drops). The intuition behind their probabilistic retransmission technique is that they want the expected number of packet relays amongst all auxiliaries to be 1; amongst auxiliaries, they prefer transmission by auxiliaries that are better connected to the destination (i.e. either the anchor BS or the vehicle). Each BS computes packet reception probabilities between itself and the other BSs and the vehicle, using a periodic beacon.

Other Details

802.11 link-layer acknowledgments are sent immediately after a packet is delivered to the recipient, which allows the sender to predict the expected arrival time of the ACK. In ViFi, ACKs may be delayed due to relaying, which depends on current network conditions (e.g. behavior of the inter-BS backplane). Hence, ViFi uses its own acknowledgment scheme, with the retransmission interval set based on recent history (99th percentile measured delay for ACKs).

ViFi also implements salvage. When a vehicle moves out of range before the anchor BS can deliver packets for it, those packets are “stranded” on the old anchor BS. When the vehicle obtains a new anchor BS, the beacon it sends out also includes the previous anchor BS. The new anchor contacts the old anchor, and asks for any unacknowledged packets for the vehicle to be relayed.

Leave a comment

Filed under Paper Summaries

“Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications”

This paper describes the design of Chord, one of the earlier and most popular academic DHT systems (according to one measure, the Chord paper is the most-cited CS paper from 2001, along with a raft of other DHT papers). Chord’s design focuses on simplicity and offering provable performance and correctness properties. Chord provides a single service: given a key, it returns the address of the node storing the key. Other practical issues, like replication, caching, naming, and cryptographic authentication should be provided by an application level that runs on top of Chord.

Basic Protocol

In Chord, nodes and keys are arranged in a circle (the “identifier circle“). Each node maintains the address of the successor of that node; maintaining the correctness of these successor pointers is the critical correctness property of the protocol. Each key k is owned by the first node whose identifier (e.g. hash of IP address) follows k in the identifier circle. Therefore, keys can be located by simple sequential search; the search terminates when it reaches the node that ought to own the key.

To improve search performance, Chord also maintains a finger table. The finger table is essentially a skip list: if there are N nodes in the DHT, each finger table has O(log N) entries, holding the address of a node 1/2, 1/4, 1/8, … of the way “forward” from the current node in the identifier circle. This essentially allows the search procedure to perform a binary search of the identifier circle, to quickly approach the “neighorhood” of the target key. Note that this means that each node maintains more information about its local neighborhood than about distant nodes, which makes sense.

Joining A Chord Group

To join a Chord group, a joining node begins by computing its predecessor and finger tables. It then joins the group by notifying its successor node, which includes the new node as its predecessor. Each node runs a “stabilization” protocol to maintain the correctness of the successor pointers: each node n periodically asks whether the predecessor of its successor node is n; if not, a new node has joined the group, so n sets its successor to the new intermediate node, and notifies that node to make n the node’s predecessor. Once the successor pointers are updated, the new node has joined the Chord group, and can be accessed by subsequent lookup queries. In the background, keys that are now owned by n will be transferred from n‘s successor, and finger tables will be updated lazily: updating fingers promptly is not actually important for good performance, because fingers are used only to quickly reach the right “neighborhood.”

Handling Node Failure

If a node fails, the data on the node may be lost. This is not Chord’s concern, because it delegates responsibility for replication to the application. Instead, Chord must ensure that the Chord ring is not “broken”: lookup queries can continue, despite the fact that the predecessor of the failed node has an invalid successor node.

Chord achieves that by having each node maintain a list of successors, rather than a single one. If a node notices that its successor has failed, it simply tries the next node in the list. The paper argues that since the failure probabilities of elements of the successor list can be considered to be independent, using a sufficiently large successor list provides arbitrary protection against node failures. The stabilization protocol above is extended to apply to the entire list of successors, not just the first one.


Overall, this is a great paper, and it deserves its reputation as such. It also shows the benefit of combining good systems work with a top-flight theoretician (Karger).

Despite the paper’s formal tone, I was confused by their talk of “correctness”: they never actually define what their correctness criteria are. Their notion of “correctness” is apparently that the Chord ring remains live and can continue answering queries, even if the query answers are incorrect or undetected ring partitions occur. For example, they observe that when a join occurs and before stabilization finishes,

the nodes in the affected region [may] have incorrect successor pointers, or keys may not yet have migrated to newly joined nodes, and the lookup may fail. The higher-layer software using Chord will notice that the desired data was not found, and has the option of retrying the lookup after a pause.

That is quite disconcerting, and would be an impediment to using Chord-like systems as a backend for a reliable storage service. Importantly, many applications would like to distinguish between “lookup failed because of stabilization problems” and “lookup failed because the key was definitely not in the Chord ring.” As presented, the protocol does not appear to allow this distinction.

Another troubling aspect of the paper is the author’s attitude toward network partitions, which they regard as a “pathological case.” I thought this was surprising, because (short-lived) network partitions are not that rare. In fact, one of the explicit benefits of loose-consistency systems like DHTs is their ability to tolerate network partitions gracefully. They sketch a few ideas on how to detect and heal partitions, but don’t address this problem in any real depth.

The paper doesn’t talk about an obvious optimization: making “distance” in the Chord ring correlate with distance within the underlying physical network, to reduce communication costs. This is explored in depth by subsequent work.

1 Comment

Filed under Paper Summaries

“A Path Metric for Multi-Hop Wireless Routing”

A High-Throughput Path Metric for Multi-Hop Wireless Routing” presents the expected transmission count (ETX) metric for finding high-throughput paths on multi-hop wireless networks. By considering link loss rates, asymmetrical loss ratios between the directions of a link, and interference, the paper argues that ETX substantially outperforms routing protocols based on finding the shortest path (fewest hops) in an ad-hoc wireless network.


There are several shortcomings with using minimum hop count for routing in ad-hoc wireless networks:

  • Shortest-path routing protocols essentially assume that either a link works (probe packets are delivered successfully), or it doesn’t. While wireless networks can be roughly classified in this manner, wireless links have a much richer space of performance and loss characteristics than “it works” or “it doesn’t”.
  • Minimizing the hop count typically means maximizing the physical distance traveled by each hop, which leads to selecting links with weak signals and high loss rates. As the related RoofNet paper discusses, it is often preferable to utilize a path of short, high-quality links over a single long-distance, low-quality link.
  • In a dense ad-hoc network, there may be many different routes with the same number of hops. Deciding between routes on the basis of hop count is therefore not sufficient in any case.

There are problems with two naive alternatives to min-hop-count:

  • An improved routing metric could simply choose the path by maximizing the product of the per-link delivery ratios. This is inadequate, however, because it would result in preferring a two-hop route with perfect delivery to a one-hop route with a low loss rate — the latter route is actually better in many cases.
  • Choosing links on the basis of end-to-end delay is problematic, because of oscillation: once a route is utilized for traffic, its delay will often increase. This will result in the routing metric choosing another path, whose latency will then increase, causing it to switch back to the first path, and so on.

ETX Metric Design

The ETX metric for a link is the expected number of data transmissions required to send a packet of a certain size over the link. The ETX metric for a path is the sum of the per-link ETX metrics. ETX calculates per-link delivery ratios for both directions of a link (transmit and receive), to take into account the asymmetrical nature of wireless communication. Delivery ratios are calculated by periodic probes of a fixed size (sent once per second by default via broadcast); each node calculates per-link delivery ratios based on the most recent 10 seconds of activity.

ETX assumes that radios have a fixed transmit speed, and that congestion is not a factor. Neither assumption is very realistic: variable speed radios are now pervasive, and congestion might well be a factor in ad-hoc networks with sizable user bases. ETX also assumes that delivery rates are the same regardless of packet size, which is probably inaccurate (e.g. ACKs are small and take less time to transfer).


I was surprised that the paper spent so much time comparing ETX with min-hop-count metrics: it seems quite straightforward that min-hop-count would fair pretty poorly in an ad-hoc wireless network. In that sense, performance comparisons between ETX and min-hop-count are also not that informative: of course ETX will beat min-hop-count, because the latter is simply a terrible choice for this environment. It seems surprising that min-hop-count is truly the best example of prior work they could measure against — why not compare with the naive “maximize product of per-link delivery ratio” scheme, for example?

I didn’t get that much out of this paper: the RoofNet paper already summarizes an improved version of the ETX metric. Because the ETX evaluation compares it with such a pessimal alternative (min-hop-count), I didn’t get much out of the evaluation in this paper.


Filed under Paper Summaries

“Architecture and Evaluation of an Unplanned Mesh Network”

This paper discusses the design of the Roofnet, a 37-node “mesh network” that covers four square kilometers in Cambridge, MA. The authors are trying to get “the best of both worlds”: the performance of a carefully-planned wireless deployment, with the lack of planning and easy deployability of a collection of independent “hot spots.”


A small fraction of Roofnet nodes are connected to the Internet; the remainder of the Roofnet nodes access the Internet by finding a route through the Roofnet that terminates at a gateway node (4 of the 37 Roofnet nodes are gateways). Each Roofnet is assigned a non-routable IP address, based on the node’s MAC. Gateway nodes use NAT to route traffic into and out of Roofnet. Similarly, each Roofnet node runs a local “hot spot,” and assigns local (192.168.1.x) addresses to clients via DHCP. NAT is used to rewrite these addresses into the Roofnet node’s address, before they are routed through Roofnet. Using two levels of NAT means that no manual configuration of the address space is required: both Roofnet ndoe IPs and node-local client IPs can be automatically assigned, without risk of conflict.

Each node keeps track of the performance of all the gateway nodes it is connected to, and chooses the best route for each new TCP session. Roofnet uses dynamic source routing (DSR): each node maintains a database of link metrics for each pair of nodes in the network. It chooses the best route for each new connection by applying Dijkstra’s algorithm, and “source routes” (the entire route for a flow is assigned by the originating end point, not intermediate routers). The link metrics are periodically updated: link metrics for the source route are included by each node that forwards a packet; periodic flooding of link metrics is done; and nodes “snoop” and overhear link metrics sent by other traffic they can listen to. Flooding is only necessary when no route to the destination host can be found—the authors argue that this will rarely occur, since most routes terminate in a gateway.

To choose among routes, Roofnet uses an “estimated transmission time” (ETT) metric: it combines the link’s highest-throughput bit rate with the probability that a packet sent over the link will be successfully delivered. The authors found that this policy often results in using relatively high-bandwidth routes with relatively high link-level loss rates. This situation was unanticipated by the firmware designers of their wireless NIC, so they had to use a custom policy they call “SampleRate” for choosing which 802.11b transfer bit rate to use (high-bandwidth, high-loss rates are often preferable). SampleRate varies the 802.11b transfer rate dynamically, measuring the current throughput vs. delivery probability for the different rate options. SampleRate is similar in principle to the ETT routing metric they use, but differs because:

  1. SampleRate selects only the link-local transfer rate, not the route.
  2. SampleRate is based on measurements of actual packet transmissions rather than broadcasting or snooping, and hence can respond to changes in network conditions much more rapidly than ETT can.


The authors found that the average throughput between a node and its closest gateway is 1395 kbit/sec, with a latency of 22 msec: this is quite impressive, and matches typical budget consumer-grade DSL performance at the time (2005). Roofnet routes most traffic over short, high-bandwidth hops, ignoring most of the long-distance links available that perform poorly.

The authors found that a density of about 5 nodes per square kilometer is necessary (in this environment) for all-pairs connectivity. Increasing node density leads to higher throughput, because there are more alternate routes to choose from.

Roofnote is a true “mesh” network, in the sense that each node routes at least some packets through most of its neighbors (rather than preferring a single neighbor for most traffic, which would suggest that a directional, planned design would be better). On the other hand, eliminating the two best-connected nodes decreases average throughput by 43%, which suggests that, despite the mesh architecture, good network performance depends on having a few “supernodes”.


The authors are careful to brag more prominently about the mean all-pairs average throughput (627 kbits/sec) than about the median (400 kbits/sec). This is a little disingenuous, given the presence of a small number of high-bandwidth links that skew the distribution.

Using NAT is understandable, but doesn’t work well with some networking applications (e.g. BitTorrent, running a server inside Roofnet). It would be interesting to extend the Roofnet design to allow incoming connections from the Internet to an arbitrary Roofnet node, as well as allowing a TCP connection to remain connected as its route is changed on-the-fly.

It would be interesting to compare the connectivity of mesh networks with peer-to-peer networks: many of the concepts (auto-configuration, peer-to-peer topology, the importance of a small number of well-connected “supernodes”) seem to apply to both kinds of networks.

1 Comment

Filed under Paper Summaries