“Skilled in the Art of Being Idle”

“Skilled in the Art of Being Idle” looks at how to reduce the energy consumption of network end hosts (primarily desktops and laptops). Modern computers have various “sleep” states that allow reduced power consumption during idle periods. However, putting a computer to sleep has several costs:

  1. Transitioning into and out of a sleep state takes time (the paper cites a recent study which found that typical machines take 3-8 seconds to enter the “S3” sleep state, and 3-5 seconds to resume).
  2. Sleeping hosts cannot respond to network packets; hence, they can lose their network presence (e.g. their DHCP lease can expire and the address be reassigned to another host). They also cannot run periodic tasks (e.g. backups or virus scans).

A naive approach would put idle nodes to sleep, and then awaken them (via established “wake-on-LAN” support) when a packet is delivered to the node. This is insufficient: the authors demonstrate that in both home and office environments, the inter-arrival time of packets destined for idle computers is too small, so the cost of transitioning into and out of the sleep state would negate any significant power savings. To avoid this problem, prior work has proposed using a proxy to handle network traffic intended for a sleeping node. The proxy can ignore the traffic (if appropriate), handle the packet itself (e.g. by responding to an ARP query for the sleeping node’s IP), or awaken the sleeping node and forward the packet to it. The effectiveness of proxying therefore depends on most traffic for an idle node fitting into the first two categories.
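
As a concrete illustration of this three-way decision, here is a minimal sketch; the protocol groupings and function names are illustrative assumptions, not the paper's classification.

```python
# Hypothetical sketch of a power-saving proxy's per-packet decision; the
# protocol groupings below are illustrative, not the paper's classification.

IGNORABLE = {"ssdp-advert", "netbios-broadcast"}   # safe to drop silently
ANSWERABLE = {"arp-request", "icmp-echo"}          # proxy can reply on the host's behalf

def handle_packet(protocol: str, respond, wake_and_forward) -> str:
    """Return the action taken for a packet destined for a sleeping host."""
    if protocol in IGNORABLE:
        return "ignored"            # host stays asleep, no response needed
    if protocol in ANSWERABLE:
        respond()                   # e.g. answer the ARP query with the host's MAC
        return "proxied"
    wake_and_forward()              # fall back to wake-on-LAN, then deliver the packet
    return "woken"

# Example: handle_packet("arp-request", respond=lambda: None, wake_and_forward=lambda: None)
```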

The paper presents an empirical study of 250 machines owned by Intel employees, to assess the need for proxying in practice, the potential benefit of proxying, the network traffic a proxy must handle, and so on.

Summary of Findings

  • Most machines are idle, most of the time — on average, machines were idle 90% of the time.
  • Home and office idle-time network traffic is markedly different, both in the inter-arrival time of packets destined for idle machines (offices see more such traffic) and in the nature of that traffic.
  • A power-saving proxy would need to handle broadcast, multicast, and unicast traffic to achieve significant gains. However, broadcast and multicast traffic represents “low-hanging fruit”: a proxy that handles only broadcast and multicast traffic would recover 80% of the idle time in home environments, and over 50% of the idle time in office environments.
  • For broadcast, address resolution (ARP, NBNS) and service discovery (SSDP for UPnP devices) protocols are the dominant sources of traffic for idle nodes. Both kinds of traffic are easy to proxy.
  • For multicast, routing traffic (HSRP, PIM) is the dominant source of traffic for idle nodes in office environments; in home environments, service discovery (SSDP) dominates.
  • Their results for unicast traffic are less clear-cut. They argue that unicast traffic for idle nodes is dominated by outgoing TCP connections, and that less than 25% of this traffic results from actions a node initiated before becoming idle (and hence might need to be maintained during sleep). They argue that outgoing traffic initiated while the machine is idle can often be batched together, or avoided entirely.


“Scalable Application Layer Multicast”

Multicast is clearly an efficient technique for applications with one-to-many communication patterns, but the deployment of in-network multicast has been slow. Therefore, there have been a number of proposals for implementing multicast at the application level, as an overlay over the physical network. This is not as efficient as true in-network multicast (because the same message may be sent over the same link multiple times), but is much more flexible and easier to deploy.

“Scalable Application Layer Multicast” is one such proposal for application-layer multicast via an overlay network; the proposed protocol is called NICE. The focus is on applications that require low-latency delivery of relatively low-bandwidth data streams to a large set of recipients, although the authors argue that their techniques are also applicable to high-volume streams with some minor changes.

Network Topology

In NICE, nodes are arranged into layers; a single node can appear in more than one layer. Every node belongs to the lowest layer, L0. The nodes in a layer are arranged (automatically) into clusters of nodes that are “near” to one another (in terms of network distance/latency). Each cluster has a leader, which is the node in the “center” of the cluster (NICE tries to make the leader the node that has the smallest maximal latency to the other nodes in the cluster). Layer L1 consists of all the cluster leaders from L0; the clustering algorithm is applied to L1 in turn, yielding another set of cluster leaders which form L2, and so forth. The height of the tree is determined by the number of nodes and a constant k: each cluster contains between k and 3k-1 nodes.

To multicast a data message, a node forwards the message to every cluster peer in every layer in which the node belongs, except that a node never forwards a message back to the message’s previous hop.
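
A minimal sketch of this forwarding rule, assuming each node keeps a map from layer to its cluster peers in that layer (the data structures are illustrative, not NICE's actual implementation):

```python
# Sketch of NICE's data forwarding rule. `clusters_by_layer` maps each layer the
# node belongs to onto the set of its cluster peers in that layer; this
# structure is an illustrative assumption.

def forward(node, message, prev_hop, clusters_by_layer, send):
    for layer, peers in clusters_by_layer.items():
        for peer in peers:
            if peer not in (node, prev_hop):
                send(peer, message)   # never echo the message back to its previous hop
```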

Protocol

To join the multicast group, a node begins by contacting a designated node called the Rendezvous Point (RP). The RP is typically the root of the NICE tree. The joining node walks down the tree from the root, at each layer choosing the child node that is closest to it (lowest latency), until it reaches a cluster in layer L0, which it joins.
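
Here is a rough sketch of that walk, under the assumption that each leader can report the members of the cluster it heads one layer down (the helper names are hypothetical):

```python
# Rough sketch of the NICE join walk: descend from the root, at each step moving
# to the lowest-latency member of the current node's cluster one layer down,
# stopping once no lower-layer cluster remains. Helpers are assumed, not NICE's API.

def walk_to_l0(joining_node, root, children, latency):
    """children(node) -> members of the cluster `node` leads one layer down,
    or an empty list if `node` leads no lower-layer cluster."""
    current = root
    while children(current):
        current = min(children(current), key=lambda c: latency(joining_node, c))
    return current   # an L0 node; the joiner requests membership in its cluster
```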

Cluster leaders periodically check whether the cluster size constraint (k <= size <= 3k-1) has been violated; if so, they initiate a cluster merge or split, as appropriate. Splitting a cluster into two clusters is done by trying to minimize the maximum of the radii of the resulting clusters.
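
A sketch of the leader's periodic size check, with the split and merge operations left abstract:

```python
# Sketch of the leader's periodic cluster-size check (k <= size <= 3k - 1).
# The split and merge operations are left as callbacks; the paper's split
# minimizes the larger of the two resulting cluster radii.

def maintain_cluster(cluster, k, split, merge):
    if len(cluster) > 3 * k - 1:
        split(cluster)    # too big: divide into two clusters
    elif len(cluster) < k:
        merge(cluster)    # too small: merge with a nearby cluster in the same layer
```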

Each node in a cluster periodically sends heartbeats to its cluster peers. These are used to detect node failures, and to update pair-wise latency information. If a cluster leader fails or deliberately leaves the NICE group, a new leader is chosen by the same heuristic (minimize the maximum latency from the new center to any cluster peer). A new leader may also be chosen if the pair-wise latencies in the cluster drift far enough to justify selecting one. Also, each member of every layer i periodically probes its latency to the nodes in layer i+1; if it finds that it is closer to a different layer-(i+1) node, it moves to the corresponding cluster in layer i.
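
A sketch of the leader-selection heuristic, assuming the pairwise-latency lookup comes from the heartbeat measurements:

```python
# Sketch of leader selection: choose the member whose worst-case latency to any
# other member is smallest (the cluster's "center"). `latency(a, b)` is an
# assumed pairwise-latency lookup built from heartbeat measurements.

def choose_leader(members, latency):
    return min(members,
               key=lambda m: max((latency(m, p) for p in members if p != m),
                                 default=0.0))
```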

Discussion

I didn’t see the authors provide any guidance for choosing an appropriate k value (essentially the tree fan-in), which seems like an important parameter.


“ExOr: Opportunistic Multi-Hop Routing for Wireless Networks”

A wired network is traditionally viewed as a unicast medium: packets are sent from A to B. Because the underlying medium can have a degree of sharing (e.g. multiple hosts connected to the same Ethernet hub), steps must be taken to avoid interference between concurrent senders (e.g. CSMA).

In a wireless network, the medium is inherently shared: any host in range of a radio transmission can receive it. If the goal is to run a traditional unicast routing protocol over a wireless link, this makes interference between senders a more challenging problem (e.g. as discussed in the MACAW paper). However, rather than treating broadcast as an inconvenience to be worked around, the broadcast nature of wireless can also be leveraged to design new protocols that would be infeasible on a wired network.

A nice example of a broadcast-oriented protocol for wireless is “ExOr: Opportunistic Multi-Hop Routing for Wireless Networks”. In a traditional routing design, a sender chooses the next hop that should receive a packet, and then unicasts the packet to that destination. In ExOr, a sender broadcasts a batch of packets to a group of nodes simultaneously. The set of batch recipients coordinate to try to ensure that each packet in the batch is forwarded onward. The recipient of a packet is only chosen after the packet is sent, which allows ExOr to “opportunistically” take advantage of links that have high loss rates. ExOr assumes that node reception probabilities are mostly independent and mostly decrease with distance — both of which are probably pretty reasonable assumptions.

Design

The source collects a batch of packets destined for the same host, and then chooses a forwarding list for the batch. The forwarding list is a list of nodes, sorted by the expected cost of delivering packets from that node to the eventual destination node. The cost metric is similar to ETX (the expected number of transmissions required to send the packet to the destination via unicast, including retransmissions); unlike ETX, it only considers the forward delivery probability. While it would be possible to include all possible recipient nodes in the forwarding list, this would increase the coordination cost among the forwarders, so ExOr only includes “likely” recipients in the list (an estimated 10%+ chance of receiving a broadcast packet).
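
A sketch of how a forwarding list might be built from these inputs; the cost and reception-probability tables are assumed inputs, and only the 10% threshold comes from the text:

```python
# Sketch of forwarding-list construction. `cost[n]` is the ETX-like forward
# delivery cost from node n to the destination, and `p_hear[n]` is the estimated
# chance that n receives the source's broadcast; both tables are assumed inputs.

def build_forwarding_list(candidates, cost, p_hear, threshold=0.10):
    likely = [n for n in candidates if p_hear[n] >= threshold]
    return sorted(likely, key=lambda n: cost[n])   # best (cheapest) node first
```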

Each packet contains a batch map, which holds the sender’s estimate of the highest priority (according to the cost metric) node to have received each packet in the batch. When a node receives a packet, it uses the packet’s batch map to update its local batch map. This means that batch map information propagates through the nodes, carrying information about packet reception from high priority nodes to lower priority nodes.
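
A sketch of the batch-map merge this implies (the map representation is an illustrative assumption):

```python
# Sketch of the batch-map merge: for each packet, remember the highest-priority
# node believed to have received it (lower cost = higher priority, matching the
# forwarding-list metric). The map representation is illustrative.

def merge_batch_maps(local_map, incoming_map, cost):
    """Both maps: packet_id -> node believed to hold that packet, or None."""
    for pkt, node in incoming_map.items():
        best = local_map.get(pkt)
        if node is not None and (best is None or cost[node] < cost[best]):
            local_map[pkt] = node
    return local_map
```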

After the source broadcasts a batch, each member of the forwarding list broadcasts in turn, in descending priority order (by the ETX-like cost metric). Each node broadcasts the packets it received, along with its updated batch map. Nodes coordinate to schedule their transmissions to try to avoid interference (e.g. by estimating each node’s expected transmission time from its priority and batch map contents). The protocol continues cycling through the nodes in priority order. At the end of each cycle, the ultimate destination broadcasts its batch map 10 times; at the beginning of each cycle, the source resends packets that weren’t received by any node (as inferred from batch map contents). ExOr stops when 90% of the packets in a batch have been transferred, and uses a traditional routing policy to deliver the remainder of the batch.

Discussion

Overall, ExOr is a really neat idea. That said, it is at best a special-purpose solution, because of the high latency it incurs for batching and multiple transmission rounds. Similarly, the need for a split TCP proxy is pretty ugly. A complete solution to wireless routing would perhaps adaptively switch between latency-oriented and bandwidth-oriented routing techniques, depending on the nature of the traffic.

It seems arbitrary to me that ExOr works on a batch-by-batch basis; that almost seems like establishing a new TCP connection for each window’s worth of data. The amount of useful work done in each transmission round decreases, until the protocol imposes an arbitrary cutoff at 90% and switches to a traditional routing protocol. Instead, wouldn’t it be more sensible for ExOr to allow new packets to be inserted into the batch as the destination confirms the delivery of earlier packets? This would essentially emulate the “conservation of packets” principle from the Van Jacobson paper.


“White Space Networking with Wi-Fi like Connectivity”

“White Space Networking with Wi-Fi like Connectivity” discusses how to achieve wireless networking using so-called “white space”: unused portions of the UHF spectrum (approximately 512-698 MHz). This is attractive because the UHF spectrum allows wireless networking over a relatively long range (a mile or more, compared to 300-600 feet for 802.11n). UHF signals are also better able to penetrate walls and buildings in urban environments.

The difficulty with using the UHF spectrum is that it has already been assigned to two “incumbents”: analog TV and wireless microphones (although analog TV’s use of the spectrum should have declined since this paper was written, due to the recent conversion from analog to digital broadcasting). Per FCC rules, non-incumbent use of the UHF spectrum must avoid interfering with incumbent use. While TV signals are relatively stable, wireless microphone transmissions can begin without warning, which poses challenges for providing wireless networking over UHF. To improve throughput, the authors also suggest aggregating multiple contiguous free channels and using them together for networking.

To enable networking over UHF, several problems must be solved:

  • Spectrum Assignment: What portion of the 180 MHz UHF band should be used for networking in a given locale? The UHF spectrum is typically “fragmented”: some portions of the spectrum are in use, while others are free. Furthermore, to increase throughput, multiple contiguous free channels should be used by a single network. The right spectrum to use will also change over time, e.g. as hosts move and wireless microphones are activated. The channel used must be free for both the AP and all clients, which is made challenging by the larger spatial extent of UHF networks.
  • AP Discovery: WiFi APs emit beacons every 100ms on the WiFi channel they are using. To find APs, clients can easily scan each channel, looking for AP beacons. This is more challenging for UHF, because there are more possibilities to scan (30 UHF channels, and 3 possible channel widths, yielding 84 combinations).
  • Handling Disconnections: If incumbent use of a channel is detected, networking transmissions must immediately stop using the channel (the authors found that even transmitting control packets can result in audible interference for wireless microphones using the channel). If clients and APs are prone to abruptly disconnect from a channel, a method is needed to choose another channel and continue communication.

Design

To solve the spectrum assignment problem, clients and APs exchange information about which channels are free near them. APs probe for new channels when they detect incumbent use on their current channel, or if they observe a performance drop. Channel probing is done by considering which channels are free for all clients and the AP, and by then estimating the available bandwidth on each channel.
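
A sketch of the channel-selection idea; the channel numbering, the set of widths, and the scoring callback are assumptions, not the paper's exact algorithm:

```python
# Sketch of spectrum assignment: consider only spans of contiguous channels that
# are free at the AP and at every client, then rank the candidates by an
# estimated-bandwidth score. Numbering, widths, and scoring are assumptions.

def common_free_spans(free_sets, num_channels=30, widths=(1, 2, 4)):
    """free_sets: iterable of per-node sets of free channel indices (AP plus clients)."""
    common = set.intersection(*map(set, free_sets))
    for width in widths:
        for start in range(num_channels - width + 1):
            if all(ch in common for ch in range(start, start + width)):
                yield (start, width)

def pick_span(free_sets, estimated_bandwidth):
    """estimated_bandwidth((start, width)) -> score; returns the best span or None."""
    return max(common_free_spans(free_sets), key=estimated_bandwidth, default=None)
```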

To enable AP discovery, the authors propose some signal-processing magic: a technique called SIFT, which essentially allows them to scan the spectrum range and cheaply determine the presence of an AP on a channel; they can then tune the radio transceiver to the identified channel and decode the beacon packet as usual.

To handle disconnections due to incumbent use of the spectrum, they propose using a separate 5 MHz backup channel, which is advertised as part of AP beacon packets. If a client senses that a disconnection has occurred (e.g. because no data packets have been received recently), it switches to the backup channel, and both listens for and emits chirps. Chirps contain information about the available white spaces near the chirping node, which is used to pick a new channel for communication. APs periodically scan for chirps.

It’s possible that the backup channel is already in use by another incumbent. In this case, the client picks an arbitrary available channel as the secondary backup channel, and emits chirps on it. The AP periodically scans all channels to attempt to find such chirps. The same signal processing magic (SIFT) that is used to enable cheap AP discovery is used to make scanning all channels periodically feasible.

Related Reading

Matt Welsh discusses this paper on his blog. The slides for the SIGCOMM talk on this paper are also available online.


“Interactive Wi-Fi Connectivity for Moving Vehicles”

“Interactive Wi-Fi Connectivity for Moving Vehicles” proposes ViFi, a system for achieving connectivity via WiFi from moving vehicles.

A moving vehicle is in range of different WiFi base stations over time. Due to host mobility, the best base station to use will change, so the client must perform “handoff” to move between base stations as appropriate. The authors describe traditional handoff policies as “hard handoff,” because a client is only ever connected to one base station at a time. They present experimental and analytical results suggesting that any handoff policy based on connecting to a single base station will be outperformed by a policy that opportunistically utilizes multiple in-range base stations at once. Utilizing multiple base stations at once is effective for two reasons:

  1. A client is often in range of multiple base stations, at least in the urban environments they study.
  2. Packet losses at different base stations are roughly independent. In the upstream direction, utilizing multiple BSs is effective because a packet that is not received by one BS is likely to be received by at least one other BS (see the arithmetic sketch after this list). In the downstream direction, BS diversity is effective because downstream losses tend to be “bursty” and path-dependent, but receiver-independent: while one BS will often fail to deliver a burst of packets, other BSs are likely to be able to get packets through to the receiver.
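
To make the upstream intuition concrete, here is the back-of-the-envelope arithmetic (the numbers are illustrative, not measurements from the paper):

```python
# Illustrative arithmetic, not a figure from the paper: if each of n in-range
# base stations independently misses an upstream packet with probability p, the
# chance that *no* BS receives it is p**n, which shrinks quickly as n grows.

def all_miss_probability(p: float, n: int) -> float:
    return p ** n

# With a 50% per-BS loss rate and three in-range BSs, only
# all_miss_probability(0.5, 3) == 0.125 of packets reach no base station at all.
```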

Design

Therefore, the authors propose a scheme that allows WiFi clients to opportunistically utilize multiple in-range base stations. They assume that base stations can communicate with one another, but not over a high-bandwidth link. Hence, they focus on a scheme that allows the base stations to coordinate, enabling base-station diversity while avoiding duplicate or missed packet deliveries, yet placing minimal additional load on the inter-base-station link and not hurting user-perceived latency. Their basic approach is to combine opportunistic reception (the vehicle broadcasts to all BSs in range) with probabilistic relaying (each base station independently decides whether to relay, based on a computed relay probability).

In ViFi, each vehicle designates one nearby base station as the anchor, and the rest as auxiliaries. The anchor and auxiliary list is periodically broadcast by the vehicle via a beacon, which allows base stations to learn the identities and addresses of peer BSs.

Packet Transmission

The basic ViFi transmission algorithm is simple. To send a packet upstream, the vehicle broadcasts it to all in-range base stations. If the anchor receives the packet, it responds with a broadcast ACK. Any auxiliaries that overhear the packet broadcast wait for a short window; if they don’t see an ACK, they probabilistically relay the packet to the anchor via the inter-BS backplane. When the anchor receives relayed packets that it hasn’t already ACK’ed, it ACKs them. Finally, the vehicle retransmits packets if it doesn’t receive an ACK within a certain period of time.

The transmission algorithm for downstream packets (base station => vehicle) works similarly (and almost symmetrically). The anchor broadcasts the packet to the vehicle. If the vehicle hears it, it broadcasts an ACK. If any auxiliaries overhear the broadcast but don’t see the ACK after a short window of time, they probabilistically broadcast the packet themselves. The anchor retransmits the packet if it doesn’t hear an ACK during the retransmission interval.

The magic is in the probabilistic relaying technique, which must balance relaying too often (wasting bandwidth) against not relaying often enough (risking packet drops). The intuition is that the expected number of relays amongst all auxiliaries should be 1, and that relaying should be biased toward auxiliaries that are better connected to the destination (i.e. either the anchor BS or the vehicle). Each BS computes packet reception probabilities between itself, the other BSs, and the vehicle, using periodic beacons.
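
A simplified sketch of that intuition follows; this is not ViFi's exact formula, just the expected-one-relay idea:

```python
# Simplified sketch of probabilistic relaying, not ViFi's exact formula: scale
# each auxiliary's relay probability so the expected number of relays across the
# auxiliaries that heard the packet is roughly one, weighted toward auxiliaries
# with better delivery probability to the destination. Inputs are assumed.

def relay_probability(me, heard_auxiliaries, p_deliver):
    """p_deliver[a]: estimated probability that auxiliary a reaches the destination."""
    total = sum(p_deliver[a] for a in heard_auxiliaries)
    if total <= 0:
        return 0.0
    return min(1.0, p_deliver[me] / total)
```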

Other Details

802.11 link-layer acknowledgments are sent immediately after a packet is delivered to the recipient, which allows the sender to predict the expected arrival time of the ACK. In ViFi, ACKs may be delayed due to relaying, which depends on current network conditions (e.g. the behavior of the inter-BS backplane). Hence, ViFi uses its own acknowledgment scheme, with the retransmission interval set from recent history (the 99th percentile of measured ACK delays).
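
A sketch of that timer (the window size and fallback value are assumptions):

```python
# Sketch of the retransmission-interval idea: remember recently observed ACK
# delays and retransmit when no ACK has arrived within their 99th percentile.
# The window size and fallback value are assumptions.

from collections import deque

class AckTimer:
    def __init__(self, window: int = 1000, fallback: float = 0.1):
        self.delays = deque(maxlen=window)
        self.fallback = fallback

    def record(self, delay: float) -> None:
        self.delays.append(delay)

    def timeout(self) -> float:
        if not self.delays:
            return self.fallback
        ordered = sorted(self.delays)
        return ordered[int(0.99 * (len(ordered) - 1))]
```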

ViFi also implements salvage. When a vehicle moves out of range before the anchor BS can deliver packets for it, those packets are “stranded” on the old anchor BS. When the vehicle obtains a new anchor BS, the beacon it sends out also includes the previous anchor BS. The new anchor contacts the old anchor, and asks for any unacknowledged packets for the vehicle to be relayed.


X-Trace

“X-Trace: A Pervasive Network Tracing Framework” provides a tool for understanding the behavior of distributed systems composed of layers of protocols. Traditional logging and diagnostic tools operate at a single layer of the stack, for example by tracing the flow of HTTP or TCP traffic in a network. This is insufficient for understanding many realistic failure scenarios, because application traffic typically traverses many different layers and protocols. For example, when a client initiates an HTTP request, the following might happen:

  1. A DNS lookup is performed (via UDP, perhaps requiring recursive resolution)
  2. A TCP connection is established to the remote server, which requires transmitting IP packets and numerous Ethernet frames across different network links.
  3. The remote server handles the HTTP request, typically by running various application code and contacting multiple databases (e.g. a single Google search request is distributed to ~1000 machines)
  4. The HTTP response is returned to the client; the contents of the response may prompt the client to issue subsequent requests (e.g. additional HTTP requests to fetch resources like images and external CSS)

A failure at any point in this sequence can result in a failure of the original action; hence, diagnostic tools that operate at a single layer of the stack won’t provide a complete picture of the operation of a distributed system.

Design

X-Trace works by tagging all the operations associated with a single high-level task with the same task ID. By modifying multiple layers of the protocol stack to record and propagate task IDs, all the low-level operations associated with a high-level task can be reconstructed. Furthermore, X-Trace also allows developers to annotate causal relationships, which allows a “task tree” of related operations to be constructed.

X-Trace metadata must be manually embedded into protocols by developers; protocols typically provide “extension”, “option”, or annotation fields that can be used to hold X-Trace data. The “trace request” (tagging all the operations associated with a task) is done in-band, as part of the messages sent for the task. Data collection happens offline and out-of-band. This makes failure handling easier, and allows the resources required for data collection to be reduced (e.g. using batching and compression). The downside to offline data collection is that it makes prompt diagnosis of problems more difficult.
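
A minimal sketch of the propagation idea; the field names are illustrative, not X-Trace's actual metadata format:

```python
# Minimal sketch of metadata propagation, not X-Trace's actual format: every
# operation carries the task ID of the high-level task it serves, plus a pointer
# to the operation that caused it, so an offline collector can rebuild the task tree.

import uuid

def new_task():
    return {"task_id": uuid.uuid4().hex, "op_id": uuid.uuid4().hex, "parent_op": None}

def caused_by(parent):
    """Metadata for an operation causally triggered by `parent` (e.g. a DNS lookup
    issued while handling an HTTP request)."""
    return {"task_id": parent["task_id"],   # same task ID flows down every layer
            "op_id": uuid.uuid4().hex,      # fresh ID for this operation
            "parent_op": parent["op_id"]}   # causal edge recorded for reconstruction
```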

Discussion

Overall, I think this is a really nice paper. The idea is obviously useful, and it is nicely explained.

The system appears to only have a limited ability to track causal relationships. In particular, situations involving multiple clients modifying shared state don’t appear to be supported very well. For example, suppose that request A results in inserting a row into a database table. Request B aggregates over that table; based on the output, it then tries to perform some action, which fails. Clearly, requests A and B are causally related in some sense, but X-Trace wouldn’t capture this relationship. Extending X-Trace to support full causal tracking would be equivalent to data provenance.

It would be interesting to try to build a network-wide assertion checking utility on top of the X-Trace framework.


NetFPGA

One of the challenges faced in networking research and education is the difficulty of accurately recreating a network environment. Production routers are implemented as hardware devices, but designing and fabricating new hardware for research or teaching is very expensive. Modular software routers like Click enable research and education on routing in software, but don’t allow experimentation with hardware and physical-layer issues.

NetFPGA aims to rectify this by providing an easily programmable hardware platform for network devices (NICs, switches, routers, etc.). Users interact with NetFPGA remotely, uploading programs that configure the FPGAs on the device appropriately, and then observing the results. The initial version of NetFPGA didn’t have a CPU: instead, software to control the device runs on another computer, and accesses hardware registers remotely (using special Ethernet packets that are decoded by the NetFPGA board). The second version of NetFPGA has an optional hardware CPU.

Discussion

This seems like a pretty incontrovertible “good idea.” One potential question is how the performance of an FPGA-based router compares to that of realistic production hardware; I’d suspect that NetFPGA routers occupy a middle ground between software-only routers (easy to develop, poor performance) and production routers (difficult to program and expensive to fabricate, but good performance).
