Category Archives: Paper Summaries

“Not-a-Bot: Improving Service Availability in the Face of Botnet Attacks”

This paper focuses on distinguishing human-generated activity from bot-generated activity. Then, human-generated activity can be given preferential treatment (e.g. favorable routing of traffic, not being treated as spam). Their measure for distinguishing human-generated actions from machine-generated actions is pretty coarse and imprecise: an action is human-generated if it is preceded by keyboard or mouse input within a certain amount of time.

To implement this scheme, they go into considerable (exhaustive) detail about how to use the Trusted Platform Module (TPM) to build a trusted path between the physical input devices (keyboard, mouse) and a small piece of software called the attester. To certify an action as human-generated, applications ask the attester for an attestation, passing a hash of the content to be attested. If there has been user input within a predefined period, the attester returns a cryptographically-signed token that can be attached to the user action. When an upstream service receives the user action (e.g. HTTP request, email), it can verify the attestation by hashing the content of the action and checking the cryptographic signature. Incorporating the content hash prevents an attestation for action x from being reused with action y. The verifier also needs to check that attestations are not replayed, so the attester includes a nonce in the attestation token.
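A minimal sketch of the request/verify flow, assuming an HMAC shared between attester and verifier purely for illustration (the real design uses TPM-backed keys, and the names and one-second input window here are placeholders, not the paper's values):

    import hashlib, hmac, os, time

    ATTESTER_KEY = os.urandom(32)      # stands in for the TPM-protected signing key
    INPUT_WINDOW = 1.0                 # assumed: max seconds since last keyboard/mouse input
    last_input_time = time.time()      # would be updated by the trusted input path

    def request_attestation(content: bytes):
        """Attester: return a signed token only if there was recent human input."""
        if time.time() - last_input_time > INPUT_WINDOW:
            return None                            # no recent input: refuse to attest
        nonce = os.urandom(16)                     # lets the verifier reject replays
        digest = hashlib.sha256(content).digest()
        sig = hmac.new(ATTESTER_KEY, digest + nonce, hashlib.sha256).digest()
        return {"hash": digest, "nonce": nonce, "sig": sig}

    def verify_attestation(content: bytes, token, seen_nonces: set) -> bool:
        """Verifier: re-hash the content, check the signature, and reject reused nonces."""
        if token is None or token["nonce"] in seen_nonces:
            return False
        digest = hashlib.sha256(content).digest()
        expected = hmac.new(ATTESTER_KEY, digest + token["nonce"], hashlib.sha256).digest()
        if digest != token["hash"] or not hmac.compare_digest(expected, token["sig"]):
            return False
        seen_nonces.add(token["nonce"])
        return True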

It is possible that a bot can monitor user actions, and submit malicious content to the attester whenever the user uses an input device. This would allow attestations to be created for malicious content, which means upstream software cannot blindly trust attested-for content. To reduce the impact of this attack, the paper suggests rate-limiting attestations to one per second.

Discussion

I liked how the paper discussed an alternative approach to the same problem (having the attester track keyboard and mouse inputs, and then match that recorded history against the content that is to be attested, looking for a correspondence). Many papers present the solution they chose as the only alternative, when in fact it usually represents only one point in a much richer design space.

In some sense, the inverse of the proposed functionality would be more useful: i.e. being able to assert “this content was definitely bot-generated.” As proposed, it might be very hard for upstream services to make use of the certifications unless this idea saw widespread adoption. For example, suppose that 0.1% of your traffic is guaranteed by NAB to be human-generated. The remaining traffic may or may not be bot-generated, so you effectively can’t discriminate against it.

The paper suggests that, using this approach, human-generated content on a bot-infested machine can be effectively distinguished from bot-generated traffic. This seems pretty unlikely: the bot software can simply suppress all outgoing attestations (e.g. by installing a rootkit and interfering with the API used to request attestations), leaving upstream software in the same state as they would be without NAB.

“BotGraph: Large Scale Spamming Botnet Detection”

Botnets are used for various nefarious ends; one popular use is sending spam email by creating and then using accounts on free webmail providers like Hotmail and Google Mail. In the past, CAPTCHAs have been used to try to prevent this, but they are increasingly ineffective. Hence, the BotGraph paper proposes an algorithm for detecting bot-created accounts by analyzing user access behavior. They describe the algorithm, its implementation with Dryad, and present experimental results from real-world Hotmail access logs.

Algorithm

BotGraph employs three different ideas for detecting automated users:

  1. They regard sudden spikes in the number of accounts created by a single IP as suspicious. Hence, they use a simple exponentially-weighted moving average (EWMA) to detect such spikes, and throttle/rate-limit account signups from suspicious IPs. This has the effect of making it more difficult for spammers to obtain webmail accounts.
  2. They argue that the number of bot machines will be much smaller than the number of bot-created webmail accounts; hence, one bot machine will access a large number of accounts. They also argue that a single bot-created webmail account will be accessed from multiple bots on different autonomous systems (ASs), due to churn in the botnet (although this seems pretty unconvincing to me), and the fact that rate-limiting makes it more difficult to create large numbers of bot accounts. Hence, they look for pairs of user accounts that had logins from an overlapping set of ASs.
  3. Finally, they consider a user’s email-sending behavior:

    Normal users usually send a small number of emails per day on average, with different email sizes. On the other hand, bot-users usually send many emails per day, with identical or similar email sizes

    Hence, they regard users who send 3+ emails per day as “suspicious”; they also regard as suspicious users whose email-size distributions are dissimilar from most other users.

They use feature #1 primarily to rate-throttle new account creations. Feature #3 is used to avoid false positives.
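A rough sketch of what EWMA-based throttling for feature #1 might look like; the smoothing factor and spike threshold are invented for illustration, not taken from the paper:

    from collections import defaultdict

    class SignupThrottle:
        """Flag an IP when today's signup count jumps well above its smoothed history."""
        def __init__(self, alpha=0.1, spike_factor=4.0):
            self.alpha = alpha                # EWMA smoothing factor (assumed value)
            self.spike_factor = spike_factor  # multiple of the average that counts as a spike
            self.ewma = defaultdict(float)    # per-IP smoothed daily signup count

        def is_suspicious(self, ip, signups_today):
            avg = self.ewma[ip]
            suspicious = avg > 0 and signups_today > self.spike_factor * avg
            # update the moving average regardless of the decision
            self.ewma[ip] = self.alpha * signups_today + (1 - self.alpha) * avg
            return suspicious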

Feature #2 is the primary focus of the paper. They construct a user-user graph with a vertex for each user account. Each edge has a weight that gives the number of shared login ASs — that is, the number of ASs that were used to log in to both accounts. Within the user-user graph, they look for connected components formed by edges with weight at least a threshold T: they begin by finding components with T=2, and then iteratively increase the threshold until each component has no more than 100 members.
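A sketch of that iterative thresholding, under the simplifying assumption that the edge weights are already computed and fit in memory (the real system computes the weights with the data-parallel implementation described below):

    from collections import defaultdict, deque

    def connected_components(users, edges, threshold):
        """Components of the user-user graph, keeping only edges with weight >= threshold."""
        users = set(users)
        adj = defaultdict(list)
        for (u, v), w in edges.items():
            if w >= threshold and u in users and v in users:
                adj[u].append(v)
                adj[v].append(u)
        seen, components = set(), []
        for u in users:
            if u in seen:
                continue
            seen.add(u)
            comp, queue = [], deque([u])
            while queue:
                x = queue.popleft()
                comp.append(x)
                for y in adj[x]:
                    if y not in seen:
                        seen.add(y)
                        queue.append(y)
            components.append(comp)
        return components

    def extract_bot_groups(users, edges, start_threshold=2, max_size=100):
        """Start at T=2; re-split any component larger than max_size with a stricter T."""
        result, stack = [], [(start_threshold, set(users))]
        while stack:
            t, members = stack.pop()
            for comp in connected_components(members, edges, t):
                if len(comp) > max_size:
                    stack.append((t + 1, comp))
                elif len(comp) > 1:              # singletons carry no correlation signal
                    result.append((t, comp))
        return result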

Implementation

They describe two ways to implement the construction of the user-user graph using a data-parallel system like MapReduce or Dryad, using the login log from Hotmail (~220GB for one month of data):

  1. Partition the login records by client IP. Emit an intermediate record (i, j, k) for each shared login on the same day from AS k to accounts i and j. In the reduce phase, group on (i, j) and sum. The problem with this approach is that it requires a lot of communication: most edges in the user-user graph have weight 1, and hence can be dropped, but this approach still requires sending them over the network.
  2. Partition the login records by user name. For each partition, compute a “summary” of the IP-day keys present for users in that partition (the paper doesn’t specify the nature of the summary, but presumably it is analogous to a Bloom filter). Each partition sends its summary to every other partition. Using the summaries, each partition can exchange login records with other partitions in a way that allows edge weights to be computed, but doesn’t require sending weight 1 edges over the network.

They argue that the second method can’t be implemented with Map and Reduce, although I’m not sure if I believe them: multicasting can be done by writing to HDFS, as can shipping data between logical partitions.
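For concreteness, here is a toy rendering of method #1 as map and reduce functions; the record layout is invented, and the real implementation runs over Dryad partitions rather than these in-memory lists:

    from collections import defaultdict
    from itertools import combinations

    def map_logins(ip_as, logins_for_one_ip):
        """Map over one client IP's records. logins_for_one_ip: list of (user, day) pairs;
        ip_as is the AS the IP belongs to. Emits one intermediate record (i, j, k) per
        pair of accounts sharing a login from this IP on the same day."""
        users_by_day = defaultdict(set)
        for user, day in logins_for_one_ip:
            users_by_day[day].add(user)
        for day, users in users_by_day.items():
            for i, j in combinations(sorted(users), 2):
                yield (i, j), ip_as

    def reduce_edges(user_pair, as_numbers):
        """Group on (i, j); the edge weight is the number of distinct shared login ASes."""
        return user_pair, len(set(as_numbers))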

Discussion

I think the major problem with their experimental results is that there’s effectively no adversary: botnet operators presumably weren’t aware of this technique when the experiments were performed. Hence, they haven’t adapted their tactics — which might actually be quite easy to do.

For example, it seems like it would be quite easy to defeat their EWMA-based throttling by simply increasing the number of signups/time gradually. Essentially, the bot machine acts like an HTTP proxy with a gradually-increasing user population. One can imagine such a bot even mimicking the traffic patterns exhibited by a real-world proxy (e.g. increase at 9AM, decrease at 5PM). Certainly using a simple EWMA seems too primitive to defeat a dedicated adversary.

Similarly, it also seems quite easy to avoid sharing a single webmail account among multiple bot machines: simply assign a single webmail account to a single bot machine, and don’t reuse webmail accounts if the bot machine becomes inaccessible. The idea, again, is to simulate an HTTP proxy that accesses a large number of webmail accounts. The paper’s argument that “churn” requires reuse of webmail accounts “to maximize bot-account utilization” is unconvincing and unsubstantiated. Since this is the entire principle upon which their technique is based, I’d be quite concerned that a relatively simple adaptation on the part of botnet operators would make this analysis ineffective.

I thought the paper’s wide-eyed tone toward using MapReduce-style systems for graph algorithms was annoying. Lots of people do large-scale graph algorithms using MapReduce-style systems; in fact, that’s one of the main things MapReduce was originally designed for (e.g. computing PageRank). The paper is not novel in this respect, and I was surprised that they didn’t cite one of the many prior papers on this subject.

“Cutting the Electric Bill for Internet-Scale Systems”

This paper begins with three observations:

  1. Energy-related costs are an increasingly large portion of total data center operating expenses.
  2. The cost of electricity can vary significantly between different times and between different regions at the same time.
  3. Many distributed systems already have the ability to dynamically route requests to different hosts and physical regions (for example, most CDNs try to minimize client latency and bandwidth costs by routing requests to a data center “close” to the client).

Therefore, the paper investigates the feasibility of shifting load among replicas of a service that are located in different geographic regions, according to the current price of electricity in each region. For this to be effective, several things must be true:

  1. There must be significant variation in the price of electricity available in different regions at the same time.
  2. Data centers must be energy proportional: as the load on a data center is decreased by a factor of k, its energy usage should decrease by the same factor.
  3. The additional client latency, bandwidth usage, and router energy incurred by routing traffic toward cheap power (which might be far away from the client) must not outweigh the electricity savings.

To answer the first question, the authors conduct a detailed empirical study of the cost of energy in different regions across the US, and compare that information with traffic logs from Akamai’s CDN. The authors use the Akamai traffic data to estimate how much the cost of electricity could be reduced by routing requests to the cheapest available electricity source, subject to various additional constraints.

The authors don’t do much to address the second question: they admit that the effectiveness of this technique depends heavily on energy-proportionality, but most computing equipment is not very energy-proportional (idle power consumption of a single system is typically ~60% of peak power usage, for example). Since energy-proportionality is the subject of much recent research, they express the hope that future hardware will be more energy-proportional. Finally, they carefully consider the impact of electricity-price-based routing on other optimization goals: for example, they consider only changing routes in a way that doesn’t result in any increased bandwidth charges (due to the “95-5” pricing scheme that most bandwidth providers use). A realistic implementation of this technique would consider electricity cost as one factor in a multi-variable optimization problem: we want to simultaneously minimize electricity cost, minimize client-perceived latency and minimize bandwidth charges, for example.
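As a toy illustration of folding electricity price into the routing decision alongside a latency constraint (the sites, prices, and latency bound are all made up; a real system would also fold in the 95/5 bandwidth term):

    def choose_data_center(client, sites, price_per_mwh, latency_ms, max_latency_ms=100):
        """Pick the cheapest-energy site whose measured latency to the client is acceptable.

        price_per_mwh: current regional electricity price per site (from a market feed).
        latency_ms: dict of (client, site) -> recent RTT measurement in milliseconds.
        """
        ok = [s for s in sites if latency_ms[(client, s)] <= max_latency_ms]
        if not ok:                       # nothing is both cheap and close: fall back to latency
            return min(sites, key=lambda s: latency_ms[(client, s)])
        return min(ok, key=lambda s: price_per_mwh[s])

    # Example: Virginia is closest, but Oregon power is currently cheaper and still close enough.
    sites = ["virginia", "oregon", "texas"]
    prices = {"virginia": 62.0, "oregon": 41.0, "texas": 55.0}
    latency = {("client", "virginia"): 20, ("client", "oregon"): 85, ("client", "texas"): 140}
    assert choose_data_center("client", sites, prices, latency) == "oregon"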

Summary of Results

The authors found significant asymmetry in electricity prices between geographic areas; furthermore, this asymmetry was dynamic (different regions were cheaper at different times). These are promising results for dynamic routing of requests based on electricity prices.

When cluster energy usage is completely proportional to load and bandwidth cost is not considered, price-sensitive routing can reduce energy costs by ~40%. The savings drop to only 5% if the energy-proportionality of current hardware is used, and the savings drop to a third of that if we are constrained to not increase bandwidth costs at all (assuming 95-5 pricing). Hence, this technique is only really effective if energy-proportional data centers are widely deployed.

Discussion

I thought this was a great paper. The basic idea is simple, but their empirical study of the US electricity market was carefully done, and the results are instructive.

One interesting question is what would happen to the electricity market if techniques like these were widely deployed. Essentially, electricity consumption would become more price-elastic. When a given region offers a lower price, demand could move to that region quite quickly, which might act to drive up the price. Conversely, it would lower demand in higher-priced regions, lowering the price — and hence benefiting more inelastic energy consumers in that region.

“Scalable Reliable Multicast”

The SRM (Scalable Reliable Multicast) paper argues that, unlike with reliable unicast, applications vary widely in their requirements for reliable multicast delivery. Hence:

One cannot make a single reliable multicast delivery scheme that simultaneously meets the functionality, scalability, and efficiency requirements of all applications.

Accordingly, they propose a model in which a generic skeleton is “fleshed out with application specific details.” The application provides a means to talk about how much data has been sent and received (“application data units”, unlike the connection-state-oriented acknowledgments in TCP), a means for allocating bandwidth among the members of the group, and a means for individual nodes to decide how to allocate their local outgoing bandwidth. Following this model, their multicast framework only provides best-effort packet delivery, with possible duplication and reordering of packets — they believe that applications can build stronger delivery and ordering guarantees on top of the framework as needed. Best-effort multicast is actually implemented using IP multicast.

Unicast vs. Multicast

The authors point out two differences between reliable delivery for unicast vs. multicast that I thought were interesting:

  • In any reliable delivery protocol, one party must take responsibility for detecting lost data and retransmitting it. In unicast, either the sender or receiver can play this role equally well (TCP uses the sender, other protocols like NETBLT use the receiver). In multicast, sender-side delivery state is problematic: the sender must track the set of active recipients and the current state of each recipient, which is expensive, and difficult to do as the multicast group changes. In some sense, the whole point of multicast is to relieve the sender of that responsibility. Hence, they argue that receiver-side delivery state is better for multicast: group membership isn’t relevant, and the burden of keeping per-receiver state is avoided.
  • A reliable delivery protocol also needs a vocabulary to talk about how much data has been sent or received. Typical unicast protocols use a vocabulary based on communication state (typically either bytes or packets (segments)). This is not ideal for multicast, because a newly-joining recipient doesn’t share that communication state, and hence can’t interpret byte-oriented or packet-oriented messages. SRM instead argues for talking in terms of “application data units.”

wb Framework

The example multicast application in the paper is wb, a shared distributed whiteboard. This has the advantage that protocol commands (e.g. draw shape X at location Y) are mostly idempotent (if associated with a timestamp), so the underlying multicast protocol doesn’t need to enforce a total order on deliveries.

In wb, new operations are multicast to the entire group. When a receiver detects a loss (by detecting gaps in sequence numbers), it starts a “repair request” timer. The timer value is determined by the distance of the receiver from the original data source. When the timer expires, the recipient multicasts a “repair request” to the entire group, asking for the missing data. If a recipient sees another repair request for the same data before its timer expires, it suppresses its own repair request. Retransmission of missing data is handled similarly: when nodes receive repair requests for data they have seen, they start a response timer based on their distance from the repair request source. When the timer expires, they multicast the requested data to the entire group. Any other nodes that have a response timer for the requested data suppress their own timers. They also propose a bandwidth allocation scheme to divide the available bandwidth among new data operations and repair data.
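A simplified sketch of the request-side suppression logic; the constants, and the fact that a heard request fully cancels the local one rather than backing it off and rescheduling, are simplifications of the scheme in the paper:

    import random

    class RepairRequestTimer:
        """Per-receiver timer for requesting retransmission of one missing data item."""
        def __init__(self, distance_to_source, c1=2.0, c2=2.0):
            # seeded by distance to the source, with a random spread to de-synchronize peers
            self.delay = random.uniform(c1, c1 + c2) * distance_to_source
            self.cancelled = False

        def on_repair_request_heard(self):
            # another receiver already asked for this data: suppress our own request
            self.cancelled = True

        def on_expiry(self, multicast_send, missing_item):
            if not self.cancelled:
                multicast_send(("REPAIR_REQUEST", missing_item))   # ask the whole group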

Tuning the Request/Response Repair Timers

It is important that the repair request and repair response timers at different nodes be de-synchronized, to avoid redundant messages. The paper observes that different topologies require different methods for achieving de-synchronization: in a simple “chain” topology, seeding the timer with network distance is sufficient. In a “star” topology, all nodes are the same distance from the data source, so randomization must be used. A combination of these techniques is needed for a tree topology. Rather than requiring the timers to be tuned for each individual network, they instead propose an adaptive algorithm that uses prior request/response behavior to tune the timers automatically.

“Skilled in the Art of Being Idle”

“Skilled in the Art of Being Idle” looks at how to reduce energy consumption by network end hosts (primarily desktops and laptops). Modern computers have various “sleep” states that allow reduced power consumption during idle periods. However, putting a computer to sleep has several costs:

  1. Transitioning into and out of a sleep state requires time (the paper cites a recent paper that found that typical machines take 3-8 seconds to enter “S3” sleep state, and 3-5 seconds to resume).
  2. Sleeping hosts cannot respond to network packets; hence, they can lose their network presence (e.g. DHCP lease can expire and be reassigned to another host). They also cannot run periodic tasks (e.g. backup or virus scanning).

A naive approach would put idle nodes to sleep, and then awaken them (via established “wake-on-LAN” support) when a packet is delivered to the node. This is insufficient: the authors demonstrate that in both home and office environments, the inter-arrival time of packets destined for idle computers is too short, so the cost of transitioning into and out of the sleep state would negate any significant power savings. To avoid this problem, prior work has proposed using a proxy to handle network traffic intended for a sleeping node. The proxy can either ignore the traffic (if appropriate), handle the packet itself (e.g. by responding to an ARP query for the sleeping node’s IP), or awaken the sleeping node and forward the packet to it. The effectiveness of proxying therefore depends on most traffic for an idle node fitting into the first two categories.
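A sketch of the three-way decision such a proxy makes for each packet addressed to a sleeping host; the protocol-to-action mapping just mirrors the examples in the findings below, and the packet/callback interfaces are invented:

    # Routing chatter (HSRP, PIM) is irrelevant to the sleeping host; address resolution
    # and service discovery (ARP, NBNS, SSDP) can be answered on the host's behalf.
    IGNORABLE = {"HSRP", "PIM"}
    ANSWERABLE = {"ARP", "NBNS", "SSDP"}

    def handle_packet(protocol, packet, respond_for_host, wake_host):
        """Decide whether to drop, answer, or wake the sleeping host for one packet."""
        if protocol in IGNORABLE:
            return "ignored"
        if protocol in ANSWERABLE:
            respond_for_host(packet)      # e.g. reply to the ARP query for the host's IP
            return "proxied"
        wake_host()                       # anything else (e.g. new unicast TCP) needs the host
        return "woken"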

The paper is an empirical study of 250 machines owned by Intel employees, to assess the need for proxying in practice; the potential benefit of proxying; the network traffic that must be handled by a proxy; and so on.

Summary of Findings

  • Most machines are idle, most of the time — on average, machines were idle 90% of the time.
  • Home and office idle network traffic is markedly different, in both inter-arrival time of packets for idle machines (offices have more such traffic), and the nature of this traffic.
  • A power-saving proxy would need to handle broadcast, multicast, and unicast traffic to achieve significant gains. However, broadcast and multicast traffic represents “low-hanging fruit”: a proxy that handles only broadcast and multicast traffic would recover 80% of the idle time in home environments, and over 50% of the idle time in office environments.
  • For broadcast, address resolution (ARP, NBNS) and service discovery (SSDP for UPnP devices) protocols are the dominant sources of traffic for idle nodes. Both kinds of traffic are easy to proxy.
  • For multicast environments, routing traffic (HSRP, PIM) is the dominant source of traffic for idle nodes in an office environment. In a home environment, service discovery (SSDP) is dominant.
  • Their results for unicast traffic are less clear. They argue that outgoing TCP connections dominate unicast traffic for idle nodes, and that less than 25% of this traffic is the result of some action a node initiated before becoming idle (and hence might need to be maintained in the idle state). They argue that outgoing traffic initiated while the machine is idle can often be batched together, or avoided entirely.

“Scalable Application Layer Multicast”

Multicast is clearly an efficient technique for applications with one-to-many communication patterns, but the deployment of in-network multicast has been slow. Therefore, there have been a number of proposals for implementing multicast at the application level, as an overlay over the physical network. This is not as efficient as true in-network multicast (because the same message may be sent over the same link multiple times), but is much more flexible and easier to deploy.

“Scalable Application Layer Multicast” is one such proposal for application-layer multicast via an overlay network; their proposed protocol is called NICE. Their focus is on applications that require low-latency delivery of relatively low-bandwidth data streams to a large set of recipients, although they argue that their techniques are also applicable to high-volume streams with some minor changes.

Network Topology

In NICE, nodes are arranged into layers; a single node can appear in more than one layer. Every node belongs to the lowest layer, L0. The nodes in a layer are arranged (automatically) into clusters of nodes that are “near” to one another (in terms of network distance/latency). Each cluster has a leader, which is the node in the “center” of the cluster (NICE tries to make the leader the node that has the smallest maximal latency to the other nodes in the cluster). Layer L1 consists of all the cluster leaders from L0; the clustering algorithm is applied to L1 in turn, yielding another set of cluster leaders which form L2, and so forth. The height of the tree is determined by the number of nodes and a constant k: each cluster contains between k and 3k-1 nodes.

To multicast a data message, a node forwards the message to every cluster peer in every layer in which the node belongs, except that a node never forwards a message back to the message’s previous hop.
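A sketch of that forwarding rule; the cluster-membership structure is written out explicitly here, whereas in NICE it is maintained by the join and heartbeat machinery described next:

    def forward(node_name, clusters, message, previous_hop, send):
        """Forward a multicast message to every cluster peer in every layer this node
        belongs to, never back to the node it arrived from (and never twice to one peer).

        clusters: dict mapping layer number -> set of peer names in this node's cluster
                  at that layer, e.g. {0: {"A", "B"}, 1: {"C", "D"}} for a cluster leader.
        """
        already_sent = {node_name, previous_hop}
        for layer in sorted(clusters):
            for peer in clusters[layer]:
                if peer not in already_sent:
                    send(peer, message)
                    already_sent.add(peer)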

Protocol

To join the multicast group, a node begins by contacting a designated node called the Rendezvous Point (RP). The RP is typically the root of the NICE tree. The joining node walks down the tree from the root, choosing the child node that is closest to it (lowest latency).

Cluster leaders periodically check whether the cluster size constraint (k <= size <= 3k-1) has been violated; if so, they initiate a cluster merge or split, as appropriate. Splitting a cluster into two clusters is done by trying to minimize the maximum of the radii of the resulting clusters.

All the nodes in a cluster periodically send heartbeats to their cluster peers. This is used to detect node failures, and to update pair-wise latency information for nodes. If a cluster leader fails or deliberately leaves the NICE group, a new leader is chosen by the same heuristic (minimize the maximum latency from the new center to any cluster peer). A new leader may also be chosen if the pair-wise latencies in the cluster drift far enough to justify it. Also, each member of every layer i periodically probes its latency to the nodes in layer i+1 — if the node is closer to a different layer-(i+1) node than to its current leader, it moves to that node’s cluster in layer i.

Discussion

I didn’t see that the authors provided any grounds for choosing an appropriate k value (essentially the tree fan-in), which seems like it would be an important parameter.

“ExOr: Opportunistic Multi-Hop Routing for Wireless Networks”

Wired networks are traditionally viewed as a unicast medium: packets are sent from A to B. Because the underlying medium can have a degree of sharing (e.g. multiple hosts connected to the same Ethernet hub), steps must be taken to avoid interference between concurrent senders (e.g. CSMA).

In a wireless network, the medium is shared: any hosts in range of a radio transmission can receive it. If the goal is to use a traditional unicast routing protocol over a wireless link, this makes interference between senders a more challenging problem (e.g. as discussed in the MACAW paper). However, rather than viewing broadcast as an inconvenience we need to work around, the broadcast nature of wireless can also be leveraged to design new protocols that would be infeasible for a wired network.

A nice example of a broadcast-oriented protocol for wireless is “ExOr: Opportunistic Multi-Hop Routing for Wireless Networks”. In a traditional routing design, a sender chooses the next hop that should receive a packet, and then unicasts the packet to that destination. In ExOr, a sender broadcasts a batch of packets to a group of nodes simultaneously. The set of batch recipients coordinate to try to ensure that each packet in the batch is forwarded onward. The recipient of a packet is only chosen after the packet is sent, which allows ExOr to “opportunistically” take advantage of links that have high loss rates. ExOr assumes that node reception probabilities are mostly independent and mostly decrease with distance — both of which are probably pretty reasonable assumptions.

Design

The source collects a batch of packets destined for the same host, and then chooses a forwarding list for the batch. The forwarding list is a list of nodes, sorted by the expected cost of delivering packets from that node to the eventual destination node. The cost metric is similar to ETX (the expected number of transmissions required to send the packet to the destination via unicast, including retransmissions); unlike ETX, it only considers the forward delivery probability. While it would be possible to include all possible recipient nodes in the forwarding list, this would increase the coordination cost among the forwarders, so ExOr only includes “likely” recipients in the list (an estimated 10%+ chance of receiving a broadcast packet).

Each packet contains a batch map, which holds the sender’s estimate of the highest priority (according to the cost metric) node to have received each packet in the batch. When a node receives a packet, it uses the packet’s batch map to update its local batch map. This means that batch map information propagates through the nodes, carrying information about packet reception from high priority nodes to lower priority nodes.
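A sketch of the batch-map merge a node performs when it receives a packet, representing priority as position in the forwarding list (lower index = higher priority, i.e. closer to the destination):

    def merge_batch_map(local_map, received_map):
        """Each map is a dict: packet_id -> index in the forwarding list of the
        highest-priority node believed to have received that packet.
        Keep, for each packet, the best (lowest-index) claim seen so far."""
        for packet_id, sender_index in received_map.items():
            if packet_id not in local_map or sender_index < local_map[packet_id]:
                local_map[packet_id] = sender_index
        return local_map

    # Example: we thought packet 7 had only reached the node at index 4; the incoming
    # batch map says the node at index 1 already has it, so we update our estimate.
    local = {7: 4, 9: 2}
    print(merge_batch_map(local, {7: 1, 8: 3}))   # {7: 1, 9: 2, 8: 3}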

After the source broadcasts a batch, each member of the forwarding list broadcasts in turn, in descending priority order (by ETX value). Each node broadcasts the packets it received, along with its updated batch map. Nodes coordinate to schedule their transmissions to try to avoid interference (e.g. by estimating each node’s expected transmission time from its ETX value and batch map contents). The protocol continues cycling through the nodes in priority order. At the end of each cycle, the ultimate destination broadcasts its batch map 10 times; at the beginning of each cycle, the source resends packets that weren’t received by any node (as observed from batch map contents). The ExOr scheme stops when 90% of the packets in a batch have been transferred, and uses a traditional routing policy to deliver the remainder of the batch.

Discussion

Overall, ExOr is a really neat idea. That said, it is at best a special-purpose solution, because of the high latency it incurs for batching and multiple transmission rounds. Similarly, the need for a split TCP proxy is pretty ugly. A complete solution to wireless routing would perhaps adaptively switch between latency-oriented and bandwidth-oriented routing techniques, depending on the nature of the traffic.

It seems arbitrary to me that ExOr works on a batch-by-batch basis—that almost seems like establishing a new TCP connection for each window’s worth of data. The amount of useful work done in each transmission round decreases, until the protocol imposes an arbitrary cutoff at 90% and switches to a traditional routing protocol. Instead, wouldn’t it be more sensible for ExOr to allow new packets to be inserted into the batch as the destination confirms the delivery of packets? This would essentially emulate the “conservation of packets” principle from the Van Jacobson paper.

“White Space Networking with Wi-Fi like Connectivity”

“White Space Networking with Wi-Fi like Connectivity” discusses how to achieve wireless networking using so-called “white space”: unused portions of the UHF spectrum (approximately 512-698 MHz). This is attractive, because using the UHF spectrum can allow wireless networking over a relatively long range (1 mile or greater, compared to 300-600 feet for 802.11n). UHF signals are also better able to penetrate walls and buildings in urban environments.

The difficulty with using UHF spectrum is that the spectrum has already been assigned for use by two “incumbents”: analog TV and wireless microphones (although analog TV’s use of the spectrum should have diminished since this paper was written, due to the recent conversion from analog to digital TV broadcasts). Per FCC rules, non-incumbent use of the UHF spectrum must avoid interfering with incumbent use. While TV signals are relatively stable, wireless microphone transmissions can begin without warning, which poses some challenges to providing wireless networking using the UHF spectrum. To gain improved throughput, the authors also suggest that multiple contiguous free channels should be aggregated together and used for networking.

To enable networking over UHF, several problems must be solved:

  • Spectrum Assignment: What portion of the 180 MHz UHF band should be used for networking in a given locale? The UHF spectrum is typically “fragmented”: some portions of the spectrum are in use, while others are free. Furthermore, to increase throughput, multiple contiguous free channels should be used by a single network. The right spectrum to use will also change over time, e.g. as hosts move and wireless microphones are activated. The channel used must be free for both the AP and all clients, which is made challenging by the larger spatial extent of UHF networks.
  • AP Discovery: WiFi APs emit beacons every 100ms on the WiFi channel they are using. To find APs, clients can easily scan each channel, looking for AP beacons. This is more challenging for UHF, because there are more possibilities to scan (30 UHF channels, and 3 possible channel widths, yielding 84 combinations).
  • Handling Disconnections: If incumbent use of a channel is detected, networking transmissions must immediately stop using the channel (the authors found that even transmitting control packets can result in audible interference for wireless microphones using the channel). If clients and APs are prone to abruptly disconnect from a channel, a method is needed to choose another channel and continue communication.

Design

To solve the spectrum assignment problem, clients and APs exchange information about which channels are free near them. APs probe for new channels when they detect incumbent use on their current channel, or if they observe a performance drop. Channel probing is done by considering which channels are free for all clients and the AP, and by then estimating the available bandwidth on each channel.
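A sketch of the channel-selection step: enumerate candidate spans of 1-3 contiguous UHF channels, keep only spans free at the AP and at every client, and pick the span with the best estimated bandwidth (the scoring function is a stand-in for the probing described above):

    def candidate_spans(num_channels=30, max_width=3):
        """All spans of 1..max_width contiguous UHF channels."""
        for width in range(1, max_width + 1):
            for start in range(num_channels - width + 1):
                yield tuple(range(start, start + width))

    def pick_channel(ap_free, clients_free, estimate_bandwidth):
        """ap_free: set of channels free at the AP; clients_free: one such set per client;
        estimate_bandwidth(span): assumed probe-based estimate of usable capacity."""
        usable = [span for span in candidate_spans()
                  if set(span) <= ap_free
                  and all(set(span) <= client for client in clients_free)]
        return max(usable, key=estimate_bandwidth) if usable else None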

To enable AP discovery, the authors propose some signal processing magic. They propose a technique called “SIFT” which essentially allows them to scan the spectrum range and cheaply determine the presence of an AP on a channel; they can then tune the radio transceiver to the identified channel, and decode the beacon packet as usual.

To handle disconnections due to incumbent use of the spectrum, they propose using a separate 5 MHz backup channel, which is advertised as part of AP beacon packets. If a client senses that a disconnection has occurred (e.g. because no data packets have been received recently), it switches to the backup channel, and both listens for and emits chirps. Chirps contain information about the available white spaces near the chirping node, which is used to pick a new channel for communication. APs periodically scan for chirps.

It’s possible that the backup channel is already in use by another incumbent. In this case, the client picks an arbitrary available channel as the secondary backup channel, and emits chirps on it. The AP periodically scans all channels to attempt to find such chirps. The same signal processing magic (SIFT) that is used to enable cheap AP discovery is used to make scanning all channels periodically feasible.

Related Reading

Matt Welsh discusses this paper on his blog. The slides for the SIGCOMM talk on this paper are also available online.

“Interactive Wi-Fi Connectivity for Moving Vehicles”

“Interactive Wi-Fi Connectivity for Moving Vehicles” proposes ViFi, a system for achieving connectivity via WiFi from moving vehicles.

A moving vehicle is in range of different WiFi base stations over time. Due to host mobility, the best base station to use will change, so the client must perform a “handoff” to move between base stations as appropriate. The authors describe traditional handoff policies as “hard handoff,” because a client is only ever connected to one base station at a time. The authors present experimental and analytical results to suggest that any handoff policy based on connecting to a single base station will be outperformed by a policy that opportunistically utilizes multiple in-range base stations at once. Utilizing multiple base stations at once is effective for two reasons:

  1. A client is often in range of multiple base stations, at least in the urban environments they study.
  2. Packet loss is roughly independent across senders and receivers. In the upstream direction, utilizing multiple BSs is effective because losses are roughly independent across base stations: a packet that is not received by one BS is likely to be received by at least one other BS. In the downstream direction, BS diversity is effective because downstream losses tend to be “bursty” and path-dependent, but receiver-independent: while one BS will often fail to deliver a burst of packets, other BSs are likely to be able to send packets to the receiver.

Design

Therefore, the authors propose a scheme to allow WiFi clients to opportunistically utilize multiple in-range base stations. They assume that base stations have some ability to communicate, but do not have a high-bandwidth link. Hence, they focus on a scheme that allows the base stations to coordinate to enable base station diversity and avoids duplicate or missed packet deliveries, but requires minimal additional load on the inter-base station link and doesn’t hurt user-perceived latency. Their basic approach is to combine opportunistic reception (clients broadcast to all BSs in range) with probabilistic relaying (base stations compute relay probabilities independently and probabilistically).

In ViFi, each vehicle designates one nearby base station as the anchor, and the rest as auxiliaries. The anchor and auxiliary list is periodically broadcast by the vehicle via a beacon, which allows base stations to learn the identity and addresses of peer BSs.

Packet Transmission

The basic ViFi transmission algorithm is simple. To send a packet upstream, the vehicle broadcasts it to all in-range base stations. If the anchor receives the packet, it responds with an ACK broadcast. Any auxiliaries that overhear the packet broadcast wait for a short window; if they don’t see an ACK, they probabilistically relay the packet to the anchor via the inter-BS backplane. When the anchor receives relayed packets that it hasn’t already ACK’ed, it ACKs them. Finally, the vehicle retransmits packets if it doesn’t receive an ACK within a certain period of time.

The transmission algorithm for downstream packets (base station => vehicle) works similarly (and almost symmetrically). The anchor broadcasts the packet to the vehicle. If the vehicle hears it, it broadcasts an ACK. If any auxiliaries overhear the broadcast but don’t see the ACK after a short window of time, they probabilistically broadcast the packet themselves. The anchor retransmits the packet if it doesn’t hear an ACK during the retransmission interval.

The magic is in the probabilistic retransmission technique, which must balance between retransmitting too often (wasting bandwidth) and not often enough (risking packet drops). The intuition behind their probabilistic retransmission technique is that they want the expected number of packet relays amongst all auxiliaries to be 1; amongst auxiliaries, they prefer transmission by auxiliaries that are better connected to the destination (i.e. either the anchor BS or the vehicle). Each BS computes packet reception probabilities between itself and the other BSs and the vehicle, using a periodic beacon.
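A sketch of the relaying decision an auxiliary might make, built around the stated intuition (expected number of relays across all auxiliaries roughly one, weighted toward better-connected auxiliaries); this is not the paper's exact formula:

    import random

    def relay_probability(me, auxiliaries, heard_prob, delivery_prob):
        """heard_prob[b]: probability auxiliary b overheard the packet (from beacons).
        delivery_prob[b]: probability b can deliver the packet onward to the destination.
        Scaling by the total makes the expected number of relays across auxiliaries ~1,
        and auxiliaries better connected to the destination relay more often."""
        total = sum(heard_prob[b] * delivery_prob[b] for b in auxiliaries)
        if total <= 0:
            return 0.0
        return min(1.0, delivery_prob[me] / total)

    def maybe_relay(me, packet, auxiliaries, heard_prob, delivery_prob, relay_to_anchor):
        if random.random() < relay_probability(me, auxiliaries, heard_prob, delivery_prob):
            relay_to_anchor(packet)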

Other Details

802.11 link-layer acknowledgments are sent immediately after a packet is delivered to the recipient, which allows the sender to predict the expected arrival time of the ACK. In ViFi, ACKs may be delayed due to relaying, which depends on current network conditions (e.g. behavior of the inter-BS backplane). Hence, ViFi uses its own acknowledgment scheme, with the retransmission interval set based on recent history (99th percentile measured delay for ACKs).

ViFi also implements salvage. When a vehicle moves out of range before the anchor BS can deliver packets for it, those packets are “stranded” on the old anchor BS. When the vehicle obtains a new anchor BS, the beacon it sends out also includes the previous anchor BS. The new anchor contacts the old anchor, and asks for any unacknowledged packets for the vehicle to be relayed.

X-Trace

“X-Trace: A Pervasive Network Tracing Framework” provides a tool for understanding the behavior of distributed systems composed of layers of protocols. Traditional logging and diagnostic tools operate at a single layer in the stack, for example by tracing the flow of HTTP or TCP traffic in a network. This is insufficient for understanding many realistic failure scenarios, because application traffic typically traverses many different layers and protocols: when a client initiates an HTTP request, the following might happen:

  1. A DNS lookup is performed (via UDP, perhaps requiring recursive resolution)
  2. A TCP connection is established to the remote server, which requires transmitting IP packets and numerous Ethernet frames across different network links.
  3. The remote server handles the HTTP request, typically by running various application code and contacting multiple databases (e.g. a single Google search request is distributed to ~1000 machines)
  4. The HTTP response is returned to the client; the contents of the response may prompt the client to issue subsequent requests (e.g. additional HTTP requests to fetch resources like images and external CSS)

A failure at any point in this sequence can result in a failure of the original action—hence, diagnosis tools that attempt to operate at a single layer of the stack won’t provide a complete picture into the operation of a distributed system.

Design

X-Trace works by tagging all the operations associated with a single high-level task with the same task ID. By modifying multiple layers of the protocol stack to record and propagate task IDs, all the low-level operations associated with a high-level task can be reconstructed. Furthermore, X-Trace also allows developers to annotate causal relationships, which allows a “task tree” of related operations to be constructed.

X-Trace metadata must be manually embedded into protocols by developers; protocols typically provide “extension”, “option”, or annotation fields that can be used to hold X-Trace data. The “trace request” (tagging all the operations associated with a task) is done in-band, as part of the messages sent for the task. Data collection happens offline and out-of-band. This makes failure handling easier, and allows the resources required for data collection to be reduced (e.g. using batching and compression). The downside to offline data collection is that it makes prompt diagnosis of problems more difficult.
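A hedged sketch of what in-band propagation plus out-of-band reporting could look like at the HTTP layer; the header name, metadata fields, and report format here are all made up rather than X-Trace’s actual wire format:

    import json, time, uuid

    def start_task():
        """Fresh metadata for a new high-level task (e.g. the user clicking a link)."""
        return {"task_id": uuid.uuid4().hex, "op_id": uuid.uuid4().hex[:8]}

    def next_op(metadata):
        """Propagate the task ID to the next operation, remembering the causal parent."""
        return {"task_id": metadata["task_id"],
                "parent_op": metadata["op_id"],
                "op_id": uuid.uuid4().hex[:8]}

    def report(metadata, layer, event, report_log):
        """Out-of-band report; collected and stitched into a task tree offline."""
        report_log.append(json.dumps({"ts": time.time(), "layer": layer,
                                      "event": event, **metadata}))

    # Example: a client tags its HTTP request; the server continues the same task.
    log = []
    client_md = start_task()
    report(client_md, "HTTP", "GET /index.html sent", log)
    headers = {"X-Trace": json.dumps(client_md)}        # hypothetical header field
    server_md = next_op(json.loads(headers["X-Trace"]))
    report(server_md, "HTTP", "request handled", log)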

Discussion

Overall, I think this is a really nice paper. The idea is obviously useful, and it is nicely explained.

The system appears to have only a limited ability to track causal relationships. In particular, situations involving multiple clients modifying shared state don’t appear to be supported very well. For example, suppose that request A results in inserting a row into a database table. Request B aggregates over that table; based on the output, it then tries to perform some action, which fails. Clearly, requests A and B are causally related in some sense, but X-Trace wouldn’t capture this relationship. Extending X-Trace to support full causal tracking would essentially amount to tracking data provenance.

It would be interesting to try to build a network-wide assertion checking utility on top of the X-Trace framework.
