The traditional assumption behind the design of TCP is that most packet loss is due to congestion; this is stated explicitly in the Van Jacobson paper, for example. While this assumption mostly holds on wired networks, it causes problems with wireless links. Packets dropped or corrupted due to the nature of the wireless medium lead to (1) slow recovery, because the sender must wait for duplicate ACKs or for the RTO to expire, and (2) poor subsequent performance, because TCP assumes that the packet drop was caused by congestion and reacts accordingly (e.g. reduces the congestion window).
“A Comparison of Mechanisms for Improving TCP Performance over Wireless Links” describes and compares several solutions to this problem. They classify the candidate solutions into three groups:
- end-to-end protocols modify the TCP stack to make the sender of a dropped packet more intelligent, for example by adding selective ACKs (SACK) or explicit loss notifications (ELNs). The idea here is both to avoid waiting for TCP-level timeouts and to recover faster from packet loss. ELNs also allow the sender to differentiate between losses due to congestion and losses due to other factors (e.g. noise, interference), so that the TCP sender doesn’t take congestion-avoidance steps.
- link-layer protocols add reliability mechanisms to the link layer that understand the nature of the wireless medium. For example, the base station might require a link-level ACK from wireless clients. Because such a protocol is specific to a single link, congestion is not a factor, and the RTT can be estimated more accurately; this allows a lower retransmission timeout, and means that the sender won’t misinterpret packet drops as congestion. Link-layer protocols hide much of the link lossiness from TCP: the lossy link just appears to be a more reliable, lower-bandwidth link. A major issue here is ensuring that these protocols play nicely with TCP-level reliable delivery mechanisms. More advanced link-layer protocols have knowledge of TCP semantics: for example, they might detect and suppress TCP-level duplicate ACKs and instead retransmit lost packets themselves.
- split-connection protocols are the most exotic: the base station acts as the endpoint for the TCP session, and it establishes a separate session with the wireless host. This second session may or may not use TCP; in either case, the base station is in a better position to manage the wireless session, because it knows more about the medium, and knows congestion cannot occur; hence, it can schedule retransmissions more effectively. The two transport-layer sessions act to insulate the TCP sender from any losses suffered on the wireless link.
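To make the ELN idea from the first group concrete, here is a minimal sketch of how a sender might react to a detected loss with and without an explicit loss notification. It is an illustration under simplifying assumptions, not the implementation evaluated in the paper: the `Sender` class, its `on_loss` method, the initial window values, and the Reno-style halving rule are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Sender:
    """Hypothetical, highly simplified TCP sender state (units: segments)."""
    cwnd: float = 16.0      # congestion window
    ssthresh: float = 64.0  # slow-start threshold

    def on_loss(self, eln: bool) -> str:
        """React to a loss detected via duplicate ACKs.

        eln=True means the ACKs carried an explicit loss notification,
        i.e. the loss was flagged as not caused by congestion.
        """
        if eln:
            # Non-congestion loss: retransmit, but leave the window alone.
            return "retransmit"
        # Assumed Reno-style reaction: treat the loss as congestion
        # and halve the window before retransmitting.
        self.ssthresh = max(self.cwnd / 2, 2.0)
        self.cwnd = self.ssthresh
        return "retransmit+backoff"


s = Sender()
s.on_loss(eln=True)   # wireless corruption: cwnd is unchanged
s.on_loss(eln=False)  # presumed congestion: cwnd is halved
```

The point of the sketch is only the branch: with an ELN the sender still retransmits promptly, but it skips the window reduction that hurts subsequent throughput.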
The authors conduct a rigorous series of experiments, comparing the performance of several variants from each class of protocols. They use an 8 MB data transfer as the workload, and measure both LAN and WAN environments (in the latter, the sender is 16 hops away from the base station). Notable conclusions:
- The authors observe a bad interaction between link-layer reliable delivery mechanisms and TCP. However, the problem does not lie in a simple mismatch between the TCP RTO and the link-layer RTO (the latter is much smaller than the TCP RTO of 500 ms). The problem is that the link-layer reliable delivery technique they used does not attempt to preserve delivery order; out-of-order arrivals trigger duplicate ACKs from the destination TCP stack, which in turn invoke TCP-level retransmission and recovery at the sender. They observed that 90% of the lost packets were retransmitted by both the link layer and TCP. This problem was solved by modifying the link layer to understand TCP semantics and suppress TCP-level duplicate ACKs for lost packets that the link layer has already retransmitted.
- Basic TCP achieves very poor throughput, especially for WAN environments. ELNs and SACKs both result in significant improvements; the authors hypothesize that a protocol that combines ELNs and SACKs would be the most effective.
- They tested two variants of the “split connection” approach. When using TCP Reno for the wireless connection, they experienced poor performance: TCP Reno’s packet loss recovery is so slow that eventually the base station’s buffers are exhausted because packet loss limits goodput on the wireless link. They saw much better performance when they used SACKs for the wireless link, but even in this configuration, the split-connection approach did not significantly outperform the best reliable-delivery link-layer approach.
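The duplicate-ACK suppression described in the first conclusion can be sketched as a small agent at the base station. This is a rough illustration under stated assumptions, not the authors' code: the `LinkLayerAgent` class, its method names, and the use of whole segment numbers in place of byte sequence numbers are all hypothetical, and timers and repeated retransmissions are left out.

```python
class LinkLayerAgent:
    """Hypothetical TCP-aware link-layer agent at the base station.

    Segments are numbered 0, 1, 2, ...; an ACK carries the number of the
    next segment the wireless host expects (cumulative ACK).
    """

    def __init__(self):
        self.cache = {}     # seq -> payload, forwarded but not yet ACKed
        self.last_ack = -1  # highest cumulative ACK seen so far

    def on_data_from_sender(self, seq, payload):
        # Keep a copy so a loss on the wireless hop can be repaired locally.
        self.cache[seq] = payload
        return ("forward_to_wireless", seq)

    def on_ack_from_wireless(self, ack):
        if ack > self.last_ack:
            # New ACK: discard acknowledged segments and pass the ACK on.
            for seq in [s for s in self.cache if s < ack]:
                del self.cache[seq]
            self.last_ack = ack
            return ("forward_ack", ack)
        if ack in self.cache:
            # Duplicate ACK for a segment we hold: retransmit it locally
            # and suppress the ACK so the TCP sender never sees it.
            return ("local_retransmit", ack)
        # Duplicate ACK for something we cannot repair: let TCP handle it.
        return ("forward_ack", ack)


agent = LinkLayerAgent()
for seq in range(3):
    agent.on_data_from_sender(seq, b"data")
agent.on_ack_from_wireless(1)  # new ACK, forwarded to the sender
agent.on_ack_from_wireless(1)  # duplicate: repaired locally, suppressed
```

Because the duplicate ACK never reaches the TCP sender, the sender neither retransmits the segment a second time nor shrinks its congestion window.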
Using TCP Reno for the wireless half of the “SPLIT” configuration seems pretty silly: TCP Reno is close to the worst-performing protocol for wireless links, and the whole point of the split-connection approach is that the wireless link can use a more appropriate transport than TCP.