“Understanding TCP Incast Throughput Collapse”

“Understanding TCP Incast Throughput Collapse in Datacenter Networks” is another paper that discusses the “TCP incast” phenomenon. It differs from the earlier CMU study on TCP incast in both its experimental configuration and its results:

  • In the CMU paper, as the number of senders increases, the per-sender fragment size decreases. This simulates a fixed-size data block being striped over a variable number of servers. In the Berkeley paper, the fragment size is fixed, which essentially means that the simulated block size varies with the number of servers. It is debatable which of these scenarios is more realistic.
  • The authors found that disabling delayed ACKs was actually harmful to performance, for both the fixed-fragment and fixed-block size workloads. The authors argue that this is because it “overdrives” the TCP congestion window and causes unnecessary congestion.
  • Using a 200 usec RTO, as suggested by the CMU paper, was found to lead to poor performance for both workloads. The authors’ explanation for this is similar to their explanation of the delayed ACK result: the RTT (as estimated by TCP) for their network was approximately 2 msec. Therefore, using a 200 usec RTO leads to spurious retransmissions, which have much the same effect as the retransmits due to congestion when delayed ACKs are disabled. The difference between these results and those of the CMU paper appears to be largely due to the difference in RTT: CMU’s baseline RTT was only 100 usec.
  • The authors found more complex behavior for the fixed-fragment workload than for the variable-fragment workload used by CMU: in the former, as the number of servers increases, goodput is initially high, then catastrophically low; it then rises to a peak below the initial goodput peak, and then gradually falls off.
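
To make the difference between the two workloads concrete, here is a minimal sketch of how the sizes scale with the number of senders. The function names and the byte counts in the demo are hypothetical, chosen only for illustration; neither paper's actual parameters are reproduced here.

```python
def fixed_block_fragment_size(block_bytes: int, num_senders: int) -> int:
    """CMU-style workload: a fixed-size block is striped over the senders,
    so each sender's fragment shrinks as senders are added."""
    return block_bytes // num_senders


def fixed_fragment_block_size(fragment_bytes: int, num_senders: int) -> int:
    """Berkeley-style workload: each sender's fragment is fixed,
    so the total block grows as senders are added."""
    return fragment_bytes * num_senders


if __name__ == "__main__":
    # Illustrative sizes only (assumptions, not taken from either paper).
    for n in (1, 4, 16):
        per_sender = fixed_block_fragment_size(1_000_000, n)
        total = fixed_fragment_block_size(256_000, n)
        print(f"{n:2d} senders: fixed block -> {per_sender} B per sender, "
              f"fixed fragment -> {total} B total")
```

With a fixed block, adding senders means each one contributes less data per request; with a fixed fragment, adding senders means the receiver asks for more data in total.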

Comments

The critical graphs in this paper (Figures 6 and 11) are very difficult to read.

I was confused as to why the authors even tried to use an RTO of 200 usec on a network with a 2 msec RTT. The change proposed by the CMU paper is to reduce the lower bound on the RTO to 200 usec; they don’t suggest a fixed RTO of 200 usec. That is, the Jacobson RTO estimation method should still be used: RTO is the maximum of 200 usec and the smoothed RTT estimate plus 4 times the linear deviation. Hence, using an RTO of 200 usec on a network with a TCP-estimated RTT of 2 msec seems like an inaccurate representation of the CMU proposal.
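
The lower-bound reading can be sketched as follows. This is a simplified estimator in the style of the standard smoothed-RTT calculation (gains of 1/8 and 1/4, as in RFC 6298), not code from either paper; the class name `RtoEstimator`, the `min_rto_us` parameter, and the choice to ignore clock granularity are my own assumptions.

```python
class RtoEstimator:
    """Simplified Jacobson-style RTO estimator with a configurable floor.

    All times are in microseconds. Clock granularity is ignored
    (an assumption made to keep the sketch short).
    """

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT deviation

    def __init__(self, min_rto_us: float):
        self.min_rto_us = min_rto_us
        self.srtt = None
        self.rttvar = None

    def update(self, sample_us: float) -> float:
        """Fold in one RTT sample and return the resulting RTO."""
        if self.srtt is None:
            # First measurement initializes both estimates.
            self.srtt = sample_us
            self.rttvar = sample_us / 2
        else:
            # Deviation is updated using the old smoothed RTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - sample_us))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample_us
        return self.rto()

    def rto(self) -> float:
        # The floor only matters when the estimate falls below it.
        return max(self.min_rto_us, self.srtt + 4 * self.rttvar)


if __name__ == "__main__":
    for rtt_us in (100.0, 2000.0):
        est = RtoEstimator(min_rto_us=200.0)
        for _ in range(50):
            rto = est.update(rtt_us)
        print(f"steady RTT {rtt_us} usec -> RTO {rto:.1f} usec")
```

Under these assumptions, a 200 usec floor only takes effect on a network like CMU’s, where the measured RTT is around 100 usec. With a steady 2 msec RTT the estimate stays at or above 2 msec, so the floor never applies and the RTO would not fire before an ACK could return.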

The authors’ argument that disabling delayed ACKs results in overdriving the congestion window is convincing, but they didn’t give an intuition for why this behavior occurs. Is the connection between a lack of delayed ACKs and overdriving the congestion window inherent to TCP, or specific to their experimental configuration?


4 responses to “Understanding TCP Incast Throughput Collapse”

  1. Yanpei Chen

    we used 200 usec as a lower bound. if u missed that, we didn’t do a good enough job of making it clear.

    the intuition re delayed ACKs is the same as the long fat pipe intuitions. we’ll talk about that in class. again, if u missed that in the paper, we didn’t do a good enough job of making it clear.

  2. Neil Conway

    Yanpei, thanks for the feedback. My confusion about the lower bound on RTO arose partly because of the labels on Figure 6 (e.g. “Low res timer, 1 ms RTO” rather than “1 ms min RTO”), as well as some of the text in Section 5.2, which uses phrases like:

    For vanilla TCP with an RTO timer value of 200ms, disabling delay ACKs lead to a slight improvement.

  3. Yanpei Chen

    Interesting … we dropped the “min” going from Figure 5 to Figure 6. Yea a large chunk of Section 5 was written in a great hurry to meet the deadline. We had a ton of last minute, pretty important results that basically said Incast is not solved despite what’s coming in SIGCOMM 2009.

  4. Nick L

    I was also confused and thought they had set RTO to be fixed at 200µs. As it stands, it’s quite interesting that basically the same approach as the CMU paper gives very different results.
