Internet Congestion Control M. Mathis Research Group Pittsburgh Supercomputing Center Internet-Draft March 4, 2009 Intended status: Informational Expires: September 5, 2009 Relentless Congestion Control draft-mathis-iccrg-relentless-tcp-00 Status of this Memo This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on September 5, 2009. Copyright Notice Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Abstract Relentless congestion control is a simple modification that can be applied to almost any AIMD style congestion control: instead of Mathis Expires September 5, 2009 [Page 1] Internet-Draft Relentless Congestion Control March 2009 applying a multiplicative reduction to cwnd after a loss, cwnd is reduced by the number of lost segments. It can be modeled as a strict implementation of van Jacobson's Packet Conservation Principle. During recovery, new segments are injected into the network in exact accordance with the segments that are reported to have been delivered to the receiver by the returning ACKs. This algorithm offers a valuable new congestion control property: the TCP portion of the control loop has exactly unity gain, which should make it easier to implement simple controllers in network devices to accurately control queue sizes across a huge range of scales. Relentless Congestion Control conforms to neither the details nor the philosophy of current congestion control standards. These standards are based on the idea that the Internet can attain sufficient fairness by having relatively simple network devices send uniform congestion signals to all flows, and mandating that all protocols have equivalent responses to these congestion signals. To function appropriately in a shared environment, Relentless Congestion Control requires that the network allocates capacity through some technique such as Fair Queuing, Approximate Fair Dropping, etc. The salient features of these algorithms are that they segregate the traffic into distinct flows, and send different congestion signals to each flow. This alternative congestion control paradigm is described in a separate document, also under consideration by the ICCRG. The goal of the document is to illustrate some new protocol features and properties might be possible if we relax the "TCP-friendly" mandate. A secondary goal of Relentless TCP is to make a distinction between the bottlenecks that belong to protocol itself, vs standard congestion control and the "TCP-friendly" paradigm. Mathis Expires September 5, 2009 [Page 2] Internet-Draft Relentless Congestion Control March 2009 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Implementation . . . . . . . . . . . . . . . . . . . . . . . . 5 3. Queue Controllers . . . . . . . . . . . . . . . . . . . . . . 6 4. Properties . . . . . . . . . . . . . . . . . . . . . . . . . . 6 4.1. Relentless is not at all AIMD-friendly . . . . . . . . . . 7 4.2. Performance Model . . . . . . . . . . . . . . . . . . . . 7 4.3. Long Fat Networks (LFN) . . . . . . . . . . . . . . . . . 7 5. Usage Example: remote video upload . . . . . . . . . . . . . . 8 6. Security Considerations . . . . . . . . . . . . . . . . . . . 9 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 9 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . . 10 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 10 Mathis Expires September 5, 2009 [Page 3] Internet-Draft Relentless Congestion Control March 2009 1. Introduction Relentless congestion control is a simple modification that can be applied to almost any AIMD style congestion control: instead of applying a multiplicative reduction to cwnd after a loss, cwnd is reduced by the number of lost segments. The resulting cwnd is normally the same as the actual window of data that was successfully delivered during the lossy RTT. If the application is keeping up with the network and other traffic does not change, this is the correct cwnd to exactly fill the network on subsequent RTTs. Relentless Congestion Control conforms to neither the details nor the philosophy of current congestion control standards [RFC2914]. These standards are based on the idea that the Internet can attain sufficient fairness by having relatively simple network devices send uniform congestion signals to all flows, and mandating that all protocols have equivalent responses to these congestion signals. To function appropriately in a shared environment, Relentless Congestion Control requires that the network allocates capacity through some technique such as Fair Queuing [FQ], Approximate Fair Dropping [AFD], etc. The salient features of these algorithms are that they segregate the traffic into distinct flows, and send different congestion signals (segment losses, ECN marks or Queuing delay) to each flow such that the network, and not the end-system, controls capacity allocation. This alternative congestion control paradigm is described in a separate document [unfriendly], also under consideration by the ICCRG. Relentless Congestion Control offers a valuable new property: the TCP portion of the control loop has exactly unity gain. This property is ideal from the standpoint of the queue controllers embedded within network devices -- if some flow is using 1% too much bandwidth, then dropping 1% of its segments will bring it back to the proper data rate one RTT later. Likewise if a flow is occupying 10 segments too much queue space, then dropping 10 segments will bring it back to the proper queue occupancy on the next RTT. This unity gain property should make it easier to implement simple queue controllers in network devices that accurately control queues across a huge range of scales. Note the duality here, Relentless Congestion Control both requires that the network manages traffic while at the same time, it makes it easier to do so. We do not claim that Relentless Congestion Control is ideal, or without its own set of problems. We are not even planning to standardize it at this time. The goal of the document is to Mathis Expires September 5, 2009 [Page 4] Internet-Draft Relentless Congestion Control March 2009 illustrate what new protocol features and properties might be possible if we relax the "TCP-friendly" mandate. In some data intensive communities, TCP has the reputation of being too slow to be useful. A secondary goal of Relentless TCP is to make a distinction between the bottlenecks that belong to protocol itself, vs standard congestion control and the "TCP-friendly" paradigm. Please send commentary on this draft to the ICCRG mailing list: iccrg@cs.ucl.ac.uk. 2. Implementation The simplest mental model of Relentless Congestion Control is a strict implementation of van Jacobson's Packet Conservation Principle [Van88]. During recovery, new segments are injected into the network in exact accordance with the segments that are reported to have been delivered to the receiver by the returning ACKs. As long as the network does not lose all segments or ACKs, and the SACK scoreboard [RFC2018] chooses appropriate data to retransmit, this will clearly converge in one round trip time to exactly the window successfully delivered during the lossy round trip. Since this quantity of data was successfully delivered during a lossy round trip it is both the largest justifiable estimate as well as the most accurate estimate of the maximum capacity of the network, and a good starting point for subsequent RTTs. Algorithms that open the window, Slow-Start and Congestion Avoidance, are only permitted when the receiver is reporting contiguous data, which implies that there has subsequently been a full RTT without any additional losses. Our implementation is not quite as elegant, and consequently is not as accurate as the mental model. It is based on an existing TCP implementation that supports plugable congestion control modules. Unfortunately, some of the cwnd adjustments are wired into the loss recovery code and can not be suppressed directly. They can only be offset later in the plugable Relentless Congestion Control module. For details and source code, see the web site [DOWNLOAD]. As a network safety and possible deployment strategy, our implementation requires that the traffic be marked with the "Lower Effort" (LE) DSCP [RFC3662]. In portions of the Internet where LE is supported, this permits Relentless TCP to co-exist with standard AIMD TCP using default Best Effort (BE) service, without significantly perturbing the latter. There may be concerns about the consequences of publishing our code. However, any competent kernel programmer can trivially reconstruct Mathis Expires September 5, 2009 [Page 5] Internet-Draft Relentless Congestion Control March 2009 our work from this document. The Relentless algorithm is substantially simpler than the carefully tweaked algorithms that it replaces. The majority of the complexity of our implementation comes from preserving compatibility with other plugable congestion control algorithms. 3. Queue Controllers There has been a lot research into how to optimally manage queues, starting with Sally Floyd's paper on Random Early Detection [RED]. None of the proposed solutions is entirely satisfactory. The difficulty is that to optimally control traffic from AIMD congestion control requires that the queue controller estimate some parameters of the traffic, such as effective RTT or cwnd. These parameters are needed to estimate the drop rate required to attain a specific data rate. Furthermore, in high speed devices the calculations have to run at line rate in silicon. So far all known solutions are, at best, engineering compromises, and are suboptimal in environments that are significantly different than their design point. We claim that simplifying TCP's response to congestion will make it much easier to implement simple lightweight queue controllers that can accurately manage queue lengths across a huge range of network scales. For example we define a simple "Baseline Controller" to manage the length of a queue. It would monitor the minimum queue length, and periodically randomly discard or ECN mark as many packets as the minimum queue length was above the set point during the previous interval. This keeps bringing the minimum queue length down to the set point. The periodic queue management interval can be any convenient interval somewhat longer than one RTT. (e.g. 1 to 10 times the RTT). Although the accuracy of this controller improves when it's polling interval is better matched to the actual RTT, it clearly is not very sensitive to mismatch. When the polling interval is 10 times the RTT, its average error would only be 5 packets, as long as the flow was Relentless TCP. 4. Properties Note that Relentless TCP has not been exhaustively tested yet. This section reflects early testing and analysis of the algorithm. Our results will improve. Mathis Expires September 5, 2009 [Page 6] Internet-Draft Relentless Congestion Control March 2009 4.1. Relentless is not at all AIMD-friendly Not too surprisingly, Relentless TCP does not share the network very well with AIMD TCP if both see the same average loss rate because there is no mechanism to isolate the flows. 4.2. Performance Model The traditional 1/sqrt(p) TCP performance model [MSMO97] will not apply. A simple analysis suggests that in steady state the Relentless TCP data rate will be something like MSS/(3*RTT*p). In an absolutely quiet network with drop tail queues you expect there to be one loss per three round trips: when the bottleneck queue is full and congestion avoidance increments the window, it will take one RTT before the sender observes the subsequent loss, one RTT to repair the loss, and one RTT of congestion avoidance to reach the next cwnd increment. In real networks the losses are observed to be much more bursty: clusters of losses followed by multiple loss-less RTTs. Our implementation is further hampered by the need to offset cwnd adjustments that occur in the mainline TCP code as a side effect of processing partial ACKs and and other events. We overcome these adjustments in the plugable congestion control module by setting ssthresh to the old cwnd minus losses, thus there is often an interval of slowstart following recovery. We strive to understand and further eliminate these situations where the mainline code differs from the ideal packet conservation principle. 4.3. Long Fat Networks (LFN) In our moderately large window tests (a Gigabit network with a 70 ms RTT) we were able to routinely attain better than 500 Mb/s even though the loss rate was in excess of 0.07%. We know from the performance model that attaining this rate with standard AIMD congestion control requires a loss rate that is less than 0.00003%. Clearly Relentless TCP can tolerate many orders of magnitude higher loss rates than other TCPs. However the packet traces reveal several anomalies not directly related to Relentless Congestion Control, including spurious TCP retransmissions, intermittent massive packet reordering, and various unexpected limitations on data transmissions, when there seems to be plenty of available window. This raises an interesting side issue: Relentless TCP may exercise the SACK and recovery code several order of magnitude harder than other congestion control algorithms. We may find that we need to test this code much more thoroughly than it has been needed in the past. Mathis Expires September 5, 2009 [Page 7] Internet-Draft Relentless Congestion Control March 2009 5. Usage Example: remote video upload To help understand how Relentless TCP makes traffic management easier, consider the following thought experiment: You are trying to upload digital video from a remote site through a high delay satellite link, shared by some small amount of other traffic. With either Fair Queuing or Lower Effort service in the ground station it can maintain separate queues for the TCP video upload and the smaller flows. As long as the satellite link has fewer losses than one per several RTTs, Relentless TCP can maintain a standing queue of video data at the ground station. As long as the ground station has queued video data, it can optimally allocate the link to the small flows first (according to an explicit policy) and completely fill the rest of the link with video. The small flows see minimal delays while the link is 100% utilized. To make this example more concrete, imagine that you are trying to fill 10 Mb/s satellite link with a 1500 Byte MTU and a 600 ms RTT. It would require about 500 packets to fill this path. With Relentless TCP and a Baseline queue controller using a 1 second sample interval, you would expect on average to see about one drop every 2 sample intervals (roughly once per 3 RTTs, or about every 1500 packets). To mask the noise caused by the small flows, the baseline controller could either have an extra long sample interval or slightly higher setpoint. Failing this only compromises the extent to which the system can compensate for the jitter introduced by the small flows, but not its ability to fill most of the link using a very straightforward configuration. Contrast this to the situation where TCP is using standard AIMD congestion control: the queue at the bottleneck needs to hold 500 packets, and TCP cwnd needs to sawtooth between 500 and 1000 packets. There needs to be one loss every 1000 RTTs (twice 500 due to delayed ACK), or about every 700,000 packets -- 500 times lower loss rate! To do this optimally, the queue controller has to estimate when the queue is holding slightly more than half of TCP's total window, so a single lost packet will not cause idle time, due to TCP having too small of window. If the queue controller gets it wrong, or if there are too many losses (say due to too high of satellite BER), then a single standard TCP flow can't fill the link. A delay sensing TCP might do substantially better than Reno TCP, since it might be able to sense the queuing delay at slightly above 500 packets and use it to control TCP's window without causing losses or a large queue, but it still requires a relatively low loss rate. (But note that a delay sensing Relentless TCP probably could do even better). Since Relentless TCP is implemented as an easily installed module and some form of Fair Queuing or Less Effort service is supported by many Mathis Expires September 5, 2009 [Page 8] Internet-Draft Relentless Congestion Control March 2009 routers, this video upload example can be deployed today, on an as needed basis. The next iteration of this note will document a demonstration setup. 6. Security Considerations The Relentless Congestion Control algorithm does not alter the basic security properties of the protocols in which it might be implemented, such as TCP and SCTP. However, it does provide an opportunity for a greedy user to preemptively use more than their TCP-friendly share of the network. We argue in a separate document [unfriendly] that the TCP-friendly capacity allocation paradigm needs to be replaced by one in which the network is explicitly responsible for capacity allocation. This paradigm shift would fully address this security concern. There is some concern that publishing our implementation [DOWNLOAD] will have undesirable consequences, however any competent kernel programmer can reconstruct our results from this note. 7. References [AFD] Pan, R., Breslau, L., Prabhakar, B., and S. Shenker, "Approximate fairness through differential dropping", SIGCOMM CCR 33(2), April 2003. [DOWNLOAD] Mathis, M., "Relentless Congestion Control", http://staff.psc.edu/mathis/relentless/, March 2009. [FQ] Demers, A., Keshav, S., and S. Shenker, "Analysis and simulation of a fair queuing algorithm", SIGCOMM 89, August 1989. [MSMO97] Mathis, M., Sempke, J., Mahdavi, J., and T. Ott, "The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm", SIGCOMM CCR 18(4), July 1997. [RED] Floyd, S. and v. Jacobson, "Random Early Detection (RED) gateways for Congestion Avoidance", IEEE/ACM Transactions on Networking 1(4), August 1993. [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, October 1996. Mathis Expires September 5, 2009 [Page 9] Internet-Draft Relentless Congestion Control March 2009 [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, RFC 2914, September 2000. [RFC3662] Bless, R., Nichols, K., and K. Wehrle, "A Lower Effort Per-Domain Behavior (PDB) for Differentiated Services", RFC 3662, December 2003. [Van88] Jacobson, V., "Congestion Avoidance and Control", SIGCOMM 88, August 1988. [unfriendly] Mathis , M., "Rethinking TCP Friendly", draft-mathis-iccrg-unfriendly-00.txt (work in progress), March 2009. Appendix A. Acknowledgments This work is supported by a generous gift from Cisco System, Inc. Fred Baker, Nandita Dukkipati and others at Cisco have provided valuable input and inspiration on how to best frame the problem. John Heffner contributed sanity to early version of my ideas. Author's Address Matthew Mathis Pittsburgh Supercomputing Center 300 S. Craig St. Pittsburgh, PA 15213 Email: mathis@psc.edu Mathis Expires September 5, 2009 [Page 10]