This paper describes the priorities that motivated the design of the TCP/IP protocol stack, and how those priorities were reflected in the actual protocol specifications. The top priority was resilience to faults: “communication must continue despite loss of networks or gateways.” The second and third priorities were support for different networking services (e.g. TCP vs. UDP) and for different types of underlying networks, respectively. In contrast, cost-effectiveness and resource accounting were considered by the designers, but were deliberately given lower priority. These priorities led to a design in which IP is the basic lingua franca, providing an unreliable, datagram-oriented delivery service. More sophisticated services, such as reliable delivery and stream-oriented connections, can be layered on top of IP, perhaps the canonical example of the end-to-end argument. Similarly, because IP requires only best-effort datagram delivery, it can be implemented on top of a wide variety of network technologies (Ethernet, wireless, satellite, etc.).
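To make the layering concrete, here is a minimal sketch showing both service models offered over the same IP substrate: UDP exposes IP’s best-effort datagram service almost directly, while TCP builds a reliable byte stream on top of it. The loopback host and port numbers are arbitrary values chosen for the demo, not anything from the paper.

```python
import socket
import threading
import time

HOST = "127.0.0.1"                      # loopback host, chosen for the demo
UDP_PORT, TCP_PORT = 9999, 9998         # arbitrary port numbers

def udp_server():
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind((HOST, UDP_PORT))
        data, addr = s.recvfrom(2048)   # one datagram; no connection setup
        s.sendto(data, addr)

def tcp_server():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((HOST, TCP_PORT))
        s.listen(1)
        conn, _ = s.accept()            # per-connection state lives here,
        with conn:                      # at the endpoint, not in the network core
            conn.sendall(conn.recv(2048))

for srv in (udp_server, tcp_server):
    threading.Thread(target=srv, daemon=True).start()
time.sleep(0.2)                         # give the demo servers a moment to start

# UDP: best-effort datagrams, essentially IP's own service model plus port numbers.
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as u:
    u.sendto(b"datagram", (HOST, UDP_PORT))
    print("UDP:", u.recvfrom(2048)[0])

# TCP: a reliable, ordered byte stream built on the same unreliable IP layer.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as t:
    t.connect((HOST, TCP_PORT))
    t.sendall(b"stream")
    print("TCP:", t.recv(2048))
```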
Using an unreliable datagram-oriented protocol like IP also simplifies fault tolerance, because it means that intermediate network hops do not need to store connection state. When a hop fails, another route to the destination can be chosen and the connection can be maintained, without needing to recover any state from the failed node. That is, all the state about a connection is maintained by the connection endpoints. The paper coins the term “fate-sharing” to describe this approach: if a connection endpoint fails, it is acceptable to lose the state associated with the connection itself. (Note that this design does not mean that the network core does not need to maintain any state at all — it just doesn’t need to track per-connection state.)
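A toy way to picture fate-sharing is to look at which node holds which state. The structures below are my own invention (not real TCP data structures): the per-connection record lives only at the endpoints, while a router holds only routing state, so an intermediate hop can fail or be bypassed without any connection losing its state.

```python
from dataclasses import dataclass, field

@dataclass
class EndpointConnection:          # endpoint-resident, per-connection state
    local: tuple                   # (address, port)
    remote: tuple
    next_seq: int = 0
    unacked: dict = field(default_factory=dict)   # data awaiting acknowledgment

@dataclass
class Router:                      # network-core state: no connection entries here
    routing_table: dict            # destination prefix -> next hop

# The connection record shares the fate of its endpoint; the router does not know it exists.
conn = EndpointConnection(local=("10.0.0.1", 4321), remote=("10.0.0.2", 80))
hop = Router(routing_table={"10.0.0.0/24": "if0"})
print(conn, hop)
```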
Maintaining connection state only at the endpoints does have some disadvantages. The paper discusses several:
- Intermediate network hops only see individual datagrams, not flows/connections. This makes it difficult for those nodes to do resource management and accounting: rather than making decisions about each connection as a whole, they must make resource management decisions about each packet in isolation.
- Pushing complexity to the network endpoints increases the difficulty of writing new implementations of the protocol stack, because it encourages “host-resident” algorithms. The end-to-end principle suggests that much of this complexity is difficult to avoid anyway, because both endpoints must be involved to implement important functionality.
- Host-resident algorithms might also damage the robustness of the network if an individual host misbehaves (intentionally or not).
The flip side is that a connection-oriented (virtual circuit) design would have some corresponding advantages (illustrated in the sketch after this list):
- Resource management and accounting are easier to implement, because intermediate hops can make decisions about whole connections rather than individual packets.
- Individual packets can be smaller: because each virtual circuit uses the same route, individual packets don’t need to be annotated with addressing/routing information.
- Network hops can be more efficient, because network routes are calculated on a per-connection basis, rather than routing each packet individually.
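To illustrate the contrast in the list above, here is a small sketch of the two forwarding models; the function and table names are hypothetical, not from the paper or any real router.

```python
# Datagram forwarding: every packet carries the full destination address and is
# routed independently; the router keeps no per-connection state.
def forward_datagram(packet, routing_table):
    return routing_table[packet["dst"]]            # per-packet route lookup

# Virtual-circuit forwarding: a setup phase installs per-connection state, after
# which packets carry only a short circuit id and follow the same route.
def setup_circuit(circuit_table, circuit_id, next_hop):
    circuit_table[circuit_id] = next_hop           # state a failed hop would lose

def forward_on_circuit(packet, circuit_table):
    return circuit_table[packet["circuit_id"]]     # no full address in the packet

routing_table = {"10.0.0.2": "if1"}
circuit_table = {}
print(forward_datagram({"dst": "10.0.0.2", "payload": b"x"}, routing_table))
setup_circuit(circuit_table, circuit_id=7, next_hop="if1")
print(forward_on_circuit({"circuit_id": 7, "payload": b"x"}, circuit_table))
```

The trade-off is visible in the state: the circuit table buys smaller packets and per-connection decisions, but it is exactly the state that is lost when the hop fails.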
I thought the paper’s speculation about replacing datagrams with “flows” for a next-generation network architecture was interesting:
> [a flow would] identify a sequence of packets traveling from the source to the destination, without assuming any particular type of service with that service … It would be necessary for the gateways to have flow state in order to remember the nature of the flows which are passing through them, but the state information would not be critical in maintaining the desired type of service associated with the flow.
I was confused about how this would differ from simply building routers that track per-flow state as well as they can (AFAIK this is done today), without changing the network protocol itself. It would be good to clarify this proposal in class.
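For concreteness, the kind of “soft” per-flow state I have in mind looks roughly like the sketch below (my own toy structure, not the paper’s proposal): the router remembers what it learns about each flow from the packets themselves, so losing that state degrades service rather than breaking connections.

```python
from collections import defaultdict

# Soft per-flow state: an observation table keyed by addressing that is already
# present in every datagram, so a router that crashes simply re-learns it.
flow_table = defaultdict(lambda: {"packets": 0, "bytes": 0})

def observe(packet):
    key = (packet["src"], packet["dst"], packet["proto"])
    entry = flow_table[key]
    entry["packets"] += 1
    entry["bytes"] += len(packet["payload"])
    return entry

observe({"src": "10.0.0.1", "dst": "10.0.0.2", "proto": "tcp", "payload": b"hello"})
print(dict(flow_table))
```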
The paper observes that by encouraging host-resident algorithms, a malicious or misconfigured host can harm the robustness of the network as a whole. This is a symptom of a larger problem: the Internet design generally assumes that hosts and their administrators can be trusted, an assumption that clearly no longer holds (witness the constant stream of TCP security vulnerabilities and DoS attacks). By pushing complexity to the communication endpoints and implementing a “dumb network”, the Internet design makes network policy and resource management decisions more difficult.