“PortLand” begins with essentially the same motivation as SEATTLE: Ethernet is easy to manage but hard to scale to the size required by modern data centers, and IP is scalable but hard to manage. Given the choice between making Ethernet more scalable or making IP more manageable, both papers choose to tackle the former problem.
To make Ethernet more scalable, PortLand takes a classic approach: it simplifies the problem by making assumptions. The authors observe that modern data center networks are typically organized as a “fat tree” or multi-rooted hierarchy, and they exploit this structure to make PortLand simpler and more efficient than a Layer 2 network that must handle an arbitrary topology. In PortLand, there are core, aggregation, and edge switches; the edge switches are directly connected to end hosts. Each switch runs a “Location Discovery Protocol” (LDP) to automatically determine its position in this hierarchy and to communicate that information to the other switches. Because LDP assumes that the network is a multi-rooted tree, it is quite a simple protocol.
Rather than using MAC addresses, PortLand identifies end hosts inside the fabric with Pseudo MAC addresses (PMACs). While a MAC merely identifies a host, a PMAC is both an identifier and a locator: that is, it encodes the location of the end host in the multi-rooted tree. This is the key to efficient routing: rather than maintaining a forwarding table entry for every MAC address in the network, switches can forward packets based on the PMAC prefix, as in IP. This significantly reduces the size of switch forwarding tables. Switches translate PMACs back to MACs before delivering packets to end hosts or sending them outside the PortLand fabric.
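To make the locator idea concrete, here is a minimal Python sketch. It assumes the 48-bit PMAC layout given in the PortLand paper (pod.position.port.vmid, with 16, 8, 8, and 16 bits respectively), which this summary does not spell out. The function names and the simplified forwarding rule for an aggregation switch are my own illustration, not the authors' code.

```python
from typing import NamedTuple, Optional, Tuple


class PMAC(NamedTuple):
    pod: int       # 16 bits: pod containing the host's edge switch
    position: int  # 8 bits: position of that edge switch within the pod
    port: int      # 8 bits: edge-switch port the host is attached to
    vmid: int      # 16 bits: distinguishes VMs behind the same port


def encode_pmac(p: PMAC) -> int:
    """Pack the four fields into a single 48-bit integer."""
    return (p.pod << 32) | (p.position << 24) | (p.port << 16) | p.vmid


def decode_pmac(value: int) -> PMAC:
    """Unpack a 48-bit integer back into its four fields."""
    return PMAC(
        pod=(value >> 32) & 0xFFFF,
        position=(value >> 24) & 0xFF,
        port=(value >> 16) & 0xFF,
        vmid=value & 0xFFFF,
    )


def format_pmac(value: int) -> str:
    """Render the 48-bit value in the usual colon-separated MAC notation."""
    return ":".join(f"{(value >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


def aggregation_next_hop(my_pod: int, dst: int) -> Tuple[str, Optional[int]]:
    """Hypothetical forwarding rule for an aggregation switch.

    Only the PMAC prefix is inspected: traffic for the local pod goes down
    to the edge switch at the destination's position, everything else goes
    up to a core switch. No per-host table entry is needed.
    """
    d = decode_pmac(dst)
    if d.pod == my_pod:
        return ("down", d.position)
    return ("up", None)
```

With this layout, an aggregation switch needs state proportional to the number of edge switches in its pod rather than the number of hosts in the data center, which is where the forwarding-table savings come from.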
PortLand employs a centralized fabric manager to resolve ARP queries and to simplify multicast and fault tolerance. ARP is handled by having edge switches forward each ARP request to the fabric manager. If the fabric manager does not have the PMAC for the requested IP, it broadcasts the ARP request down the fat tree and caches the result. IP-to-PMAC mappings are eagerly sent to the fabric manager as PMACs are assigned, so broadcasts should be relatively rare. When a VM is migrated to a new machine, a gratuitous ARP carrying the new IP-to-PMAC mapping is sent to the fabric manager. The fabric manager also sends an invalidation message for the old IP-to-PMAC mapping to the VM’s old switch. The fabric manager is made highly available using asynchronous replication.
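The ARP path described above can be sketched roughly as follows. This is a toy model under my own assumptions: the class and method names are hypothetical, addresses are plain strings, the broadcast fallback is a caller-supplied function, and invalidations are recorded in a list instead of being sent to the old edge switch.

```python
from typing import Callable, Dict, List, Optional, Tuple


class FabricManager:
    """Toy model of the fabric manager's IP-to-PMAC soft state."""

    def __init__(self, broadcast: Callable[[str], Optional[str]]) -> None:
        self._ip_to_pmac: Dict[str, str] = {}
        # Fallback used on a miss; stands in for flooding the ARP request.
        self._broadcast = broadcast
        # (ip, stale_pmac) pairs that would be sent to the VM's old switch.
        self.invalidations: List[Tuple[str, str]] = []

    def register(self, ip: str, pmac: str) -> None:
        """Record a mapping, as when a PMAC is assigned or a VM migrates."""
        old = self._ip_to_pmac.get(ip)
        self._ip_to_pmac[ip] = pmac
        if old is not None and old != pmac:
            # The host moved: the old location must stop answering for it.
            self.invalidations.append((ip, old))

    def resolve(self, ip: str) -> Optional[str]:
        """Answer an ARP request forwarded by an edge switch."""
        pmac = self._ip_to_pmac.get(ip)
        if pmac is None:
            # Miss: fall back to broadcast and cache whatever comes back.
            pmac = self._broadcast(ip)
            if pmac is not None:
                self._ip_to_pmac[ip] = pmac
        return pmac
```

Because mappings are registered eagerly, `resolve` should normally hit the table, and the broadcast path is only exercised for hosts the fabric manager has not yet heard about.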
Fault tolerance is simplified by the fabric manager. Each switch sends a keepalive (LDP) message to its neighbors every 10 ms; if no keepalive is heard from a switch for 50 ms, that switch is assumed to have failed, and the fabric manager is contacted. The fabric manager updates its record of switch liveness and then informs any switches affected by the failure. These switches recompute their forwarding tables based on the new network topology.
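The detection half of this scheme is a simple timeout check. The sketch below uses the 10 ms and 50 ms figures quoted above; the class name, the explicit millisecond timestamps, and the idea of returning newly failed neighbors for the caller to report are my own simplifications rather than anything specified in the paper.

```python
from typing import Dict, Iterable, List, Set

KEEPALIVE_INTERVAL_MS = 10  # how often each switch sends an LDP keepalive
FAILURE_TIMEOUT_MS = 50     # silence after which a neighbor is presumed failed


class NeighborMonitor:
    """Tracks when each neighboring switch was last heard from."""

    def __init__(self, neighbors: Iterable[str], now_ms: int,
                 timeout_ms: int = FAILURE_TIMEOUT_MS) -> None:
        self._last_heard: Dict[str, int] = {n: now_ms for n in neighbors}
        self._failed: Set[str] = set()
        self._timeout_ms = timeout_ms

    def heard_from(self, neighbor: str, now_ms: int) -> None:
        """Record a keepalive; a previously failed neighbor is live again."""
        self._last_heard[neighbor] = now_ms
        self._failed.discard(neighbor)

    def check(self, now_ms: int) -> List[str]:
        """Return neighbors that have just crossed the failure timeout.

        Each failure is returned once, so the caller can notify the
        fabric manager without sending duplicate reports.
        """
        newly_failed = sorted(
            n for n, t in self._last_heard.items()
            if n not in self._failed and now_ms - t >= self._timeout_ms
        )
        self._failed.update(newly_failed)
        return newly_failed
```

A switch would call `check` periodically and forward any returned names to the fabric manager, which then decides which other switches need to update their forwarding tables.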
I thought the evaluation section was a little disappointing. The paper makes some disparaging comments about other approaches to solving this problem, such as SEATTLE and TRILL. However, the evaluation only examines the performance of PortLand, and doesn’t compare it to these alternatives. It is hard to say how important the concerns about SEATTLE and TRILL raised by the paper are, given the lack of empirical data.
The paper doesn’t detail its strategy for handling failures of the fabric manager. Since the FM doesn’t contain hard state, this is presumably not too complicated, but one wonders whether simultaneous failures of the FM and one or more switches would significantly increase convergence time. Similarly, FM failure and failover are not addressed in the evaluation.
The requirement for a separate control network simply for communicating with the fabric manager is also unfortunate: it introduces a significant administrative headache and seems hard to justify economically. If the fabric manager and the switches instead communicate over the data network, the paper doesn’t address how link or switch failures that affect connectivity to the fabric manager will be handled.