“Looking Up Data In P2P Systems” is a CACM article that provides a brief, high-level overview of research on DHT systems in the 2003 timeframe. Their focus is on Chord, Pastry, and similar systems, that implement a DHT via an overlay network, and are primarily intended for WAN environments.
The paper focuses on the “lookup problem“: searching for a data item in a distributed system without using centralized servers or a hierarchical server structure. The paper’s argument is that requiring centralized or privileged nodes inhibits scalability, because it load certain servers more than others. To avoid the need for centralized resources, P2P lookup systems typically rely on mapping lookup keys into a numeric ID space, typically via hashing. The lookup protocol then uses the local “neighbor” information cached at each node to successively navigate toward nodes that hold keys in the ID space that are “closer” to the lookup key, using a distance metric defined by each protocol. An important real-world optimization is to make the P2P system’s notion of “distance” correspond roughly to network distance, to reduce the number of physical network hops that must be traversed to perform a lookup. P2P lookup systems differ in the amount of state they keep about their neighbors, how the space of nodes is arranged (1-D ring, 2-D space, etc.), the distance metric used, whether the system attempts to protect against malicious nodes joining the group, and so forth.
P2P systems clearly have a role to play, but the hype around this topic has died down considerably since 2003. Modern DHTs are typically employed inside a handful of datacenters, and hence use different algorithms than the WAN-oriented protocols discussed by this paper (e.g. Dynamo is a well-known DHT for datacenter use designed by Amazon).
The problem with treating all the nodes in a P2P system as true “peers” is that it assumes that each node has approximately equal connectivity and resources. This is unlikely to be true, especially in a WAN environment. Hence the use of “SuperNodes” in Kazaa, and other practical systems: to achieve good performance, a P2P system should take advantage of nodes with better connectivity and resources. This amounts to constructing a hierarchical arrangement on-the-fly, but there’s nothing inherently “unscalable” about that. It is interesting that this line of P2P work explores the extreme “decentralized / symmetric” end of the spectrum, but I think most practical systems will incorporate some degree of asymmetry.
The Surprising Persistence of Peer-To-Peer discusses the author’s view that P2P research has remaining “surprisingly” relevant and active in recent years.