“Looking Up Data In P2P Systems”

Looking Up Data In P2P Systems” is a CACM article that provides a brief, high-level overview of research on DHT systems in the 2003 timeframe. Their focus is on Chord, Pastry, and similar systems, that implement a DHT via an overlay network, and are primarily intended for WAN environments.

The paper focuses on the “lookup problem“: searching for a data item in a distributed system without using centralized servers or a hierarchical server structure. The paper’s argument is that requiring centralized or privileged nodes inhibits scalability, because it load certain servers more than others. To avoid the need for centralized resources, P2P lookup systems typically rely on mapping lookup keys into a numeric ID space, typically via hashing. The lookup protocol then uses the local “neighbor” information cached at each node to successively navigate toward nodes that hold keys in the ID space that are “closer” to the lookup key, using a distance metric defined by each protocol. An important real-world optimization is to make the P2P system’s notion of “distance” correspond roughly to network distance, to reduce the number of physical network hops that must be traversed to perform a lookup. P2P lookup systems differ in the amount of state they keep about their neighbors, how the space of nodes is arranged (1-D ring, 2-D space, etc.), the distance metric used, whether the system attempts to protect against malicious nodes joining the group, and so forth.

Discussion

P2P systems clearly have a role to play, but the hype around this topic has died down considerably since 2003. Modern DHTs are typically employed inside a handful of datacenters, and hence use different algorithms than the WAN-oriented protocols discussed by this paper (e.g. Dynamo is a well-known DHT for datacenter use designed by Amazon).

The problem with treating all the nodes in a P2P system as true “peers” is that it assumes that each node has approximately equal connectivity and resources. This is unlikely to be true, especially in a WAN environment. Hence the use of “SuperNodes” in Kazaa, and other practical systems: to achieve good performance, a P2P system should take advantage of nodes with better connectivity and resources. This amounts to constructing a hierarchical arrangement on-the-fly, but there’s nothing inherently “unscalable” about that. It is interesting that this line of P2P work explores the extreme “decentralized / symmetric” end of the spectrum, but I think most practical systems will incorporate some degree of asymmetry.

Related Reading

The Surprising Persistence of Peer-To-Peer discusses the author’s view that P2P research has remaining “surprisingly” relevant and active in recent years.

Advertisements

2 Comments

Filed under Paper Summaries

2 responses to ““Looking Up Data In P2P Systems”

  1. hmmm … do you think Dynamo should replace Chord in future 168’s? The Chord paper is one of the most referenced papers in the CS field.

  2. neilconway

    Well, I think everyone with an interest in this stuff should read both the Chord and Dynamo papers. Chord is usually on the 262A reading list, so many students will need to read it anyway. I personally didn’t get much out of the CACM article described above, so perhaps that could be replaced with the Dynamo paper?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s