When you’re designing a performance-sensitive computer system, it is important to have an intuition for the relative costs of different operations. How much does a network I/O cost, compared to a disk I/O, a load from DRAM, or an L2 cache hit? How much computation does it make sense to trade for a reduction in I/O? What is the relative cost of random vs. sequential I/O? For a given workload, what is the bottleneck resource?
When designing a system, you rarely have enough time to completely build two alternative designs to compare their performance. This makes two skills useful:
- Back-of-the-envelope analysis. This essentially means developing an intuition for the performance of different alternate designs, so that you can reject possible designs out-of-hand, or choose which alternatives to consider more carefully.
- Microbenchmarking. If you can identify the bottleneck operation for a given resource, then you can construct a micro-benchmark that compares the performance of different implementations of that operation. This works in tandem with your intuition: the more microbenchmarking you do, the better your intuition for system performance becomes.
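To make the microbenchmarking idea concrete, here is a minimal sketch (my own toy illustration, not from the keynote) that compares two implementations of the same operation — building a string by repeated `+=` versus `str.join` — using Python's standard `timeit` module:

```python
import timeit

def concat_plus(n):
    """Build a string of n characters by repeated += concatenation."""
    s = ""
    for _ in range(n):
        s += "x"
    return s

def concat_join(n):
    """Build the same string with str.join over a generator."""
    return "".join("x" for _ in range(n))

if __name__ == "__main__":
    # Time 100 runs of each implementation on the same input size.
    for fn in (concat_plus, concat_join):
        t = timeit.timeit(lambda: fn(10_000), number=100)
        print(f"{fn.__name__}: {t:.4f} s for 100 runs")
```

The specific operation doesn't matter; what matters is the habit of isolating the bottleneck operation and timing competing implementations of it directly, rather than guessing.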
Jeff Dean makes similar points in his LADIS 2009 keynote (which I unfortunately wasn’t able to attend). In particular, he gives a useful table of “Numbers Everyone Should Know” — that is, the cost of some fundamental operations:
| Operation | Time (nsec) |
| --- | --- |
| L1 cache reference | 0.5 |
| Branch mispredict | 5 |
| L2 cache reference | 7 |
| Mutex lock/unlock | 25 |
| Main memory reference | 100 |
| Compress 1 KB with Zippy | 3,000 |
| Send 2 KB over 1 Gbps network | 20,000 |
| Read 1 MB sequentially from memory | 250,000 |
| Round trip within same datacenter | 500,000 |
| Disk seek | 10,000,000 |
| Read 1 MB sequentially from disk | 20,000,000 |
| Send packet CA -> Netherlands -> CA | 150,000,000 |
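These numbers make back-of-the-envelope analysis almost mechanical. As a sketch (my own illustration, using the rounded costs from the table above), here is the kind of arithmetic you can do in a few lines:

```python
# Approximate operation costs from Dean's table, in nanoseconds.
COST_NS = {
    "mem_read_1mb": 250_000,      # read 1 MB sequentially from memory
    "disk_seek": 10_000_000,      # one disk seek
    "disk_read_1mb": 20_000_000,  # read 1 MB sequentially from disk
}

def seconds(ns):
    """Convert nanoseconds to seconds."""
    return ns / 1e9

# Scanning 1 GB (1024 MB) sequentially from disk vs. from memory:
disk_scan = seconds(1024 * COST_NS["disk_read_1mb"])  # ~20.5 s
mem_scan = seconds(1024 * COST_NS["mem_read_1mb"])    # ~0.26 s

# Fetching 1024 records scattered across the disk, one seek each
# (ignoring transfer time, which is negligible for small records):
random_reads = seconds(1024 * COST_NS["disk_seek"])   # ~10.2 s

print(f"sequential disk scan of 1 GB:   {disk_scan:.1f} s")
print(f"sequential memory scan of 1 GB: {mem_scan:.2f} s")
print(f"1024 random disk seeks:         {random_reads:.1f} s")
```

The point is not the exact figures but the ratios: a random-access pattern over the disk can cost as much as scanning half the dataset sequentially, which is exactly the kind of conclusion that lets you reject a design before building it.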
Some useful figures that aren’t in Dean’s data can be found in this 2005 article comparing NetBSD 2.0 and FreeBSD 5.3. Approximating those figures, we get:
| Operation | Time (nsec) |
| --- | --- |
| System call overhead | 400 |
| Context switch between processes | 3,000 |
| fork() (statically-linked binary) | 70,000 |
| fork() (dynamically-linked binary) | 160,000 |
Update: This more recent blog post examines the question of system call and context switch overhead in more detail. Its figures suggest that the best-case system call overhead is now only ~60 nsec (for Linux on Nehalem), and that a context switch costs about 30 microseconds (30,000 nsec) — once you account for the cost of flushing CPU caches, that is probably pretty reasonable.
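If you want to reproduce this kind of figure yourself, a rough sketch is easy to write. The one below (my own, not from the post in question) times a cheap system call from Python; note that interpreter and function-call overhead inflate the result considerably compared to a C harness, so treat it as an upper bound rather than the bare syscall cost:

```python
import os
import time

def measure_syscall_ns(iters=200_000):
    """Estimate the per-call cost of a cheap system call (os.getppid).

    This measures the syscall plus Python's own call overhead, so it
    is an upper bound on the true kernel-entry cost.
    """
    start = time.perf_counter()
    for _ in range(iters):
        os.getppid()
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1e9  # nanoseconds per call

if __name__ == "__main__":
    print(f"~{measure_syscall_ns():.0f} ns per os.getppid() call")
```

Running variations of this — against a mutex acquisition, a pipe round trip, a small read() — is a cheap way to keep your own table of numbers current for the hardware you actually run on.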
In comparison, John Ousterhout’s RAMCloud project aims to provide end-to-end roundtrips for a key-value store in the same datacenter within “5-10 microseconds,” which would represent about a 100x improvement over the 500 microsecond latency suggested above.
The keynote slides are worth a glance: Dean talks about the design of “Spanner”, a next-generation version of BigTable he is building at Google. See also James Hamilton’s notes on the keynote, and Greg Linden’s commentary.