Numbers Everyone Should Know

When you’re designing a performance-sensitive computer system, it is important to have an intuition for the relative costs of different operations. How much does a network I/O cost, compared to a disk I/O, a load from DRAM, or an L2 cache hit? How much computation does it make sense to trade for a reduction in I/O? What is the relative cost of random vs. sequential I/O? For a given workload, what is the bottleneck resource?

When designing a system, you rarely have enough time to completely build two alternative designs to compare their performance. This makes two skills useful:

  1. Back-of-the-envelope analysis. This means developing an intuition for the performance of alternative designs, so that you can reject some designs out of hand and choose which alternatives to consider more carefully.
  2. Microbenchmarking. If you can identify the bottleneck operation for a given resource, then you can construct a micro-benchmark that compares the performance of different implementations of that operation. This works in tandem with your intuition: the more microbenchmarking you do, the better your intuition for system performance becomes.
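As a concrete (and heavily simplified) sketch, a microbenchmark harness can be as small as the snippet below. The operation being compared here — summing a list with an explicit loop vs. the built-in sum() — is just a stand-in for whatever your bottleneck operation actually is:

```python
import time

def bench(fn, *args, repeats=3):
    """Run fn several times and return the best wall-clock time in seconds.

    Taking the minimum over several runs filters out scheduler noise and
    warm-up effects, which is usually what you want for a microbenchmark.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

N = 1_000_000
data = list(range(N))

def sum_sequential(xs):
    # Implementation A: explicit Python loop.
    total = 0
    for x in xs:
        total += x
    return total

def sum_builtin(xs):
    # Implementation B: the C-implemented built-in.
    return sum(xs)

t_loop = bench(sum_sequential, data)
t_builtin = bench(sum_builtin, data)
print(f"explicit loop: {t_loop:.4f}s, built-in sum: {t_builtin:.4f}s")
```

The point is not the absolute numbers (which depend on your machine) but the ratio between the two implementations, and the habit of measuring rather than guessing.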

Jeff Dean makes similar points in his LADIS 2009 keynote (which I unfortunately wasn’t able to attend). In particular, he gives a useful table of “Numbers Everyone Should Know” — that is, the cost of some fundamental operations:

Operation                             Time (nsec)
-------------------------------------------------
L1 cache reference                            0.5
Branch mispredict                               5
L2 cache reference                              7
Mutex lock/unlock                              25
Main memory reference                         100
Compress 1 KB with Zippy                    3,000
Send 2 KB over 1 Gbps network              20,000
Read 1 MB sequentially from memory        250,000
Round trip within same datacenter         500,000
Disk seek                              10,000,000
Read 1 MB sequentially from disk       20,000,000
Send packet CA -> Netherlands -> CA   150,000,000
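These numbers make quick back-of-the-envelope estimates almost mechanical. As a sketch (the constants are simply Dean's figures from the table above), consider reading 1 GB sequentially from disk vs. memory, and the penalty for random disk I/O:

```python
# Dean's figures, all in nanoseconds.
DISK_SEEK_NS = 10_000_000
DISK_READ_1MB_NS = 20_000_000
MEM_READ_1MB_NS = 250_000

# Reading 1 GB (1,000 MB) sequentially:
disk_seq_ms = 1_000 * DISK_READ_1MB_NS / 1e6   # 20,000 ms = 20 seconds
mem_seq_ms = 1_000 * MEM_READ_1MB_NS / 1e6     # 250 ms

# Random disk I/O: 1,000 scattered small reads each pay a full seek;
# the data transfer itself is negligible by comparison.
random_reads_ms = 1_000 * DISK_SEEK_NS / 1e6   # 10,000 ms = 10 seconds

print(f"1 GB sequential from disk:   {disk_seq_ms:,.0f} ms")
print(f"1 GB sequential from memory: {mem_seq_ms:,.0f} ms")
print(f"1,000 random disk reads:     {random_reads_ms:,.0f} ms")
```

Two conclusions fall out immediately: memory is roughly 80x faster than disk for sequential reads, and 1,000 random disk reads cost about as much as sequentially scanning 500 MB — which is why batching and sorting I/O requests pays off so handsomely.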

Some useful figures that aren’t in Dean’s data can be found in a 2005 article comparing NetBSD 2.0 and FreeBSD 5.3. Approximating those figures, we get:

Operation                             Time (nsec)
-------------------------------------------------
System call overhead                          400
Context switch between processes            3,000
fork() (statically-linked binary)          70,000
fork() (dynamically-linked binary)        160,000
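You can get a rough feel for the first row yourself with a tiny harness like the one below. This is only a sketch: the Python interpreter adds substantial per-call overhead (a C microbenchmark would be far more precise), and some libc versions cache getpid(), in which case no system call is made at all — so treat the result as an upper bound at best:

```python
import os
import time

def ns_per_call(fn, iterations=200_000):
    """Rough per-call cost in nanoseconds, including interpreter overhead."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        fn()
    return (time.perf_counter_ns() - start) / iterations

# os.getpid() is about the cheapest system call available, so its cost is
# dominated by the user/kernel transition (plus Python's own call overhead).
syscall_ns = ns_per_call(os.getpid)

# A pure-Python no-op call, to get a sense of how much of the above is
# interpreter overhead rather than the system call itself.
noop_ns = ns_per_call(lambda: None)

print(f"os.getpid(): ~{syscall_ns:.0f} ns/call, no-op: ~{noop_ns:.0f} ns/call")
```

Subtracting the no-op figure from the getpid() figure gives a crude estimate of the kernel-transition cost on your machine.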

Update: A more recent blog post examines the question of system call and context switch overhead in more detail. Its figures suggest that the best-case system call overhead is now only ~60 nsec (for Linux on Nehalem), and that a context switch costs about 30 microseconds (30,000 nsec); once you account for the cost of flushing CPU caches, that is probably pretty reasonable.

In comparison, John Ousterhout’s RAMCloud project aims to provide end-to-end roundtrips for a key-value store in the same datacenter within “5-10 microseconds,” which would represent about a 100x improvement over the 500 microsecond latency suggested above.

The keynote slides are worth a glance: Dean talks about the design of “Spanner”, a next-generation version of BigTable he is building at Google. See also James Hamilton’s notes on the keynote, and Greg Linden’s commentary.

