When you’re designing a performance-sensitive computer system, it is important to have an intuition for the relative costs of different operations. How much does a network I/O cost, compared to a disk I/O, a load from DRAM, or an L2 cache hit? How much computation does it make sense to trade for a reduction in I/O? What is the relative cost of random vs. sequential I/O? For a given workload, what is the bottleneck resource?
When designing a system, you rarely have enough time to completely build two alternative designs to compare their performance. This makes two skills useful:
- Back-of-the-envelope analysis. This essentially means developing an intuition for the performance of different alternate designs, so that you can reject possible designs out-of-hand, or choose which alternatives to consider more carefully.
- Microbenchmarking. If you can identify the bottleneck operation for a given resource, then you can construct a micro-benchmark that compares the performance of different implementations of that operation. This works in tandem with your intuition: the more microbenchmarking you do, the better your intuition for system performance becomes.
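To make the microbenchmarking idea concrete, here is a minimal sketch (my own toy illustration, not from the keynote) that compares two implementations of the same operation — building a string by repeated `+=` versus `str.join` — using Python's standard `timeit` module:

```python
import timeit

def concat_plus(n):
    """Build a string of n characters by repeated += concatenation."""
    s = ""
    for _ in range(n):
        s += "x"
    return s

def concat_join(n):
    """Build the same string with str.join over a generator."""
    return "".join("x" for _ in range(n))

if __name__ == "__main__":
    # Time 100 runs of each implementation on the same input size.
    for fn in (concat_plus, concat_join):
        t = timeit.timeit(lambda: fn(10_000), number=100)
        print(f"{fn.__name__}: {t:.4f} s for 100 runs")
```

The specific operation doesn't matter; what matters is the habit of isolating the bottleneck operation and timing competing implementations of it directly, rather than guessing.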
Jeff Dean makes similar points in his LADIS 2009 keynote (which I unfortunately wasn’t able to attend). In particular, he gives a useful table of “Numbers Everyone Should Know” — that is, the cost of some fundamental operations:
| Operation | Time (nsec) |
| --- | --- |
| L1 cache reference | 0.5 |
| Branch mispredict | 5 |
| L2 cache reference | 7 |
| Mutex lock/unlock | 25 |
| Main memory reference | 100 |
| Compress 1 KB with Zippy | 3,000 |
| Send 2 KB over 1 Gbps network | 20,000 |
| Read 1 MB sequentially from memory | 250,000 |
| Round trip within same datacenter | 500,000 |
| Disk seek | 10,000,000 |
| Read 1 MB sequentially from disk | 20,000,000 |
| Send packet CA -> Netherlands -> CA | 150,000,000 |
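These numbers make back-of-the-envelope analysis almost mechanical. As a sketch (my own illustration, using the rounded costs from the table above), here is the kind of arithmetic you can do in a few lines:

```python
# Approximate operation costs from Dean's table, in nanoseconds.
COST_NS = {
    "mem_read_1mb": 250_000,      # read 1 MB sequentially from memory
    "disk_seek": 10_000_000,      # one disk seek
    "disk_read_1mb": 20_000_000,  # read 1 MB sequentially from disk
}

def seconds(ns):
    """Convert nanoseconds to seconds."""
    return ns / 1e9

# Scanning 1 GB (1024 MB) sequentially from disk vs. from memory:
disk_scan = seconds(1024 * COST_NS["disk_read_1mb"])  # ~20.5 s
mem_scan = seconds(1024 * COST_NS["mem_read_1mb"])    # ~0.26 s

# Fetching 1024 records scattered across the disk, one seek each
# (ignoring transfer time, which is negligible for small records):
random_reads = seconds(1024 * COST_NS["disk_seek"])   # ~10.2 s

print(f"sequential disk scan of 1 GB:   {disk_scan:.1f} s")
print(f"sequential memory scan of 1 GB: {mem_scan:.2f} s")
print(f"1024 random disk seeks:         {random_reads:.1f} s")
```

The point is not the exact figures but the ratios: a random-access pattern over the disk can cost as much as scanning half the dataset sequentially, which is exactly the kind of conclusion that lets you reject a design before building it.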
Some useful figures that aren’t in Dean’s data can be found in this 2005 article comparing NetBSD 2.0 and FreeBSD 5.3. Approximating those figures, we get:
| Operation | Time (nsec) |
| --- | --- |
| System call overhead | 400 |
| Context switch between processes | 3,000 |
| fork() (statically-linked binary) | 70,000 |
| fork() (dynamically-linked binary) | 160,000 |
Update: This more recent blog post examines the question of system call and context switch overhead in more detail. Its figures suggest that the best-case system call overhead is now only ~60 nsec (for Linux on Nehalem), and that a context switch costs about 30 microseconds (30,000 nsec) — once you account for the cost of flushing CPU caches, that is probably pretty reasonable.
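If you want to reproduce this kind of figure yourself, a rough sketch is easy to write. The one below (my own, not from the post in question) times a cheap system call from Python; note that interpreter and function-call overhead inflate the result considerably compared to a C harness, so treat it as an upper bound rather than the bare syscall cost:

```python
import os
import time

def measure_syscall_ns(iters=200_000):
    """Estimate the per-call cost of a cheap system call (os.getppid).

    This measures the syscall plus Python's own call overhead, so it
    is an upper bound on the true kernel-entry cost.
    """
    start = time.perf_counter()
    for _ in range(iters):
        os.getppid()
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1e9  # nanoseconds per call

if __name__ == "__main__":
    print(f"~{measure_syscall_ns():.0f} ns per os.getppid() call")
```

Running variations of this — against a mutex acquisition, a pipe round trip, a small read() — is a cheap way to keep your own table of numbers current for the hardware you actually run on.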
In comparison, John Ousterhout’s RAMCloud project aims to provide end-to-end roundtrips for a key-value store in the same datacenter within “5-10 microseconds,” which would represent about a 100x improvement over the 500 microsecond latency suggested above.
The keynote slides are worth a glance: Dean talks about the design of “Spanner”, a next-generation version of BigTable he is building at Google. See also James Hamilton’s notes on the keynote, and Greg Linden’s commentary.