Serializable Snapshot Isolation

This semester at Berkeley, I’m taking CS286, the graduate-level data management course. In today’s class, we discussed a paper that I thought might be of particular interest to Postgres hackers: “Serializable Isolation for Snapshot Databases“, by Cahill, Rohm and Fekete from SIGMOD 2008.

The paper addresses a well-known problem with snapshot isolation (SI), which is the isolation level that Postgres actually provides when you ask for “SERIALIZABLE” isolation. SI basically means that a transaction sees a version of database state that corresponds to the effects of all the transactions that were committed before it began; it also sees the effects of its own updates. This is not equivalent to true serializability, however: that is, the database system can provide snapshot isolation, and yet still allow a concurrent transaction schedule that is not equivalent to some serial (one-at-a-time) transaction schedule.

To see why this is true, consider two concurrent transactions that both examine the state of the database, and then perform a write operation that reflects the values that they just read. The paper provides a simple example: suppose we have a database that describes the doctors in a hospital. We have a program that wants to move doctors from “on-call” to “off-duty”, as long as there is at least one other doctor that is on-call. It’s easy to see that if there are two doctors and we run two instances of the program concurrently, under SI rules we could end up with zero doctors on duty. This violates serializability: there’s no serial schedule of these two transactions that could yield this erroneous database state.

The paper proposes a relatively simple modification to snapshot isolation that avoids these problems, by detecting a superset of the dangerous situations and aborting one of the transactions involved. I’ll leave the details of their technique and the underlying theory to the paper, but it’s very readable.

So, should we implement their technique in Postgres? It’s an interesting idea, but the implementation cost would be very non-trivial. Despite the paper’s claim that it imposes relatively little overhead on a traditional SI implementation, it would require basically tracking the set of rows each transaction has read, and keeping that information around for a bounded time period after the transaction has committed. I think that the performance costs of doing that naively would be too expensive for this to be feasible. Perhaps a cheaper implementation is possible (e.g. by tracking page-level reads rather than record-level reads)?

As an aside, it is somewhat bogus for PostgreSQL to provide snapshot isolation when the user asks for serializability; it is also a violation of the SQL standard, I believe. That said, Oracle does the same thing, so at least we’re not alone, and it’s hard to see a practical improvement. The relevant section of the docs could certainly make this point clearer, however.

Advertisements

2 Comments

Filed under Advogato, PostgreSQL

2 responses to “Serializable Snapshot Isolation

  1. Parul

    hi!
    your analysis of paper was very interesting.
    can u just guide me for the literature i can refer to know more about the suggestion you have given of ‘tracking page-level reads’.
    hoping for a reply

  2. Well, the algorithm they propose requires tracking the set of rows read by each transaction. That would be very expensive for many applications. It would be somewhat cheaper to instead only record that a transaction has read something from a particular page, and not identify the row in question. This would only help marginally though.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s