One of the classes I’m taking at Berkeley this fall is CS262a, the first part of their graduate-level introductory “systems” class, which looks at great papers and common threads among operating systems, networking, databases, and the like. One of the first papers we’re going to discuss is “A History And Evaluation of System R”, which describes the seminal DBMS built by a team of 15 PhDs at IBM Research from 1974 to ~1980. The paper is a great read, especially if you’re interested in database internals. (If you’re going to read the paper, I suggest Joe Hellerstein’s annotated version, which contains a number of insightful comments.)
A few comments of my own:
- The scope of the project goals and the completeness of the implementation are remarkable, considering the time period and the lack of other production-quality RDBMS implementations at the time. System R included a cost-based query optimizer, joins, subqueries, updateable views, log-based crash recovery, granular locking, authentication and authorization, a relational system catalog, prepared queries, and other sophisticated features. In fact, System R even had the ability to automatically invalidate and replan prepared queries when their dependent objects changed, a feature Postgres didn’t add until 8.3 (and we still don’t have native support for updateable views).
- People often complain that SQL is a poorly designed language. In many respects that may be true, but it’s not because the design of the language itself was neglected: even in 1975, the System R team gave “considerable thought … to the human factors aspects of the SQL language, and an experimental study was conducted on the learnability and usability of SQL.” While the goal of having secretaries and other non-technical staff writing SQL queries was perhaps not achieved, SQL wasn’t a hackishly designed language, even if it sometimes feels that way 🙂
- The initial System R prototype supported subqueries, but not joins. That seems like an unusual order in which to implement features, although it does make some sense (in his annotations, JMH points out that leaving out joins makes the optimizer’s search strategy much simpler).
- One interesting design choice is that System R generated machine code from the query plan, rather than having the executor interpret the plan by walking the plan tree at runtime. While this design sounded exotic to me at first glance, it actually makes sense: on the hardware of the time, queries were much more likely to be CPU-bound than they are today.
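To make the interpret-versus-compile distinction concrete, here is a minimal sketch in Python. It can’t show real machine-code generation, so it uses closure compilation as a stand-in: a WHERE predicate is translated once, up front, into nested Python closures, so evaluating each row no longer re-walks the tree and re-dispatches on node types. The node classes (`Col`, `Const`, `Gt`, `And`) and the functions `interpret` and `compile_expr` are hypothetical names chosen for illustration; this is not how System R was actually structured.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union

Row = dict[str, Any]

# Hypothetical expression-tree nodes for a WHERE clause (illustration only).
@dataclass(frozen=True)
class Col:
    name: str

@dataclass(frozen=True)
class Const:
    value: Any

@dataclass(frozen=True)
class Gt:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"

Expr = Union[Col, Const, Gt, And]

def interpret(expr: Expr, row: Row) -> Any:
    """Interpretation: walk the tree and dispatch on node type for every row."""
    if isinstance(expr, Col):
        return row[expr.name]
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Gt):
        return interpret(expr.left, row) > interpret(expr.right, row)
    if isinstance(expr, And):
        return interpret(expr.left, row) and interpret(expr.right, row)
    raise TypeError(f"unknown expression node: {expr!r}")

def compile_expr(expr: Expr) -> Callable[[Row], Any]:
    """'Compilation' stand-in: walk the tree once and return a closure."""
    if isinstance(expr, Col):
        name = expr.name
        return lambda row: row[name]
    if isinstance(expr, Const):
        value = expr.value
        return lambda row: value
    if isinstance(expr, Gt):
        left, right = compile_expr(expr.left), compile_expr(expr.right)
        return lambda row: left(row) > right(row)
    if isinstance(expr, And):
        left, right = compile_expr(expr.left), compile_expr(expr.right)
        return lambda row: left(row) and right(row)
    raise TypeError(f"unknown expression node: {expr!r}")

rows = [
    {"name": "alice", "salary": 120, "age": 40},
    {"name": "bob", "salary": 90, "age": 35},
    {"name": "carol", "salary": 150, "age": 29},
]
# Equivalent to: WHERE salary > 100 AND age > 30
pred = And(Gt(Col("salary"), Const(100)), Gt(Col("age"), Const(30)))

interpreted = [r["name"] for r in rows if interpret(pred, r)]
compiled_pred = compile_expr(pred)  # tree walked once, here
compiled = [r["name"] for r in rows if compiled_pred(r)]
print(interpreted, compiled)  # ['alice'] ['alice']
```

Both paths return the same rows; the difference is where the per-node dispatch cost is paid. The interpreter pays it on every row, while the compiled version pays it once before execution starts, which is exactly the kind of saving that matters most when a query is CPU-bound rather than waiting on I/O.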
The notes from the 1995 System R reunion are also an interesting read, if you’d like to learn more about the politics and history of the project.