April 06, 2006

I’d like to highlight Tom Lane’s recent commit that fixes a major defect in the implementation of domains in PostgreSQL. Domains can have CHECK or NOT NULL constraints, but prior to Tom’s patch, these constraints were not enforced by PL/PgSQL, or as the return value of a procedural language function. (That meant you could have values of a domain type that violated the domain’s own constraints!) This has been a known problem for quite some time, but it was somewhat tricky to fix.

In Postgres, each type has an associated “input function” that takes a string and produces a value of the type. Prior to Tom’s commit, a domain type’s input function was just the input function of its base type. As a result, just invoking a type’s input function (which is done in several places throughout the backend) wasn’t enough to check the constraints on a value of a domain type—you also had to explicitly lookup any associated domain constraints and check them. So in all the places where we were invoking input functions we’d need to add some additional code to explicitly check domain constraints. Needless to say, that would be pretty ugly — it’s just a few additional function calls, but it’s really not something ought to be doing at every callsite of a type’s input function.

Worse still, looking up the constraints associated with a domain is relatively expensive (it requires a non-syscache’d catalog lookup). To actually evaluate a CHECK constraint you need to evaluate an expression, which requires instantiating a bunch of executor machinery, which is also not that cheap. So in all the places where we’d need to add checks of domain constraints, we’d also need to think about how to efficiently load and cache the domain constraints and executor machinery, and when to invalidate/release them.

I did some work to add domain constraint checking to the return value in PL/PgSQL. I never applied the patch, partly because there were some implementation details that were tricky to resolve, but mostly because it just seemed like the wrong approach.

Tom’s fix is much cleaner: by providing a separate input function for all domain types and doing the constraint checking there, we’re guaranteed to check domain constraints at the appropriate time, without the need to clutter each call-site of a type input function ([1], [2]). It would be worthwhile to investigate whether this results in a performance regression, though: there’s no easy way to cache the executor machinery needed to evaluate a CHECK constraint in this design, whereas the prior design allowed each call-site to implement its own EState caching.

Nice work, Tom!


Leave a comment

Filed under Advogato, PostgreSQL

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s