I think it is somewhat common knowledge, at least among Postgres geeks, that the current implementation of on-disk linear hash indexes in PG is pretty poor. Some of the problems include:
- no write-ahead logging — if a system crash occurs, the index state may be inconsistent, so you’ll need to REINDEX
- poor bulk-loading performance — creating a hash index on a few gigabytes of data takes a long time
- doesn’t support unique indexes or multi-column indexes
But more importantly, the current hash index code doesn’t outperform our b+-tree implementation even for scalar equality (e.g. WHERE x = 'foo'). So there hasn’t been much motivation for folks to hack on hash indexes, as few people are actually using them.
In theory, though, hash indexes could be useful: equality scans are very common, and a hash index should be faster than a b+-tree for these queries at least in non-concurrent situations (since a b+-tree needs to navigate the internal nodes of the tree for a few levels before reaching the leaf level; a hash index can jump right to the appropriate hash bucket).
This topic was raised on the pgsql-general list, and a pretty interesting discussion ensued. We came up with two simple ways to improve hash indexes:
- at present, the content of a hash bucket is unordered. That means the index can use the hash to select the right bucket to scan, but needs to do a linear scan over all the bucket’s pages once it gets there, running the equality function of the index’s operator class for each the entry in the bucket.
It would be a lot faster to just binary search within a page, but to do that we need to either define an ordering over the index keys (which we may not be able to do for a user-defined type), or sort the keys by their hash value. If we do the latter, we probably need to store the full hash value of each key in the index. Storing the full hash value is useful for other reasons, as well: for example, if an entry’s hash value does not match the hash of the scankey, we needn’t evaluate the opclass’s equality function, since this tuple cannot match the scan.
- if we’re going to be storing the hash of each key in the index, it would sometimes be a good idea to only store the hash of the key, not the key’s value itself. To avoid hash collisions, we’ll need to recheck the key value against the heap tuple, but we need to do that anyway (PostgreSQL only stores MVCC information in the heap, so we need to check that the tuple pointed-to by the index entry actually exists).
Which was all well and good in theory, but after implementing it, it turns out that I’ve yet to find a case in which the performance is noticeably improved :-\ I could definitely construct a situation where the patch would be a win (e.g. a user-defined type with a complex equality operator, or an index on a very wide text field), but it is frustrating that it doesn’t appear to be a win in the common case (e.g. a hash index on an int4 field with a reasonably random distribution of values).
I’m still trying to figure out if there’s merit to pursuing these improvements, and if so, why I haven’t seen the performance improvement I expected. Anyway, this will teach me to look at profiling data before sitting down to code…