user3231055 user3231055 - 1 year ago 58
SQL Question

Is it possible to make index per primary key in postgresql?

Consider I have a table, where insertions are very often with two fields:

user_id uuid,
date timestamp

Also I have an ordinary b-tree index on (user_id, date).

The problem with such approach is that insertions to different user_ids can not be done in parallel by postgres, because of index, which must be updated in sequence, since it is a tree, which can be rebalanced after each insertion so it must wait until each particular insertion finish.

What I want is the independent index per user_id, so that insertions can be done in parallel. Is there a way to do it?

------- EDITED: perfect answer by Laurenz Albe is below

Answer Source


You underestimate the power of B-tree indexes.

Multiple inserts on a B-tree index can run in parallel, and the tree never gets rebalanced. Instead, you occasionally have an index page split that will only block operations on that page for a short while.


The algorithm for splitting pages is described in the famous paper of Lehman and Yao (a must-read for anybody interested in index internals), and the nbtree README from the PostgreSQL source describes further details like how deletions are handled.

Short description of the insertion algorithm:

As long as an index page is not full, new entries just get inserted. This causes only a brief lock on the index page.

If a page is full, it is split using the Lehman & Yao algorithm, which locks at most three pages at a time. This split requires a new entry to the newly created page in the parent page, so that page might also have to be split, possibly recurring up to the root page.

Still, no more than three locks are required, since these operations occur one after the other.

Note that such a root page split occurs only 3-4 times during the life time of an index, since few indexes are more than 5 levels deep.

This way, all branches of a B-tree index have the same depth, so the index is always balanced and does not need rebalancing. Rebalancing could only be interesting during deletion of entries, but PostgreSQL doesn't do that (except that it reclaims index pages when they become totally empty).

Other notes concerning your question:

Using multiple indexes like you propose would not make this any faster – it would make things much more complicated and slow if you had to create an index for each user_id, and such indexes couldn't be used for a search anyway.

Nonetheless, indexes slow down inserts consierably. That's a problem you cannot avoid if data are inserted and queried simultaneously. If nobody queries the data while you do a bulk insert, you can drop the index and recreate it afterwards.