Daniel Winterstein - 12 days ago
SQL Question

PostgreSQL: performance impact of extra columns

Given a large table (10-100 million rows), what's the best way to add some extra (unindexed) columns to it?


  1. Just add the columns.

  2. Create a separate table for each extra column, and use joins when you want to access the extra values.



Does the answer change depending on whether the extra columns are dense (mostly not null) or sparse (mostly null)?

Answer

A column holding a NULL value can be added to a row without any changes to the rest of the data page in most cases. Only one bit in the row's null bitmap is affected. So, yes, a sparse (mostly NULL) column is much cheaper to add than a dense one in most cases.
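To see the effect of a NULL on stored row size, something like the following could be tried. It is only a sketch: the table `null_demo` and its columns are made-up names, and the byte counts you get depend on data types and alignment, so no expected output is given.

```sql
-- Hypothetical demo table: one fixed column plus one "extra" column.
CREATE TEMP TABLE null_demo (
    id    integer,
    extra text
);

-- One row leaves the extra column NULL, the other fills it.
INSERT INTO null_demo (id, extra)
VALUES (1, NULL),
       (2, 'some value');

-- pg_column_size() on the whole row reports the bytes used by the row value.
-- The NULL row stores no data for "extra", only its bit in the null bitmap.
SELECT t.id,
       t.extra IS NULL     AS extra_is_null,
       pg_column_size(t.*) AS row_bytes
FROM   null_demo AS t
ORDER  BY t.id;
```

Comparing `row_bytes` between the two rows gives a rough idea of what a dense column would cost per row.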

Whether it is a good idea to create a separate 1:1 table for the additional columns depends very much on the use case. It is generally more expensive. For starters, there is an overhead of 28 bytes (heap tuple header plus item pointer) per row, plus some additional overhead per table. At 100 million rows, the per-row overhead alone comes to about 2.8 GB. It is also much more expensive to JOIN rows in a query than to read them in one piece. And you need to add a primary / foreign key column plus an index on it. Splitting may be a good idea if most queries don't need the additional columns. Mostly, it is a bad idea.
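For comparison, a 1:1 split could look roughly like this. All table and column names (`orders`, `orders_extra`, `extra_a`, `extra_b`) are hypothetical and only illustrate the shape of the schema and the join it forces.

```sql
-- Hypothetical main table.
CREATE TABLE orders (
    order_id bigint PRIMARY KEY,
    payload  text
);

-- Hypothetical 1:1 side table for the extra columns.
-- It needs its own key column, and the PRIMARY KEY creates an index on it.
CREATE TABLE orders_extra (
    order_id bigint PRIMARY KEY REFERENCES orders (order_id),
    extra_a  integer,
    extra_b  text
);

-- Every query that needs the extra values has to pay for a join.
-- LEFT JOIN keeps main rows that have no side row (sparse extras).
SELECT o.order_id,
       o.payload,
       e.extra_a,
       e.extra_b
FROM   orders AS o
LEFT   JOIN orders_extra AS e USING (order_id);
```

Queries that never touch `extra_a` or `extra_b` can skip the side table entirely, which is the one case where the split may pay off.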

Adding a column is fast in PostgreSQL: a nullable column without a default is a catalog-only change. Populating the column is what may be expensive, because every UPDATE writes a new row version (due to the MVCC model). Therefore, it is a good idea to update multiple columns in a single statement.
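A minimal sketch of that advice, assuming a made-up table `items` with a `payload` column; the expressions used to fill the new columns are placeholders.

```sql
-- Hypothetical existing table.
CREATE TABLE items (
    item_id bigint PRIMARY KEY,
    payload text
);

-- Add both extra columns in one statement; they start out NULL.
ALTER TABLE items
    ADD COLUMN extra_a integer,
    ADD COLUMN extra_b text;

-- Fill both columns in a single UPDATE so each row is rewritten once,
-- instead of once per column with two separate UPDATE statements.
UPDATE items
SET    extra_a = length(payload),
       extra_b = upper(payload)
WHERE  payload IS NOT NULL;
```

On a table with tens of millions of rows, it may also be worth running such an update in batches to keep transactions short, but that is a separate decision.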

See "Database Page Layout" in the PostgreSQL manual for the on-disk details.

How to calculate row sizes: