Given a large table (10-100 million rows) what's the best way to add some extra (unindexed) columns to it?
A column with a NULL value can be added to a row without any changes to the rest of the data page in most cases: only one bit has to be set in the row's NULL bitmap. So yes, a sparsely populated (mostly NULL) column is much cheaper to add in most cases.
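To illustrate (table and column names here are hypothetical): adding nullable columns without a default is a catalog-only change in PostgreSQL and does not rewrite the table, no matter how many rows it has. Since PostgreSQL 11, adding a column with a non-volatile `DEFAULT` avoids the rewrite as well.

```sql
-- Catalog-only change; existing rows are untouched and read the
-- new columns as NULL via the NULL bitmap.
ALTER TABLE big_tbl
  ADD COLUMN notes text,
  ADD COLUMN score numeric;
```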
Whether it is a good idea to create a separate 1:1 table for additional columns very much depends on the use case. It is generally more expensive. For starters, there is an overhead of 28 bytes (heap tuple header plus item pointer) per row, and some additional overhead per table. It is also much more expensive to JOIN rows in a query than to read them in one piece. And you need to add a primary / foreign key column plus an index on it. Splitting may be a good idea if you don't need the additional columns in most queries. Mostly it is a bad idea.
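A minimal sketch of such a 1:1 split, assuming a hypothetical `big_tbl` with a `bigint` primary key `id`:

```sql
-- Side table carrying the extra columns; its PK doubles as the FK,
-- so there is exactly one extra row (or none) per base row.
CREATE TABLE big_tbl_extra (
  big_tbl_id bigint PRIMARY KEY REFERENCES big_tbl (id),
  notes      text,
  score      numeric
);

-- Every query that needs the extra columns now pays for a join:
SELECT b.*, e.notes, e.score
FROM   big_tbl b
LEFT   JOIN big_tbl_extra e ON e.big_tbl_id = b.id;
```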
Adding a column is fast in PostgreSQL. Updating the values in the column is what may be expensive, because every UPDATE writes a new row version (due to the MVCC model). Therefore, it is a good idea to update multiple columns at once rather than one column per pass.
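For example (hypothetical names again), backfilling both new columns in a single statement writes one new row version per row, where two separate UPDATEs would write two:

```sql
-- One new row version per affected row instead of two.
UPDATE big_tbl
SET    notes = 'n/a',
       score = 0
WHERE  notes IS DISTINCT FROM 'n/a'
   OR  score IS DISTINCT FROM 0;  -- skip rows already at target values
```

On very large tables it can pay to run this in batches (e.g. by ranges of the primary key) to keep transactions short and let VACUUM reclaim dead tuples between batches.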
How to calculate row sizes:
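As a rough measurement, `pg_column_size()` applied to a whole-row reference reports the on-disk size of each row (including a row header, but not the per-row item pointer); the table name below is hypothetical:

```sql
-- Average row size and total on-disk footprint of the table.
SELECT avg(pg_column_size(t.*))                         AS avg_row_bytes,
       pg_size_pretty(pg_total_relation_size('big_tbl')) AS total_size
FROM   big_tbl t;
```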