Manquer Manquer - 1 year ago 96
SQL Question

Set empty strings ('') to NULL in the whole database

In my database are many text columns where values are empty strings (

). The empty strings need to be set to
. I do not know the exact schemas, tables and columns in this database or rather I want to write a general solution which can be reused.

How would I write a query / function to find all text columns in all tables in all schemas and update all columns with empty strings (
) to

Answer Source

The most efficient way to achieve this:

  • Run a single UPDATE per table.
  • Only update nullable columns (not defined NOT NULL) with any actual empty string.
  • Only update rows with any actual empty string.
  • Leave other values unchanged.

This related answer has a plpgsql function that builds and runs the UPDATE command using system catalog pg_attribute automatically and safely for any given table:

Using the function f_empty2null() from this answer, you can loop through selected tables like this:

   _tbl regclass;
   FOR _tbl IN
      SELECT c.oid::regclass
      FROM   pg_class c
      JOIN   pg_namespace n ON n.oid = c.relnamespace
      WHERE  c.relkind = 'r'            -- only regular tables
      AND    n.nspname NOT LIKE 'pg_%'  -- exclude system schemas
      RAISE NOTICE $$PERFORM f_empty2null('%');$$, _tbl;
      -- PERFORM f_empty2null(_tbl);  -- uncomment to prime the bomb

Careful! This updates all empty strings in all columns of all user tables in the DB. Be sure that's what you want or it might nuke your database.

You need UPDATE privileges on all selected tables, of course.

As a child safety device I commented the payload.

You may have noted that I use the system catalogs directly, not the information schema (which would work, too). About this:

For repeated use

Here is an integrated solution for repeated use:

CREATE OR REPLACE FUNCTION f_all_empty2null(OUT _tables int, OUT _rows int) AS
   -- basic char types, possibly extend with citext, domains or custom types:
   _typ CONSTANT regtype[] := '{text, bpchar, varchar, \"char\"}';
   _sql text;
   _row_ct int;
   _tables := 0;
   _rows := 0;

   FOR _sql IN
      SELECT format('UPDATE %s SET %s WHERE %s'
                  , t.tbl
                  , string_agg(format($$%1$s = NULLIF(%1$s, '')$$, t.col), ', ')
                  , string_agg(t.col || $$ = ''$$, ' OR '))
      FROM  (
         SELECT c.oid::regclass AS tbl, quote_ident(attname) AS col
         FROM   pg_namespace n
         JOIN   pg_class c ON c.relnamespace = n.oid
         JOIN   pg_attribute a ON a.attrelid = c.oid
         WHERE  n.nspname NOT LIKE 'pg_%'      -- exclude system schemas
         AND    c.relkind = 'r'                -- only regular tables
         AND    a.attnum >= 1                  -- exclude tableoid & friends
         AND    NOT a.attisdropped             -- exclude dropped columns
         AND    NOT a.attnotnull               -- exclude columns defined NOT NULL!
         AND    a.atttypid = ANY(_typ)         -- only character types
         ORDER  BY a.attnum
         ) t
      GROUP  BY t.tbl
      -- Test
      -- RAISE NOTICE '%', _sql;

      -- Execute
      IF _sql IS NOT NULL THEN
         EXECUTE _sql;
         GET DIAGNOSTICS _row_ct = ROW_COUNT;  -- Report number of affected rows
         _tables := _tables + 1;
         _rows := _rows + _row_ct;
      END IF;
$func$  LANGUAGE plpgsql;


SELECT * FROM pg_temp.f_all_empty2null();


 _tables | _rows
 23      | 123456