Ian Storm Taylor Ian Storm Taylor - 4 years ago 74
SQL Question

How to generate a random, unique, alphanumeric ID of length N in Postgres 9.6+?

I've seen a bunch of different solutions on StackOverflow that span many years and many Postgres versions, but with some of the newer features like

I want to ask again to see if there is a simpler solution in newer versions.

Given IDs which contain
a-zA-Z0-9
, and vary in size depending on where they're used, like...

bTFTxFDPPq
tcgHAdW3BD
IIo11r9J0D
FUW5I8iCiS

uXolWvg49Co5EfCo
LOscuAZu37yV84Sa
YyrbwLTRDb01TmyE
HoQk3a6atGWRMCSA

HwHSZgGRStDMwnNXHk3FmLDEbWAHE1Q9
qgpDcrNSMg87ngwcXTaZ9iImoUmXhSAv
RVZjqdKvtoafLi1O5HlvlpJoKzGeKJYS
3Rls4DjWxJaLfIJyXIEpcjWuh51aHHtK


(Like the IDs that Stripe uses.)

How can you generate them randomly and safely (as far as reducing collisions and reducing predictability goes) with an easy way to specify different lengths for different use cases, in Postgres 9.6+?

I'm thinking that ideally the solution has a signature similar to:

generate_uid(size integer) returns text


Where
size
is customizable depending on your own tradeoffs for lowering the chance of collisions vs. reducing the string size for usability.

From what I can tell, it must use
gen_random_bytes()
instead of
random()
for true randomness, to reduce the chance that they can be guessed.

Thanks!




I know there's
gen_random_uuid()
for UUIDs, but I don't want to use them in this case. I'm looking for something that gives me IDs similar to what Stripe (or others) use, that look like:
"id": "ch_19iRv22eZvKYlo2CAxkjuHxZ"
that are as short as possible while still containing only alphanumeric characters.

This requirement is also why
encode(gen_random_bytes(), 'hex')
isn't quite right for this case, since it reduces the character set and thus forces me to increase the length of the strings to avoid collisions.

I'm currently doing this in the application layer, but I'm looking to move it into the database layer to reduce interdependencies. Here's what the Node.js code for doing it in the application layer might look like:

var crypto = require('crypto');
var set = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

function generate(length) {
var bytes = crypto.randomBytes(length);
var chars = [];

for (var i = 0; i < bytes.length; i++) {
chars.push(set[bytes[i] % set.length]);
}

return chars.join('');
}

Answer Source

Figured this out, here's a function that does it:

CREATE OR REPLACE FUNCTION generate_uid(size INT) RETURNS TEXT AS $$
DECLARE
  characters TEXT := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  bytes BYTEA := gen_random_bytes(size);
  l INT := length(characters);
  i INT := 0;
  output TEXT := '';
BEGIN
  WHILE i < size LOOP
    output := output || substr(characters, get_byte(bytes, i) % l + 1, 1);
    i := i + 1;
  END LOOP;
  RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;

And then to run it simply do:

generate_uid(10)
-- '3Rls4DjWxJ'

Warning

When doing this you need to be sure that the length of the IDs you are creating is sufficient to avoid collisions over time as the number of objects you've created grows, which can be counter-intuitive because of the Birthday Paradox. So you will likely want a length greater (or much greater) than 10 for any reasonably commonly created object, I just used 10 as a simple example.


Usage

With the function defined, you can use it in a table definition, like so:

CREATE TABLE collections (
  id TEXT PRIMARY KEY DEFAULT generate_uid(10),
  name TEXT NOT NULL,
  ...
);

And then when inserting data, like so:

INSERT INTO collections (name) VALUES ('One');
INSERT INTO collections (name) VALUES ('Two');
INSERT INTO collections (name) VALUES ('Three');
SELECT * FROM collections;

It will automatically generate the id values:

    id     |  name  | ...
-----------+--------+-----
owmCAx552Q | ian    |
ZIofD6l3X9 | victor |

Usage with a Prefix

Or maybe you want to add a prefix for convenience when looking at a single ID in the logs or in your debugger (similar to how Stripe does it), like so:

CREATE TABLE collections (
  id TEXT PRIMARY KEY DEFAULT ('col_' || generate_uid(10)),
  name TEXT NOT NULL,
  ...
);

INSERT INTO collections (name) VALUES ('One');
INSERT INTO collections (name) VALUES ('Two');
INSERT INTO collections (name) VALUES ('Three');
SELECT * FROM collections;

      id       |  name  | ...
---------------+--------+-----
col_wABNZRD5Zk | ian    |
col_ISzGcTVj8f | victor |
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download