dimid dimid - 3 years ago 142
SQL Question

Find all records with Hebrew names

I have a postgresql database with a users table, where each user has a name (in unicode). I'd like to find all users whose name contains at least one Hebrew character. I've thought using regex, e.g.

select * from users
where name ~ '[א-ת]';


Is there a more efficient approach than the regex? I have a B-tree index on the names column.

Update

Using the different indices with the
pg_trgm
module as suggested by @FuzzyTree

B-tree GIST GIN
user 0.04 0.04 0.03
sys 0.02 0.04 0.01
total 0.06 0.08 0.04


Regarding disk size, the GIN index is 0.2x of the GIST one, and 0.8x of the B-tree. So, we have a winner here, at least for my use case. YMMV (e.g. I haven't benchmarked index creation and update).

Answer Source

One option is to create a boolean column i.e. is_hebrew_name that you can update once using your regex and create a regular index on.

If you don't want to add another column and you're running v9.3 or higher, then consider using the pg_trgm module to create a GIN or GIST index on name

CREATE EXTENSION pg_trgm;
CREATE INDEX trgm_idx ON users USING GIST (name gist_trgm_ops);

The pg_trgm module provides GiST and GIN index operator classes that allow you to create an index over a text column for the purpose of very fast similarity searches. These index types support the above-described similarity operators, and additionally support trigram-based index searches for LIKE, ILIKE, ~ and ~* queries.

The index search works by extracting trigrams from the regular expression and then looking these up in the index. The more trigrams that can be extracted from the regular expression, the more effective the index search is. Unlike B-tree based searches, the search string need not be left-anchored.

For both LIKE and regular-expression searches, keep in mind that a pattern with no extractable trigrams will degenerate to a full-index scan.

The choice between GiST and GIN indexing depends on the relative performance characteristics of GiST and GIN, which are discussed elsewhere.

For more info see https://www.postgresql.org/docs/9.6/static/pgtrgm.html

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download