I have a postgresql database with a users table, where each user has a name (in unicode). I'd like to find all users whose name contains at least one Hebrew character. I've thought using regex, e.g.
select * from users
where name ~ '[א-ת]';
B-tree GIST GIN
user 0.04 0.04 0.03
sys 0.02 0.04 0.01
total 0.06 0.08 0.04
One option is to create a boolean column i.e.
is_hebrew_name that you can update once using your regex and create a regular index on.
If you don't want to add another column and you're running v9.3 or higher, then consider using the
pg_trgm module to create a
GIST index on
CREATE EXTENSION pg_trgm; CREATE INDEX trgm_idx ON users USING GIST (name gist_trgm_ops);
The pg_trgm module provides GiST and GIN index operator classes that allow you to create an index over a text column for the purpose of very fast similarity searches. These index types support the above-described similarity operators, and additionally support trigram-based index searches for LIKE, ILIKE, ~ and ~* queries.
The index search works by extracting trigrams from the regular expression and then looking these up in the index. The more trigrams that can be extracted from the regular expression, the more effective the index search is. Unlike B-tree based searches, the search string need not be left-anchored.
For both LIKE and regular-expression searches, keep in mind that a pattern with no extractable trigrams will degenerate to a full-index scan.
The choice between GiST and GIN indexing depends on the relative performance characteristics of GiST and GIN, which are discussed elsewhere.
For more info see https://www.postgresql.org/docs/9.6/static/pgtrgm.html