Shawn Shawn - 1 year ago 125
SQL Question

Oracle regexp_like pattern with POSIX character class

I have username in this pattern


How can I use regexp_like for this?

SELECT username FROM all_users WHERE regexp_like(username, '^ref_2_34_[:alpha:]_dos$')

does not work. Nor can I use ESCAPE '\' with regexp_like. it would give a syntax error.

Answer Source

You need to put the POSIX character class [:alpha:] into a bracket expression (i.e. [...]) and apply a + quantifier to it:

regexp_like(username, '^ref_2_34_[[:alpha:]]+_dos$')

The + quantifier means there can be 1 or more letters between the last but one and last underscores.

If your string may have no user name at that location (it is empty), and you want to get those entries too, you would need to replace the plus with the * quantifier that matches zero or more occurrences of the quantified subpattern.

Since comments require some more clarifications, here is some bracket expression and POSIX character class reference.

Bracket expressions

Matches any single character in the list within the brackets. The following operators are allowed within the list, but other metacharacters included are treated as literals:

  • Range operator: -
  • POSIX character class: [: :]
  • POSIX collation element: [. .]
  • POSIX character equivalence class: [= =]

A dash (-) is a literal when it occurs first or last in the list, or as an ending range point in a range expression, as in [#--]. A right bracket (]) is treated as a literal if it occurs first in the list.

POSIX Character Class (can be a part of the *bracket expression):

[:class:] - Matches any character belonging to the specified POSIX character class. You can use this operator to search for characters with specific formatting such as uppercase characters, or you can search for special characters such as digits or punctuation characters. The full set of POSIX character classes is supported. ... The expression [[:upper:]]+ searches for one or more consecutive uppercase characters.

Bracket expressions can be considered a kind of a "container" construct for multiple atoms, that, as a whole regex unit, matches some class of characters you defined. If you need to match a <, or >, or letters, you may combine them into 1 bracket expression [<>[:alpha:]]. To match zero or more of <, > or letters, add a * quantifier after ]: [<>[:alpha:]]*.

Or, to imitate a trailing word boundary, one might use [^_[:alnum:]] that matches any character but a _, digits and letters ([:alnum:] matches alphanumerical symbols).