maestromusica - 24 days ago
Python Question

Regex for a third-person verb

I'm trying to create a regex that matches the third-person form of a verb created using the following rule:


If the verb ends in e not preceded by i,o,s,x,z,ch,sh, add s.


So I'm looking for a regex matching a word consisting of some letters, then not i,o,s,x,z,ch,sh, and then "es". I tried this:

\b\w*[^iosxz(sh)(ch)]es\b


According to regex101 it matches "likes", "hates" etc. However, it does not match "bathes". Why not?

Answer

You may use

\b(?=\w*(?<![iosxz])(?<![cs]h)es\b)\w*

See the regex demo

Since Python's re module only supports fixed-width lookbehinds, alternatives of different lengths (a single character vs. ch or sh) cannot share one lookbehind, so you need to split the conditions into two lookbehinds here.
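A minimal sketch of that restriction. The single-lookbehind pattern below is a hypothetical attempt (it is not part of the answer), and the variable names are illustrative only. It mixes a 1-char and a 2-char alternative, so re should refuse to compile it, while the split version compiles:

```python
import re

# Hypothetical single-lookbehind attempt: one alternative is 1 char wide,
# the other is 2 chars wide, so re rejects it at compile time.
combined = r'\b\w*(?<![iosxz]|[cs]h)es\b'

try:
    re.compile(combined)
    compiled_ok = True
except re.error as exc:
    compiled_ok = False
    print('re.error:', exc)

# Splitting the condition into two fixed-width lookbehinds compiles fine.
split = re.compile(r'\b\w*(?<![iosxz])(?<![cs]h)es\b')
print(compiled_ok, bool(split.search('bathes')))
```

Each lookbehind on its own has a fixed width (1 and 2 chars respectively), which is all that re requires.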

Pattern details:

  • \b - a leading word boundary
  • (?=\w*(?<![iosxz])(?<![cs]h)es\b) - a positive lookahead requiring a sequence of:
    • \w* - 0+ word chars
    • (?<![iosxz]) - there must not be i, o, s, x, z chars right before the current location and...
    • (?<![cs]h) - no ch or sh right before the current location...
    • es - followed by es...
    • \b - at the end of the word
  • \w* - zero or more word chars; this is the part that actually consumes the word (+ would behave the same here, since the lookahead already guarantees at least es).
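The breakdown above can also be written directly into the pattern with re.VERBOSE. This is just a sketch of the same regex with one comment per bullet; the names THIRD_PERSON and words are made up for the illustration:

```python
import re

# Same pattern as in the answer, spread out with re.VERBOSE so each
# comment lines up with one bullet of the breakdown.
THIRD_PERSON = re.compile(r'''
    \b                 # leading word boundary
    (?=                # lookahead: the word ahead must look like this
        \w*            #   0+ word chars
        (?<![iosxz])   #   previous char is not i, o, s, x, z
        (?<![cs]h)     #   previous two chars are not ch or sh
        es             #   literal "es"
        \b             #   end of the word
    )
    \w*                # consume the word itself
''', re.VERBOSE)

# Hypothetical sample words, not taken from the question.
words = ['likes', 'bathes', 'matches', 'goes', 'washes']
print([w for w in words if THIRD_PERSON.fullmatch(w)])  # ['likes', 'bathes']
```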

See Python demo:

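import re

# The lookahead validates the whole word; the trailing \w* then consumes it.
pattern = re.compile(r'\b(?=\w*(?<![iosxz])(?<![cs]h)es\b)\w*')
s = 'it matches "likes", "hates" etc. However, it does not match "bathes", why doesn\'t it?'
print(pattern.findall(s))  # ['likes', 'hates', 'bathes']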
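As for why the pattern from the question rejects "bathes": inside a character class, parentheses do not group, so [^iosxz(sh)(ch)] simply excludes each of the characters i, o, s, x, z, (, ), h and c. The letter before "es" in "bathes" is h, which is excluded. A small side-by-side sketch (the names original and fixed are just labels for this comparison):

```python
import re

# Pattern from the question: inside [...] the parentheses and letters are
# all individual literals, so this class excludes i o s x z ( ) h c.
original = re.compile(r'\b\w*[^iosxz(sh)(ch)]es\b')
# Pattern from the answer.
fixed = re.compile(r'\b(?=\w*(?<![iosxz])(?<![cs]h)es\b)\w*')

for word in ['likes', 'hates', 'bathes', 'matches']:
    print(word, bool(original.fullmatch(word)), bool(fixed.fullmatch(word)))
# likes True True
# hates True True
# bathes False True
# matches False False
```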