user963386 - 8 months ago
Python Question

How are regex quantifiers applied?

I have the following regex:

import re
res = re.finditer(r'(?:\w+[ \t,]+){0,4}my car', txt, re.IGNORECASE | re.MULTILINE)
for item in res:
    print(item.group())  # print each full match

When I use this regex with the following string:

"my house is painted white, my car is red.
A horse is galloping very fast in the road, I drive my car slowly."

I am getting the following results:

  • house is painted white, my car

  • the road, I drive my car

My question is about the quantifier {0,4}, which should apply to the whole group. The group collects words with the expression \w+ and some separation symbols with [ \t,]+. Does the quantifier apply only to the "words" defined by \w+? In the results I am getting 4 words plus spaces and commas, and it's unclear to me why.


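So, here's what's happening. You're using (?: to make a non-capturing group. Inside it, \w+ matches one word (a run of one or more word characters), and [ \t,]+ matches the one or more spaces, tabs, or commas that follow it. {0,4} then repeats that whole group, word plus separators, between 0 and 4 times. So the quantifier does apply to the whole group, not only to \w+: each repetition is one word together with its trailing separators, which is why the spaces and the comma end up inside the match. The engine tries start positions from left to right and keeps the first one from which the rest of the pattern can reach "my car". In the first line, a match starting at "my house" would need five repetitions, one more than {0,4} allows, so the first match begins at "house" and contains the 4 words before "my car".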
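A minimal sketch of that left-to-right scan, using only the first line of the string from the question to keep it short (the names pattern, line, and m are just for illustration). re.match only tries index 0 and fails; re.search moves right until a start position works.

```python
import re

pattern = re.compile(r'(?:\w+[ \t,]+){0,4}my car', re.IGNORECASE)
line = 'my house is painted white, my car is red.'

# No match can start at "my house": that would need five word+separator
# repetitions before "my car", one more than {0,4} allows.
print(pattern.match(line))  # None

# search() tries each start position from left to right; the first one
# that succeeds is index 3, the start of "house".
m = pattern.search(line)
print(m.start(), m.group())  # 3 house is painted white, my car
```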

Broken apart more succinctly:

(?: -- Start a non-capturing group
\w+ -- Match one word (one or more word characters)
[ \t,]+ -- Match one or more spaces, commas, or tab characters after it
) -- End the non-capturing group
{0,4} -- Repeat the whole group 0 to 4 times
my car -- Then match the literal text "my car"
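The same breakdown can be written into the pattern itself with Python's re.VERBOSE flag, which ignores whitespace and # comments outside character classes. This is a sketch for illustration only; the names verbose, compact, and sample are hypothetical. The space in "my car" has to be escaped, because VERBOSE would otherwise drop it, while the space inside [ \t,] is kept since it sits in a character class.

```python
import re

# Same pattern as the question, one piece per line with a comment.
verbose = re.compile(r'''
    (?:          # non-capturing group; one repetition is:
        \w+      #   one run of word characters (one "word")
        [ \t,]+  #   then one or more spaces, tabs, or commas
    ){0,4}       # repeat the whole group 0 to 4 times
    my\ car      # then the literal text "my car"
''', re.IGNORECASE | re.VERBOSE)

# The compact form from the question.
compact = re.compile(r'(?:\w+[ \t,]+){0,4}my car', re.IGNORECASE)

sample = 'in the road, I drive my car slowly.'
print(verbose.search(sample).group())  # the road, I drive my car
print(compact.search(sample).group())  # the road, I drive my car
```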

As a result, this matches up to 4 words, each followed by its spaces, commas, or tabs, immediately before "my car".
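To check that {0,4} counts word-plus-separator units, the repeated group can be tested on its own with fullmatch. The name prefix is hypothetical; the strings come from the question.

```python
import re

# Only the repeated part of the pattern. Each repetition of the group
# consumes exactly one word plus the separators that follow it.
prefix = re.compile(r'(?:\w+[ \t,]+){0,4}')

print(bool(prefix.fullmatch('house is painted white, ')))     # True  (4 units)
print(bool(prefix.fullmatch('my house is painted white, ')))  # False (5 units)
print(bool(prefix.fullmatch('')))                             # True  (0 units)
```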

So the regex is working as written.
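A self-contained script that reproduces the results from the question, assuming the quoted string is two lines joined by a newline character. re.MULTILINE is kept to mirror the original call, although it only changes how ^ and $ behave and so has no effect on this pattern.

```python
import re

# The string from the question, assumed to be two lines.
txt = ('my house is painted white, my car is red.\n'
       'A horse is galloping very fast in the road, I drive my car slowly.')

pattern = re.compile(r'(?:\w+[ \t,]+){0,4}my car', re.IGNORECASE | re.MULTILINE)

# Collect the full text of every non-overlapping match, left to right.
matches = [m.group() for m in pattern.finditer(txt)]
for text in matches:
    print(text)
# house is painted white, my car
# the road, I drive my car
```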