Simplicity Simplicity - 2 years ago 61
Python Question

Regular expression - range of the match

I have the following regular expression:

re.findall(r'(\b[A-Za-z][a-z]{3,10}\b)', string_var)

I expected this regular expression to return matches ranging in length from 3 to 10. However, it returns matches for words ranging in length from 4 to 11.

Should we therefore read the above regular expression as matching words that start with an upper-case or lower-case letter, followed by 3 to 10 lowercase letters? In other words, is the first letter the extra letter that extends the range?


Answer Source


Your regex is:

(\b[A-Za-z][a-z]{3,10}\b)

Now, the grouping parentheses don't affect what is matched, so we can ignore them. And \b is a "zero-width" assertion: it matches the position between a word character and a non-word character (or the start or end of the string), so it doesn't consume any characters. We can ignore it too. That leaves this:

[A-Za-z][a-z]{3,10}

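As a quick check of the claim that the parentheses and \b contribute no characters, here is a minimal sketch. The sample strings are made up for illustration and are not from the question.

```python
import re

# \b matches a position, not a character: the match is the empty string.
boundary = re.search(r'\b', 'word')
print(boundary.span())         # (0, 0)
print(repr(boundary.group()))  # ''

# Hypothetical sample text, chosen only for illustration.
text = "Some words here"

# With a single group, findall returns the group, which here is the whole match.
grouped = re.findall(r'(\b[A-Za-z][a-z]{3,10}\b)', text)
ungrouped = re.findall(r'\b[A-Za-z][a-z]{3,10}\b', text)
print(grouped == ungrouped)    # True
```

Note that \b still restricts where a match may start and end; it just adds nothing to the length of the match.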
This is two character classes, with a repetition specifier suffix on the second:

  1. [A-Za-z] - matches one character, upper or lower case Latin alphabetic.

  2. [a-z]{3,10} - matches at least 3, at most 10 characters, lowercase a-z
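The two pieces can be tried on their own with re.fullmatch, which succeeds only if the whole string fits the pattern. This is just a sketch; the names first, rest and fits are made up here for illustration.

```python
import re

first = r'[A-Za-z]'      # exactly one letter, either case
rest = r'[a-z]{3,10}'    # 3 to 10 lowercase letters


def fits(pattern, s):
    """Return True if the entire string s matches pattern."""
    return re.fullmatch(pattern, s) is not None


print(fits(first, 'Q'))        # True
print(fits(first, 'Qa'))       # False: two characters
print(fits(rest, 'ab'))        # False: only 2 letters
print(fits(rest, 'abc'))       # True
print(fits(rest, 'a' * 10))    # True
print(fits(rest, 'a' * 11))    # False: 11 letters
print(fits(rest, 'Abc'))       # False: uppercase is not in [a-z]
```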

So in total, you are matching 1 + [3,10] characters. Your minimal match will be 4 characters, and your maximal match will be 11.
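To see the arithmetic in practice, the sketch below runs the pattern from the question over words of known length. The sample words are generated for illustration, and the {2,9} variant assumes the goal was words of 3 to 10 letters in total, which the question implies but does not state outright.

```python
import re

pattern = r'(\b[A-Za-z][a-z]{3,10}\b)'

# Hypothetical sample: words of 3, 4, 10, 11 and 12 letters.
sample = " ".join("a" * n for n in (3, 4, 10, 11, 12))

# Original pattern: 1 letter + 3..10 letters, so 4..11 in total.
lengths = [len(m) for m in re.findall(pattern, sample)]
print(lengths)        # [4, 10, 11]

# Assumed intent: 3..10 letters in total, so the quantifier drops by one.
total_3_to_10 = r'\b[A-Za-z][a-z]{2,9}\b'
fixed_lengths = [len(m) for m in re.findall(total_3_to_10, sample)]
print(fixed_lengths)  # [3, 4, 10]
```

The 12-letter word is skipped in both cases because the trailing \b cannot match in the middle of a word.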
