PascalvKooten
Python Question

# Repeated capture groups in Python give a strange result

I would like to match a run of repeated number words (natural numbers written out, such as "sixty six") and capture all of them.

```python
import re
r = "the ((sixty|six)[ -]+)+items"
s = "the sixty six items"
re.findall(r, s)
# [('six ', 'six')]
```

It looks as if 'six' matched twice, even though the string never contains "six six". The only way the pattern can match is on "sixty six", yet the captures come back as ('six ', 'six').

What is happening here and how can I return ('sixty', 'six')?

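Only one match is found here, whichever function you use: the whole pattern matches the string once, so `re.findall` returns a single tuple (and `re.search` would return a single match). Inside that match, the outer group `((sixty|six)[ -]+)` is repeated by `+`, and a capture group that is repeated keeps only the text from its *last* repetition. The first repetition does capture `'sixty '` and `'sixty'`, but the second repetition overwrites them with `'six '` and `'six'`. That is why you get `('six ', 'six')`: the `'six '` comes from the outer group, and the `'six'` (without a trailing space) comes from the inner group.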
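A quick way to see the "last repetition wins" behaviour is to inspect the match object directly. This sketch uses only the standard-library `re` module; the tiny `(\w)+` pattern at the end is an extra illustration, not part of the original question.

```python
import re

pattern = "the ((sixty|six)[ -]+)+items"
s = "the sixty six items"

m = re.search(pattern, s)
# The whole pattern matches once, spanning the entire string.
print(m.group(0))        # the sixty six items
# Outer group (1) and inner group (2) hold only the text from the
# final repetition of the outer group.
print(repr(m.group(1)))  # 'six '
print(repr(m.group(2)))  # 'six'
# findall also sees just one match, so it returns a single tuple.
print(re.findall(pattern, s))  # [('six ', 'six')]

# The same rule in a minimal case: a repeated group keeps its last capture.
print(re.match(r"(\w)+", "abc").group(1))  # c
```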

You can do what you want using two un-nested capture groups inside some non-capture groups, which use the `(?:...)` syntax.

```python
import re

# Two separate capture groups, one per alternative; the (?:...) groups don't capture.
r = "the (?:(?:(sixty)|(six))[ -]+)+items"
s = "the sixty six items"
m = re.search(r, s)
if m:
    print(m.groups())
```

output

```
('sixty', 'six')
```

This returns a tuple of two items because we have two capture groups in the pattern.
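To check that count, a compiled pattern exposes the number of capturing groups through its `groups` attribute, and each group can also be read individually by index. A minimal sketch with the standard `re` module:

```python
import re

pat = re.compile("the (?:(?:(sixty)|(six))[ -]+)+items")
# Non-capturing (?:...) groups are not counted; only (sixty) and (six) are.
print(pat.groups)  # 2

m = pat.search("the sixty six items")
if m:
    # groups() is a tuple with one entry per capturing group.
    print(m.groups())              # ('sixty', 'six')
    print(m.group(1), m.group(2))  # sixty six
```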

Here's a longer demo.

```python
import re

pat = re.compile("the (?:(?:(sixty)|(six))[ -]+)+items")

data = (
    "the items",
    "the six items",
    "the six six items",
    "the sixty items",
    "the six sixty items",
    "the sixty six items",
    "the sixty-six items",
    "the six sixty sixty items",
)

for s in data:
    m = pat.search(s)
    # Print the captured groups, or None when the pattern does not match.
    print('{!r} -> {}'.format(s, m.groups() if m else None))
```

output

```
'the items' -> None
'the six items' -> (None, 'six')
'the six six items' -> (None, 'six')
'the sixty items' -> ('sixty', None)
'the six sixty items' -> ('sixty', 'six')
'the sixty six items' -> ('sixty', 'six')
'the sixty-six items' -> ('sixty', 'six')
'the six sixty sixty items' -> ('sixty', 'six')
```
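This version still records only the last `sixty` and the last `six`: `'the six six items'` gives `(None, 'six')`, and the order of the words is lost. If you need every number word in order, one option (a different technique from the answer above, shown as a sketch) is to capture the whole run with a single group and then split it with a second `re.findall`. The helper name `number_words` is hypothetical.

```python
import re

# One capturing group around the entire repeated run of number words.
outer = re.compile(r"the ((?:(?:sixty|six)[ -]+)+)items")
# Applied to the captured run; "sixty" is listed first so it is
# tried before its prefix "six".
word = re.compile(r"sixty|six")


def number_words(s):
    """Hypothetical helper: return every number word in order, or None."""
    m = outer.search(s)
    if m is None:
        return None
    return tuple(word.findall(m.group(1)))


data = (
    "the items",
    "the six six items",
    "the sixty six items",
    "the sixty-six items",
    "the six sixty sixty items",
)

for s in data:
    print('{!r} -> {}'.format(s, number_words(s)))
```

For `'the six sixty sixty items'` this returns `('six', 'sixty', 'sixty')`, keeping both the order and the repeats.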