PascalvKooten - 1 year ago
Python Question

Repeating captures in Python strange result

I would like to match a repeated run of number words and capture all of them.

import re
r = "the ((sixty|six)[ -]+)+items"
s = "the sixty six items"
re.findall(r, s)
# [('six ', 'six')]

The result contains 'six' twice, even though the string never contains "six six". The regex must have matched "sixty six", yet the captures come back as ('six ', 'six').

What is happening here and how can I return ('sixty', 'six')?

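Answer

re.search just finds the first thing that matches the pattern; it doesn't look for further matches once it has found one. You are getting ('six ', 'six') because you have one capture group nested inside another, and a group that sits inside a repetition only keeps the text from its last iteration. On the second pass the outer group matches 'six ' and the inner group matches 'six' (without the trailing space), overwriting the 'sixty ' and 'sixty' captured on the first pass.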
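To see the overwriting directly, here is a minimal sketch that reuses the pattern and string from the question and inspects the match object. The variable names are only illustrative. The full match spans both number words, but each group reports only its final iteration.

```python
import re

# Pattern and string taken from the question.
pattern = re.compile(r"the ((sixty|six)[ -]+)+items")
text = "the sixty six items"

m = pattern.search(text)

# The whole match covers both number words.
whole = m.group(0)

# Each repeated group only remembers its last iteration.
outer_last = m.group(1)   # outer group, including the trailing space
inner_last = m.group(2)   # inner group, the bare word
outer_span = m.span(1)    # where that last iteration sits in the string

print(repr(whole))
print(repr(outer_last), repr(inner_last), outer_span)
```

The span of group 1 points at the second word, which confirms that 'sixty ' was matched on the first pass and then replaced.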

You can do what you want using two un-nested capture groups inside some non-capture groups, which use the (?:...) syntax.

import re

r = "the (?:(?:(sixty)|(six))[ -]+)+items"
s = "the sixty six items"
m = re.search(r, s)
if m:
    print(m.groups())

output

('sixty', 'six')

This returns a tuple of two items because we have two capture groups in the pattern.

Here's a longer demo.

import re

pat = re.compile("the (?:(?:(sixty)|(six))[ -]+)+items")

data = (
    "the items",
    "the six items",
    "the six six items",
    "the sixty items",
    "the six sixty items",
    "the sixty six items",
    "the sixty-six items",
    "the six sixty sixty items",
)

for s in data:
    m = pat.search(s)
    print('{!r} -> {}'.format(s, m.groups() if m else None))

output

'the items' -> None
'the six items' -> (None, 'six')
'the six six items' -> (None, 'six')
'the sixty items' -> ('sixty', None)
'the six sixty items' -> ('sixty', 'six')
'the sixty six items' -> ('sixty', 'six')
'the sixty-six items' -> ('sixty', 'six')
'the six sixty sixty items' -> ('sixty', 'six')
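Note that the two-group pattern reports at most one 'sixty' and one 'six', always in group order, so it cannot tell "six sixty" from "sixty six" or show repeats. If every word is needed in the order it appears, one possible approach (a sketch, not part of the answer above) is to capture the whole run with a single group and then split it with a second pattern. The names run_pattern, word_pattern and number_words are hypothetical.

```python
import re

# Outer pattern: one capture group around the whole repeated run.
# The groups inside it are non-capturing, so nothing is overwritten.
run_pattern = re.compile(r"the ((?:(?:sixty|six)[ -]+)+)items")

# Inner pattern: 'sixty' is listed first so it is tried before its prefix 'six'.
word_pattern = re.compile(r"sixty|six")


def number_words(text):
    """Return every number word in the run, in order of appearance."""
    m = run_pattern.search(text)
    if m is None:
        return []
    return word_pattern.findall(m.group(1))


for s in ("the items", "the sixty six items", "the six sixty sixty items"):
    print('{!r} -> {}'.format(s, number_words(s)))
```

This trades a single regex for two passes, but the result is a list whose length and order follow the input rather than the number of groups in the pattern.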