Klaus Klaus - 1 month ago 6
Python Question

Regex for time ranges starting too far

I am trying to find time ranges of the form

12:30 Test
12:30-12:50 Test


with the simple regex ((\d+):(\d+)-?)+ (.*). It works fine for the first example, but for the second example the match only begins at 12:50 and doesn't catch the first time range.

Do you see why?

Here is a Regex101 example and a minimal example in Python:

import re
print(re.search(r"^((\d+)(?::|h)(\d+)-?)+ (\w.*)", "12:30-12:50 Test").groups())

Answer

Python's re module keeps only the last value captured by a repeated group, so you cannot access the earlier captures. You need to explicitly unroll the quantified group and make the second part optional:

(\d+):(\d+)(?:-(\d+):(\d+))? (.*)
           ^^^^^^^^^^^^^^^^^

See the regex demo

Python demo:

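import re

# Unrolled pattern: a mandatory start time, then an optional "-HH:MM" end time.
rx = r"(\d+):(\d+)(?:-(\d+):(\d+))? (.*)"
strs = ["12:30 Test", "12:30-12:50 Test"]
for s in strs:  # "s" avoids shadowing the built-in str
    m = re.search(rx, s)
    if m:
        print(m.groups())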

Output:

('12', '30', None, None, 'Test')
('12', '30', '12', '50', 'Test')
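
To see why the original pattern looks as if it starts too late, it may help to compare the overall match with the groups. The sketch below uses the simplified pattern from the question; the variable names are only illustrative.

```python
import re

# The asker's simplified pattern: the outer group is repeated by "+".
pattern = r"((\d+):(\d+)-?)+ (.*)"
m = re.search(pattern, "12:30-12:50 Test")

# The overall match still starts at the beginning of the string...
print(m.start(), repr(m.group(0)))  # 0 '12:30-12:50 Test'

# ...but each repeated group only remembers its final iteration.
print(m.groups())  # ('12:50', '12', '50', 'Test')
```

So the match itself covers both times; only the captured values of the first iteration ("12:30-") are overwritten by the second one.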

With the PyPI regex module, you can access all the captures. Here is an example with your regex:

>>> import regex
>>> strs = ["12:30 Test", "12:30-12:50 Test"]
>>> for s in strs:
...     m = regex.search(r'((\d+):(\d+)-?)+ (.*)', s)
...     if m:
...         print(m.captures(1))
...         print(m.captures(2))
...         print(m.captures(3))
...
['12:30']
['12']
['30']
['12:30-', '12:50']
['12', '12']
['30', '50']
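
If installing a third-party module is not an option, one possible alternative with the standard re module is a two-step approach: match the whole run of times first, then extract every hour/minute pair from it with re.findall. This is only a sketch, not part of the answer above; the helper name parse_times is hypothetical, and it assumes the line starts with the times (re.match anchors at the start, unlike re.search).

```python
import re

def parse_times(line):
    """Return ([(hour, minute), ...], label), or None if the line does not match."""
    # Step 1: match the whole run of times plus the trailing label.
    m = re.match(r"(\d+:\d+(?:-\d+:\d+)*) (.*)", line)
    if m is None:
        return None
    # Step 2: pull every hour/minute pair out of the matched prefix.
    times = re.findall(r"(\d+):(\d+)", m.group(1))
    return times, m.group(2)

print(parse_times("12:30 Test"))        # ([('12', '30')], 'Test')
print(parse_times("12:30-12:50 Test"))  # ([('12', '30'), ('12', '50')], 'Test')
```

Unlike the unrolled pattern, this version does not limit a line to two times, which may or may not be what you want.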