Subhayan Bhattacharya - 3 years ago
Python Question

How to capture the sets of repeating characters in Python regex

import re
line = "..12345678910111213141516171820212223"
regex = re.compile(r'((?:[a-zA-Z0-9])\1+)')
print ("not coming here")
matches = re.findall(regex,line)
print (matches)


In the code above, I am trying to capture the groups of repeating characters.

So, for example, I need answers like:
111
222
etc.

But when I run the code above, I get this error:

Traceback (most recent call last):
  File "First.py", line 3, in <module>
    regex = re.compile(r'((?:[a-zA-Z0-9])\1+)')
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\re.py", line 224, in compile
    return _compile(pattern, flags)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\re.py", line 293, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_compile.py", line 536, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 829, in parse
    p = _parse_sub(source, pattern, 0)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 778, in _parse
    p = _parse_sub(source, state)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 524, in _parse
    code = _escape(source, this, state)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 415, in _escape
    len(escape))
sre_constants.error: cannot refer to an open group at position 16


Could someone please show me where I am going wrong?

Jan
Answer Source

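Your pattern puts \1 inside group 1 itself, so the backreference points at a group that is not yet closed. You (probably) want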

([a-zA-Z0-9])\1+

See a demo on regex101.com.
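To see the difference between the two patterns, here is a small sketch that compiles both. The variable names (broken, fixed, error_message) and the test string "aabbb" are made up for illustration; only re.compile, re.error and search come from the standard library.

```python
import re

# Pattern from the question: \1 is used while group 1 is still open.
broken = r'((?:[a-zA-Z0-9])\1+)'
# Suggested pattern: group 1 is closed before \1 refers to it.
fixed = r'([a-zA-Z0-9])\1+'

try:
    re.compile(broken)
    error_message = None
except re.error as exc:
    # The regex parser rejects a backreference to an unfinished group.
    error_message = str(exc)

print(error_message)

fixed_regex = re.compile(fixed)
# group(0) is the whole run, group(1) is the single repeated character.
first_run = fixed_regex.search("aabbb")
print(first_run.group(0), first_run.group(1))
```

The exact wording and position in the error message may vary between Python versions, so it is safer to rely on the exception type than on the text.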


In Python:

import re
line = "..12345678910111213141516171820212223"
regex = re.compile(r'([a-zA-Z0-9])\1+')

matches = [match.group(0) for match in regex.finditer(line)]
print (matches)
# ['111', '222']
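
The reason for using finditer with group(0) here, rather than findall as in the question, is that findall returns only the captured group when the pattern contains one. The sketch below shows that behaviour, plus one possible alternative that keeps findall by wrapping the whole run in an outer group; the names only_chars, pairs and runs are illustrative.

```python
import re

line = "..12345678910111213141516171820212223"

# With a single capturing group, findall returns just that group,
# i.e. the repeated character rather than the whole run.
only_chars = re.findall(r'([a-zA-Z0-9])\1+', line)
print(only_chars)

# Alternative: an outer group (group 1) around the whole run, with the
# single character as group 2, so the backreference becomes \2.
pairs = re.findall(r'(([a-zA-Z0-9])\2+)', line)
print(pairs)

# findall now yields (whole_run, character) tuples; keep the first item.
runs = [whole for whole, _char in pairs]
print(runs)
```

Either approach should give the same runs; finditer avoids the extra group and the tuple unpacking.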