user89239213892389 user89239213892389 - 2 years ago 100
Python Question

Finding the repeating sub string in a big string

Take for example the following strings


I want to find the sub string that is repeating repetition for each.


How can I do this in python. Using regex is okay, I tried the following:

import re

testString = "0.714285714285714285714285714285714285714285"
print("(.+)\1", testString).group(1))

This gives me the (wrong) output:


It should be

How do I fix this? Is there way to improve my regex or is regex the wrong tool for this job? Maybe python has an awesome built in for this? Is there anyway to do use this with or without regex?

EDIT Before posting an answer check with the test case


It should return

Answer Source

You can give a try to this pattern:



or to obtain the substring as whole match (group 0):


What does exactly the pattern?

It returns, for a position in the string, the smallest repeated substring that spans the larger part of the string.

How does it work?

In a lookahead is described the largest repeated substring with a greedy quantifier (i.e. (\d+)), followed by its repetitions \1+, followed by the end of the string captured in group 2.

Then, once the lookahead closed, (\d+?)\3+ searches this time the smallest repeated substring with a non-greedy quantifier but with a condition: after the repetitions, the end of the string must be the same than the one captured in the lookahead.

This ensures that the substring in group 3 can't be sliced into a smaller repeated substring.


The searched substring is in the group 3.

If you use the pattern as it (i.e. non-anchored), the first repeated substring on the left is returned.

Obviously, if you only want a result that starts after the dot you need to anchor the pattern with it:

\.(?=(\d+)\1+(.*))(\d+?)\3+\2$ # immediately after the dot


\..*?(?=(\d+)\1+(.*))(\d+?)\3+\2$ # the first after the dot

If you want to research repeated substrings for each positions in the string (for example to find the largest whatever the starting position), you need to enclose all the second part in a lookahead too and to use re.findall:


(Then feel free to sort the result list, if you want to obtain the largest string whatever the starting position)

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download