I have a text with words separated by
Maybe regexes are not needed at all.
itertools.groupby does the job. It's designed to group equal occurrences of consecutive items.
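As a quick illustration of what "consecutive" means here (the sample list below is made up for this sketch), equal items that are not adjacent end up in separate groups:

```python
import itertools

# Hypothetical sample data: "a" appears three times, but only two are adjacent
words = ["a", "a", "b", "a"]

# groupby yields (key, iterator over the run); consume the iterator to count it
groups = [(key, len(list(group))) for key, group in itertools.groupby(words)]
print(groups)  # [('a', 2), ('b', 1), ('a', 1)]
```

So the trailing "a" is not merged with the first run, which is what we want when looking for words repeated back to back.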
The list comprehension below builds a tuple (value, count) for each group, keeping it only if the group's length is > 1:
import itertools

s = "My.name.name.is.Inigo.Montoya.You.killed.my.father.father.father.Prepare.to.die"
# l is the list of identical consecutive words; l[0] is the word itself
matches = [(l[0], len(l)) for l in (list(v) for k, v in itertools.groupby(s.split("."))) if len(l) > 1]
[('name', 2), ('father', 3)]
So basically we can do whatever we want with this list of tuples (filtering it on the number of occurrences, for instance).
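For example, filtering on the number of occurrences could look like the sketch below. The `min_count` threshold is a name and value chosen just for this illustration:

```python
import itertools

s = "My.name.name.is.Inigo.Montoya.You.killed.my.father.father.father.Prepare.to.die"
matches = [(l[0], len(l)) for l in (list(v) for k, v in itertools.groupby(s.split("."))) if len(l) > 1]

# min_count is a hypothetical threshold: keep only words repeated at least 3 times in a row
min_count = 3
filtered = [(word, count) for word, count in matches if count >= min_count]
print(filtered)  # [('father', 3)]
```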
Bonus (as I misread the question at first, so I'm leaving it in): to remove the duplicates from the sentence:
- group by words (after splitting on dots) like above
- take only the key of each group in a list comprehension (we don't need the grouped values since we don't count)
- join back with a dot
In one line (still using
new_s = ".".join([k for k,_ in itertools.groupby(s.split("."))])
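To check the result on the sample sentence, here is the same idea wrapped in a small helper. The function name `remove_consecutive_duplicates` and its `sep` parameter are made up for this sketch, not part of any library:

```python
import itertools

def remove_consecutive_duplicates(text, sep="."):
    """Collapse each run of identical adjacent words into a single word."""
    # Keep only the key of each group; the grouped values are not needed
    return sep.join(key for key, _ in itertools.groupby(text.split(sep)))

s = "My.name.name.is.Inigo.Montoya.You.killed.my.father.father.father.Prepare.to.die"
new_s = remove_consecutive_duplicates(s)
print(new_s)  # My.name.is.Inigo.Montoya.You.killed.my.father.Prepare.to.die
```

Note that the comparison is case-sensitive and only adjacent words are merged, so "My" and "my" are both kept.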