nacho nacho - 1 year ago 33
Python Question

Python look-behind regex "fixed-width pattern" error while looking for consecutive repeated words

I have a text with words separated by dots (.), with instances of 2 and 3 consecutive repeated words:

My.name.name.is.Inigo.Montoya.You.killed.my.father.father.father.Prepare.to.die-


I need to match them independently with regex, so that the pattern for duplicates does not also match part of a triplicate.

Since there are at most 3 consecutive repeated words, this pattern

r'\b(\w+)\.+\1\.+\1\b'

successfully catches father.father.father.
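Here is a minimal sketch of checking the triplicate pattern against the sample sentence with Python's re module. Because the pattern contains a capturing group, re.findall returns only that group. To get the full run, use re.finditer with m.group(0).

```python
import re

s = "My.name.name.is.Inigo.Montoya.You.killed.my.father.father.father.Prepare.to.die"

# Pattern from the question: a word followed by two more copies of itself
triple = re.compile(r'\b(\w+)\.+\1\.+\1\b')

# findall returns only the captured group when the pattern has one group
print(triple.findall(s))  # ['father']

# finditer + group(0) returns the whole matched run
triples = [m.group(0) for m in triple.finditer(s)]
print(triples)  # ['father.father.father']
```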


However, to catch exactly 2 consecutive repeated words, I need to make sure the next and previous words aren't the same. I can check the next word with a negative look-ahead

r'\b(\w+)\.+\1(?!\.+\1)\b'


but my attempts at a negative look-behind for the previous word

r'(?<!(\w)\.)\b\1\.+\1\b(?!\.\1)'

either raise a "look-behind requires fixed-width pattern" error (when I keep the + in (\w+)) or fail in some other way.
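Both problems can be reproduced with Python's standard re module. The minimal sketch below uses the sample sentence. The look-ahead pattern alone still returns "father.father", because nothing stops the match from starting at the second "father". That is why the previous word also has to be checked. Restoring the + inside the look-behind makes it variable-width, and re rejects that at compile time. The exact wording of the error message may vary between Python versions.

```python
import re

s = "My.name.name.is.Inigo.Montoya.You.killed.my.father.father.father.Prepare.to.die"

# Look-ahead only: a pair followed by a third copy is rejected, but the match
# can still start at the second "father" and take the last two copies
pair = re.compile(r'\b(\w+)\.+\1(?!\.+\1)\b')
pairs = [m.group(0) for m in pair.finditer(s)]
print(pairs)  # ['name.name', 'father.father']

# Look-behind with the "+" kept: (\w+) has no fixed width, so re refuses it
try:
    re.compile(r'(?<!(\w+)\.)\b\1\.+\1\b')
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)  # e.g. look-behind requires fixed-width pattern
```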

How should I correct the negative look-behind?

Answer

Maybe regexes are not needed at all.

itertools.groupby does the job: it is designed to group runs of consecutive equal items.
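The sketch below shows what "consecutive" means here, using a hypothetical sample string ("a.a.b.a"). groupby yields one (key, group) pair per run of equal adjacent items. As a result, the last "a" forms its own group and is not merged with the first two.

```python
import itertools

# Hypothetical sample: the last "a" is not adjacent to the first two
words = "a.a.b.a".split(".")

# groupby yields (key, iterator) pairs, one per run of equal adjacent items;
# each iterator is consumed by list() before groupby advances to the next run
runs = [(key, len(list(group))) for key, group in itertools.groupby(words)]
print(runs)  # [('a', 2), ('b', 1), ('a', 1)]
```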

  • group by words (after splitting on dots)
  • convert each group to a list and emit a (word, count) tuple only if its length is > 1

like this:

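import itertools

s = "My.name.name.is.Inigo.Montoya.You.killed.my.father.father.father.Prepare.to.die"

# groupby yields one (word, iterator) pair per run of equal consecutive words;
# keep (word, run length) only for runs longer than 1
matches = [(g[0], len(g))
           for g in (list(v) for _, v in itertools.groupby(s.split(".")))
           if len(g) > 1]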

result:

[('name', 2), ('father', 3)]

From there we can do whatever we need with this list of tuples, for instance filter it on the number of occurrences.
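For the original question, keeping duplicates and triplicates apart is then just a filter on the count. Here is a minimal sketch that reuses the same groupby expression. The names duplicates and triplicates are chosen for illustration only.

```python
import itertools

s = "My.name.name.is.Inigo.Montoya.You.killed.my.father.father.father.Prepare.to.die"

# (word, run length) for every run of consecutive repeated words
matches = [(g[0], len(g))
           for g in (list(v) for _, v in itertools.groupby(s.split(".")))
           if len(g) > 1]

# Separate pairs from triples by filtering on the run length
duplicates = [word for word, count in matches if count == 2]
triplicates = [word for word, count in matches if count == 3]
print(duplicates, triplicates)  # ['name'] ['father']
```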

Bonus (I misread the question at first, so I'm leaving it in): to remove the duplicates from the sentence:

  • group by words (after splitting on dots), like above
  • keep only the key of each group in a list comprehension (we don't need the group values since we don't count)
  • join back with dots

In one line (still using itertools):

new_s = ".".join([k for k,_ in itertools.groupby(s.split("."))])

result:

My.name.is.Inigo.Montoya.You.killed.my.father.Prepare.to.die
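If you would rather stay with regular expressions, one alternative avoids the look-behind entirely. It matches each whole run of repeated words in one go, so a pair can never be carved out of a triplicate, and then derives the count from the match. Note that this is a different technique from the one attempted in the question. It also assumes words are separated by exactly one dot, as in the sample. If you specifically need variable-width look-behind, the third-party regex module supports it.

```python
import re

s = "My.name.name.is.Inigo.Montoya.You.killed.my.father.father.father.Prepare.to.die"

# A word followed by one or more ".same-word" repetitions; the greedy "+"
# consumes the whole run, so "father.father.father" is matched as one unit
RUN = re.compile(r'\b(\w+)(?:\.\1\b)+')

# Word and run length (number of dots in the match + 1)
runs = [(m.group(1), m.group(0).count(".") + 1) for m in RUN.finditer(s)]
print(runs)  # [('name', 2), ('father', 3)]

# The same pattern collapses each run to a single word, like the bonus one-liner
deduped = RUN.sub(r"\1", s)
print(deduped)  # My.name.is.Inigo.Montoya.You.killed.my.father.Prepare.to.die
```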