K. Mao K. Mao - 1 month ago 12
Python Question

Efficient way to find the index of repeated sequence in a list?

I have a large list of numbers in python, and I want to write a function that finds sections of the list where the same number is repeated more than n times. For example, if n is 3 then my function should return the following results for the following examples:


When applied to example = [1,2,1,1,1,1,2,3] the function should return [(2,6)], because example[2:6] is a sequence containing all the same value.

When applied to example = [0,0,0,7,3,2,2,2,2,1] the function should return [(0,3), (5,9)] because both example[0:3] and example[5:9] contain repeated sequences of the same value.

When applied to example = [1,2,1,2,1,2,1,2,1,2] the function should return [] because there is no sequence of three or more elements that are all the same number.


I know I could write a bunch of loops to get what I want, but that seems kind of inefficient, and I was wondering if there was an easier option to obtain what I wanted.

Answer

Use itertools.groupby and enumerate:

>>> from itertools import groupby
>>> n = 3
>>> x = [1,2,1,1,1,1,2,3] 
>>> grouped = (list(g) for _,g in groupby(enumerate(x), lambda t:t[1]))
>>> [(g[0][0], g[-1][0] + 1) for g in grouped if len(g) >= n]
[(2, 6)]
>>> x = [0,0,0,7,3,2,2,2,2,1]
>>> grouped = (list(g) for _,g in groupby(enumerate(x), lambda t:t[1]))
>>> [(g[0][0], g[-1][0] + 1) for g in grouped if len(g) >= n]
[(0, 3), (5, 9)]

To understand groupby: just realize that each iteration returns the value of the key, which is used to group the elements of the iterable, along with a new lazy-iterable that will iterate over the group.

>>> list(groupby(enumerate(x), lambda t:t[1]))
[(0, <itertools._grouper object at 0x7fc90a707bd0>), (7, <itertools._grouper object at 0x7fc90a707ad0>), (3, <itertools._grouper object at 0x7fc90a707950>), (2, <itertools._grouper object at 0x7fc90a707c10>), (1, <itertools._grouper object at 0x7fc90a707c50>)]
Comments