Chris_Rands Chris_Rands - 24 days ago 14
Python Question

Efficiently check if an element occurs at least n times in a list

How to best write a Python function (

check_list
) to efficiently test if an element (
x
) occurs at least
n
times in a list (
l
)?

My first thought was:

def check_list(l, x, n):
return l.count(x) >= n


But this doesn't short-circuit once
x
has been found
n
times and is always O(n).

A simple approach that does short-circuit would be:

def check_list(l, x, n):
count = 0
for item in l:
if item == x:
count += 1
if count == n:
return True
return False


I also have a more compact short-circuiting solution with a generator:

def check_list(l, x, n):
gen = (1 for item in l if item == x)
return all(next(gen,0) for i in range(n))


Are there other good solutions? What is the best efficient approach?

Thank you

Answer

Instead of incurring extra overhead with the setup of a range object and using all which has to test the truthiness of each item, you could use itertools.islice to advance the generator n steps ahead, and then return the next item in the slice if the slice exists or a default False if not:

from itertools import islice

def check_list(lst, x, n):
    gen = (True for i in lst if i==x)
    return next(islice(gen, n-1, None), False)

Note that like list.count, itertools.islice also runs at C speed. And this has the extra advantage of handling iterables that are not lists.


Some timing:

In [1]: from itertools import islice

In [2]: from random import randrange

In [3]: lst = [randrange(1,10) for i in range(100000)]

In [5]: %%timeit # using list.index
   ....: check_list(lst, 5, 1000)
   ....:
1000 loops, best of 3: 736 µs per loop

In [7]: %%timeit # islice
   ....: check_list(lst, 5, 1000)
   ....:
1000 loops, best of 3: 662 µs per loop

In [9]: %%timeit # using list.index
   ....: check_list(lst, 5, 10000)
   ....:
100 loops, best of 3: 7.6 ms per loop

In [11]: %%timeit # islice
   ....: check_list(lst, 5, 10000)
   ....:
100 loops, best of 3: 6.7 ms per loop
Comments