Jean-François Fabre - 1 month ago
Python Question

Avoid converting an iterator to a list whenever possible

Suppose I have a function taking one parameter, an iterable, as input, and I want to iterate more than once on the iterable.

If I write it like this:

def a_function(an_iterable):
    for x in an_iterable:
        print(x)
    for x in an_iterable:
        print(x)


the body of the second loop may or may not be executed.


  • it is executed if the iterable is a list, set, or dict, which are not iterators or generators, or a range, which rearms itself

  • it is not executed for a custom generator function or a file object (obtained with f=open("file")). (The reuse of a file iterator is the subject of many questions here at SO.)
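
The difference is easy to reproduce. A minimal sketch, where count_passes and gen are hypothetical names introduced only for this illustration:

```python
def count_passes(an_iterable):
    """Return how many items each of two consecutive passes yields."""
    first = sum(1 for _ in an_iterable)
    second = sum(1 for _ in an_iterable)
    return first, second


def gen():
    # A custom generator function: each call returns a one-shot iterator.
    yield from (1, 2, 3)


print(count_passes([1, 2, 3]))   # (3, 3): a list can be re-iterated
print(count_passes(range(3)))    # (3, 3): range hands out a fresh iterator each time
print(count_passes(gen()))       # (3, 0): the generator is exhausted after the first pass
```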



Of course, I could do this to avoid creating an unnecessary list when it is not needed:

def a_function(an_iterable):
    if any(type(an_iterable) == x for x in (range, list, set, dict)):
        # use as-is
        pass
    else:
        an_iterable = list(an_iterable)

    for x in an_iterable:
        print(x)
    for x in an_iterable:
        print(x)


That would work for a lot of common cases, but not in the general case.
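
To make the limitation concrete, here is a small sketch of what an exact-type whitelist decides for a few inputs. REUSABLE_TYPES and needs_copy are hypothetical names used only for this illustration:

```python
from collections import deque

REUSABLE_TYPES = (range, list, set, dict)  # the whitelist from the question


def needs_copy(an_iterable):
    # Mirrors the exact-type check: anything not whitelisted gets copied.
    return type(an_iterable) not in REUSABLE_TYPES


print(needs_copy([1, 2, 3]))          # False
print(needs_copy((1, 2, 3)))          # True, although a tuple can be re-iterated
print(needs_copy(deque([1, 2, 3])))   # True, although a deque can be re-iterated
print(needs_copy(iter([1, 2, 3])))    # True, and here the copy really is needed
```

The result stays correct in every case, but re-iterable types outside the whitelist are copied for nothing.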

Is there a clean way to detect if I can iterate many times on my iterable object?

Answer

You can use the collections.abc.Sequence class to see if the iterable is actually a sequence:

>>> from collections.abc import Sequence
>>> isinstance([1,2,3], Sequence)
True
>>> isinstance((1,2,3), Sequence)
True
>>> isinstance(range(10), Sequence)
True
>>> isinstance(iter((1,2,3)), Sequence)
False

This won't work for sets:

>>> isinstance({1,2,3}, Sequence)
False

If you want to include sets and mappings, use collections.abc.Set and collections.abc.Mapping as well:

>>> from collections.abc import Set, Mapping
>>> isinstance({1,2,3}, (Sequence, Set, Mapping))
True

You may want to create a helper function that converts an iterable to a sequence if needed:

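from collections.abc import Sequence, Set, Mapping

def sequencify(iterable):
    if isinstance(iterable, (Sequence, Set, Mapping)):
        return iterable
    # Anything else may be a one-shot iterator: materialize it.
    return list(iterable)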

And now you can just do:

def a_function(iterable):
    iterable = sequencify(iterable)

    for x in iterable:
        print(x)
    for x in iterable:
        print(x)
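
As a quick check of the helper's behaviour, the sketch below repeats it so that the snippet runs on its own. The variable names data, squares and materialized are only illustrative:

```python
from collections.abc import Mapping, Sequence, Set


def sequencify(iterable):
    # Same helper as above, repeated so the snippet is self-contained.
    if isinstance(iterable, (Sequence, Set, Mapping)):
        return iterable
    return list(iterable)


data = [1, 2, 3]
print(sequencify(data) is data)        # True: a list is returned as-is, no copy

squares = (n * n for n in range(3))    # one-shot generator
materialized = sequencify(squares)
print(materialized)                    # [0, 1, 4]
print(list(materialized) == list(materialized))  # True: both passes see every item
```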

A simpler alternative is to check that the iterable argument does not have a __next__ method:

>>> hasattr([1,2,3], '__next__')
False
>>> hasattr(iter([1,2,3]), '__next__')
True

This works because well-implemented containers are iterables but not iterators themselves: they only have an __iter__ method, which returns an iterator, and it is that iterator that has the __next__ method advancing the iteration.
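
The split between the two roles can be seen on a small user-defined container. Countdown is a hypothetical class written only for this illustration:

```python
class Countdown:
    """Hypothetical container: iterable, but not an iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Each call builds a brand-new iterator, so re-iteration works.
        return iter(range(self.start, 0, -1))


c = Countdown(3)
it = iter(c)

print(hasattr(c, '__next__'))    # False: the container only has __iter__
print(hasattr(it, '__next__'))   # True: the iterator advances with __next__
print(iter(it) is it)            # True: an iterator returns itself from __iter__
print(list(c), list(c))          # [3, 2, 1] [3, 2, 1]
```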

This would lead to:

def sequencify(iterable):
    if not hasattr(iterable, '__next__'):
        return iterable
    return list(iterable)
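
A closely related spelling, shown here only as a sketch, uses collections.abc.Iterator: its isinstance check succeeds for objects that define both __iter__ and __next__, so it should agree with the hasattr test for ordinary iterators. It differs only for unusual objects that define __next__ without __iter__.

```python
from collections.abc import Iterator


def sequencify(iterable):
    # Copy only one-shot iterators; leave re-iterable objects untouched.
    if isinstance(iterable, Iterator):
        return list(iterable)
    return iterable


numbers = {1, 2, 3}
print(sequencify(numbers) is numbers)          # True: a set is not an iterator
print(sequencify(n for n in range(3)))         # [0, 1, 2]: a generator is copied
```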

The final alternative is the simplest: document the argument as a sequence and not an iterable and let the user be responsible for providing the correct type.
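
One way to write that down, as a sketch rather than a prescription, is a type annotation plus a docstring. The parameter name items is an assumption made for this example, and the annotation is not enforced at runtime:

```python
from collections.abc import Sequence


def a_function(items: Sequence) -> None:
    """Print every item twice.

    `items` must be a sequence (list, tuple, range, ...), not a one-shot
    iterator, because it is iterated more than once.
    """
    for x in items:
        print(x)
    for x in items:
        print(x)


a_function((1, 2))   # prints 1, 2, 1, 2 on separate lines
```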