alvas - 1 month ago
Python Question

What is the 2nd argument for the iter function in Python?

Let's consider a file:

$ echo -e """This is a foo bar sentence .\nAnd this is the first txtfile in the corpus .""" > test.txt
$ cat test.txt
This is a foo bar sentence .
And this is the first txtfile in the corpus .


When I want to read the file character by character, I can do the following (see http://stackoverflow.com/a/25071590/610569):

>>> fin = open('test.txt')
>>> while fin.read(1):
...     fin.seek(-1, 1)
...     print fin.read(1),
...
T h i s i s a f o o b a r s e n t e n c e .
A n d t h i s i s t h e f i r s t t x t f i l e i n t h e c o r p u s .


But using a while loop looks a little unpythonic, especially when I use fin.read(1) to check for EOF and then backtrack in order to read the current byte. So I can do something like this instead (from "How to read a single character at a time from a file in Python?"):

>>> import functools
>>> fin = open('test.txt')
>>> fin_1byte = iter(functools.partial(fin.read, 1), '')
>>> for c in fin_1byte:
...     print c,
...
T h i s i s a f o o b a r s e n t e n c e .
A n d t h i s i s t h e f i r s t t x t f i l e i n t h e c o r p u s .


But when I tried it without the second argument, it throws a TypeError:

>>> fin = open('test.txt')
>>> fin_1byte = functools.partial(fin.read, 1)
>>> for c in iter(fin_1byte):
...     print c,
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'functools.partial' object is not iterable


What is the 2nd argument in iter?
The docs don't say much either: https://docs.python.org/2/library/functions.html#iter and https://docs.python.org/3.6/library/functions.html#iter




As per the doc:


Return an iterator object. The first argument is interpreted very differently depending on the presence of the second argument. Without a second argument, object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0). If it does not support either of those protocols, TypeError is raised. If the second argument, sentinel, is given, then object must be a callable object. The iterator created in this case will call object with no arguments for each call to its next() method; if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.


I guess the docs require some "decrypting":


  • Without a second argument, object must be a collection object which supports the iteration protocol (the __iter__() method)



Does that mean it needs to come from collections? Or is it okay as long as the object has an __iter__() method?


  • , or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0)



That's rather cryptic. Does that mean it checks whether the sequence is indexed, and hence queryable, and that the index must start from 0? Does it also mean that the indices need to be sequential, i.e. 0, 1, 2, 3, ... and not something like 0, 2, 8, 13, ...?


  • If it does not support either of those protocols, TypeError is raised.



Yes, this part, I do understand =)


  • If the second argument, sentinel, is given, then object must be a callable object.



Okay, now this gets a little sci-fi. Is "sentinel" just a piece of Python terminology? What does sentinel mean Pythonically? And does "callable object" mean something like a function, as opposed to a type object?


  • The iterator created in this case will call object with no arguments for each call to its next() method;



I don't really get this part; maybe an example would help.


  • if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.



Okay, so sentinel here refers to some breaking criterion?

Can someone help to decipher/clarify the meaning of the above points about iter?

Answer

With one argument, iter must be given an object that has the __iter__ special method or the __getitem__ special method. If neither of them exists, iter raises a TypeError:

>>> iter(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not iterable

There are 2 protocols for iteration. The old protocol relies on calling __getitem__ with successive integers, starting from 0, until one of the calls raises IndexError. The new protocol relies on the iterator that is returned from __iter__.
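As a rough sketch of what the old protocol amounts to, the generator below mimics how iter behaves for an object that only defines __getitem__. The names legacy_iter and Squares are made up for illustration, and this only approximates the observable behavior; it is not how the interpreter implements it.

```python
def legacy_iter(obj):
    """Approximate iter(obj) for objects that only define __getitem__."""
    index = 0
    while True:
        try:
            item = obj[index]   # calls obj.__getitem__(index)
        except IndexError:
            return              # IndexError signals the end of the sequence
        yield item
        index += 1


class Squares:
    """Hypothetical sequence-like class with no __iter__ method."""
    def __getitem__(self, i):
        if i >= 4:
            raise IndexError(i)
        return i * i


result_builtin = list(iter(Squares()))      # built-in fallback to __getitem__
result_sketch = list(legacy_iter(Squares()))  # hand-written approximation
print(result_builtin)  # [0, 1, 4, 9]
print(result_sketch)   # [0, 1, 4, 9]
```

Both lists come out the same, which is the point: with no __iter__ available, iter simply asks for index 0, 1, 2, ... until IndexError.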

In Python 2, str doesn't even have the __iter__ special method:

Python 2.7.12+ (default, Sep 17 2016, 12:08:02) 
[GCC 6.2.0 20160914] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 'abc'.__iter__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute '__iter__'

yet it is still iterable:

>>> iter('abc')
<iterator object at 0x7fcee9e89390>

To make your custom class iterable, you need to have either __iter__ or __getitem__ that raises IndexError for non-existent items:

class Foo:
    def __iter__(self):
        return iter(range(5))

class Bar:
    def __getitem__(self, i):
        if i >= 5:
            raise IndexError
        return i

Using these:

>>> list(iter(Foo()))
[0, 1, 2, 3, 4]
>>> list(iter(Bar()))
[0, 1, 2, 3, 4]
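On the question of whether the indices must be sequential: a small sketch, assuming a hypothetical Sparse class that only has items at indices 0, 2, 8 and 13. Since the old protocol asks for 0, 1, 2, ... in order and stops at the first IndexError, iteration ends as soon as index 1 turns out to be missing.

```python
class Sparse:
    """Hypothetical container with items only at indices 0, 2, 8 and 13."""
    _data = {0: 'a', 2: 'b', 8: 'c', 13: 'd'}

    def __getitem__(self, i):
        try:
            return self._data[i]
        except KeyError:
            # Translate the missing key into the IndexError that iter() expects.
            raise IndexError(i)


sparse_items = list(iter(Sparse()))
print(sparse_items)  # ['a']
```

So the items at 2, 8 and 13 are never reached. If __getitem__ let the KeyError propagate instead, the error would surface to the caller rather than quietly ending the iteration.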

Usually an explicit iter call is not needed, because for loops and functions that expect iterables create an iterator implicitly:

>>> list(Foo())
[0, 1, 2, 3, 4]
>>> for i in Bar():
...     print(i)
...
0
1
2
3
4

With the two-argument form, the first argument must be a function or another callable, such as an object that implements __call__. It is called without arguments, and its return values are yielded from the iterator. Iteration stops when the value returned by a call equals the given sentinel value, roughly as if by:

def sentinel_iter(func, sentinel):  # illustrative name, not a built-in
    while True:
        value = func()
        if value == sentinel:
            return
        yield value

For example, to get values on a die before we throw 6,

>>> import random
>>> throw = lambda: random.randint(1, 6)
>>> list(iter(throw, 6))
[3, 2, 4, 5, 5]
>>> list(iter(throw, 6))
[1, 3, 1, 3, 5, 1, 4]

(i.e. throw is called as throw(), and if the returned value is not equal to 6, it is yielded).
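The first argument does not have to be a plain function. Here is a minimal sketch, assuming a made-up Countdown class whose instances are callable through __call__. Unlike the die it is deterministic, so the stopping point is easy to see.

```python
class Countdown:
    """Hypothetical callable: each call returns the next lower number."""
    def __init__(self, start):
        self.current = start

    def __call__(self):
        value = self.current
        self.current -= 1
        return value


# Calls return 3, 2, 1, 0; the 0 equals the sentinel, so it is not yielded.
countdown_values = list(iter(Countdown(3), 0))
print(countdown_values)  # [3, 2, 1]

# Stepping by hand: each next() triggers exactly one call of the object.
steps = iter(Countdown(2), 0)
manual = [next(steps), next(steps), next(steps, 'stopped')]
print(manual)  # [2, 1, 'stopped']
```

The third next() call gets 0 back from the object, which matches the sentinel, so the iterator raises StopIteration and next() falls back to its default value.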

This can also be used to make an endless iterator from repeated function calls:

>>> dice = iter(throw, 7)

Since the value returned can never be equal to 7, the iterator runs forever. A common idiom is to use an anonymous object as the sentinel, so that the comparison cannot succeed for any value the function would ordinarily return:

>>> dice = iter(throw, object())

This works because two distinct plain objects never compare equal:

>>> object() != object()
True
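An endless iterator like this is normally consumed only partially. As a sketch, with a deterministic counter standing in for the die (the names counter, next_number and endless are assumptions made for this example), itertools.islice can take just the first few values:

```python
import itertools

counter = itertools.count(1)          # 1, 2, 3, ...
next_number = lambda: next(counter)   # zero-argument callable, like throw

# A fresh object() never compares equal to an int, so this never stops on its own.
endless = iter(next_number, object())

first_five = list(itertools.islice(endless, 5))
print(first_five)  # [1, 2, 3, 4, 5]
```

Calling list(endless) directly would never return, which is why the slice is needed.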

Note that the word sentinel is commonly used for a value that marks the end of some data and does not occur naturally within the data, as in this Java answer.
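This is the role the empty string plays in the file-reading example from the question: read(1) returns '' only at end of file, so '' works as an end marker that never shows up in the real data. Here is a sketch of that, using io.StringIO as a stand-in for open('test.txt') so that it runs without a file on disk:

```python
import functools
import io

# io.StringIO stands in for open('test.txt') so the example is self-contained.
fin = io.StringIO(u"This is a foo bar sentence .\n")

# fin.read(1) returns '' at end of file, so '' is a natural sentinel here.
chars = list(iter(functools.partial(fin.read, 1), u''))
print(chars[:4])  # ['T', 'h', 'i', 's']
```

functools.partial(fin.read, 1) is needed because iter calls its first argument with no arguments, while read needs the size to be passed in.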