Octavius Octavius - 2 months ago 9
Python Question

Index into a Python iterator

I have an iterator

iterator
and a list of indices
indices
(repeats possible) and I want to extract just those elements from my iterator. At the moment I'm doing

indices = sorted(indices)
deltas = [indices[0]] + [indices[i+1] - indices[i] for i in range(len(indices) - 1)]
output = []
for delta in deltas:
for i in range(delta):
datum = next(iterator)
output.append(datum)


Are those two layers of loop necessary? Am I missing a trick with
itertools
?

Answer

You definitely don't need the double loop as you can do this with a single loop and without creating deltas but the check code becomes more complicated:

it = iter(sorted(indices))
index = next(it)
for i, datum in enumerate(iterator):
    if i != index:
        continue
    output.append(datum)
    try:
        index = next(it)
    except StopIteration:
        break

You can also do this in a list comprehension for very low number of indices as you incur overhead for the check (but avoid the sort):

[datum for i, datum in enumerate(x) if i in indices]

You can reduce the cost of the check by converting indices to a set. I would be interested to see performance of sort over a set construction (set lookup is O(1)):

indices = set(indices)
[datum for i, datum in enumerate(x) if i in indices]

Update: First and third options are roughly equivalent in timing at just over 900 ms (slight edge to first) to choose 1000 random indices out of 10,000,000 items. The OP's code ran in about 1.2 s.