Octavius - 1 year ago 58

Python Question

I have an iterator

`iterator`

`indices`

`indices = sorted(indices)`

deltas = [indices[0]] + [indices[i+1] - indices[i] for i in range(len(indices) - 1)]

output = []

for delta in deltas:

for i in range(delta):

datum = next(iterator)

output.append(datum)

Are those two layers of loop necessary? Am I missing a trick with

`itertools`

Answer Source

You definitely don't need the double loop as you can do this with a single loop and without creating deltas but the check code becomes more complicated:

```
it = iter(sorted(indices))
index = next(it)
for i, datum in enumerate(iterator):
if i != index:
continue
output.append(datum)
try:
index = next(it)
except StopIteration:
break
```

You can also do this in a list comprehension for very low number of indices as you incur overhead for the check (but avoid the `sort`

):

```
[datum for i, datum in enumerate(x) if i in indices]
```

You can reduce the cost of the check by converting `indices`

to a `set`

. I would be interested to see performance of `sort`

over a `set`

construction (set lookup is O(1)):

```
indices = set(indices)
[datum for i, datum in enumerate(x) if i in indices]
```

**Update**: First and third options are roughly equivalent in timing at just over 900 ms (slight edge to first) to choose 1000 random indices out of 10,000,000 items. The OP's code ran in about 1.2 s.