acdr - 8 months ago 50

Python Question

I have a very large array, consisting of integers between 0 and N, where each value occurs at least once.

I'd like to know, for each value *k*, all the indices in my array where the array's value equals *k*.

For example:

`arr = np.array([0,1,2,3,2,1,0])`

desired_output = {

0: np.array([0,6]),

1: np.array([1,5]),

2: np.array([2,4]),

3: np.array([3]),

}

Right now I am accomplishing this with a loop over

`range(N+1)`

`np.where`

`indices = {}`

for value in range(max(arr)+1):

indices[value] = np.where(arr == value)[0]

This loop is by far the slowest part of my code. (Both the

`arr==value`

`np.where`

I also tried playing around with

`np.unique(arr, return_index=True)`

Answer

A pythonic way is using `collections.defaultdict()`

:

```
>>> from collections import defaultdict
>>>
>>> d = defaultdict(list)
>>>
>>> for i, j in enumerate(arr):
... d[j].append(i)
...
>>> d
defaultdict(<type 'list'>, {0: [0, 6], 1: [1, 5], 2: [2, 4], 3: [3]})
```

And here is a Numpythonic way using a dictionary comprehension and `numpy.where()`

:

```
>>> {i: np.where(arr == i)[0] for i in np.unique(arr)}
{0: array([0, 6]), 1: array([1, 5]), 2: array([2, 4]), 3: array([3])}
```

And here is a pure Numpythonic approach if you don't want to involve the dictionary:

```
>>> uniq = np.unique(arr)
>>> args, indices = np.where((np.tile(arr, len(uniq)).reshape(len(uniq), len(arr)) == np.vstack(uniq)))
>>> np.split(indices, np.where(np.diff(args))[0] + 1)
[array([0, 6]), array([1, 5]), array([2, 4]), array([3])]
```