cᴏʟᴅsᴘᴇᴇᴅ - 30 days ago 9
Python Question

# Determine the shuffled indices of two lists/arrays

As a challenge, I've given myself this problem:

Given 2 lists, A, and B, where B is a shuffled version of A, the idea is to figure out the shuffled indices.

For example:

``````A = [10, 40, 30, 2]
B = [30, 2, 10, 40]

result = [2,   3,    0,      1]
A[2]  A[3]   A[0]  A[1]
||     ||     ||    ||
30      2     10    40
``````

Note that ties for identical elements can be resolved arbitrarily.

I've come up with a solution that involves the use of a dictionary to store indices. What other possible solutions does this problem have? A solution using a library also works. Numpy, pandas, anything is fine.

As an improvement over your current solution, you could use `collections.defaultdict` and avoid `dict.setdefault`:

``````from collections import defaultdict

A = [10, 40, 30, 2]
B = [30, 2, 10, 40]

idx = defaultdict(list)
for i, l in enumerate(A):
idx[l].append(i)

res = [idx[l].pop() for l in B]
print(res)
``````

Here are the timings for the two methods using the sample input given:

Script used for testing

``````from timeit import timeit

setup = """
from collections import defaultdict;
idx1 = defaultdict(list); idx2 = {}
A = [10, 40, 30, 2]
B = [30, 2, 10, 40]
"""

me = """
for i, l in enumerate(A):
idx1[l].append(i)
res = [idx1[l].pop() for l in B]
"""

coldspeed = """
for i, l in enumerate(A):
idx2.setdefault(l, []).append(i)
res = [idx2[l].pop() for l in B]
"""

print(timeit(setup=setup, stmt=me))
print(timeit(setup=setup, stmt=coldspeed))
``````

Results

``````original: 2.601998388010543
modified: 2.0607256239745766
``````

So it appears that using `defaultdict` actually yields a slight speed increase. This actually makes since though since `defaultdict` is implemented in C rather than Python. Not to mention that the attribute lookup of the original solution - `idx.setdefault1` - is costly.