Bob Bob - 6 months ago 38

Error when indexing numpy array

I'm running the following code to sort the rows of a matrix and then pick the corresponding elements from another vector:

import pandas as pd
import numpy as np
# compute ids
coeff = np.dot(matrix1, np.transpose(matrix2))
print(coeff.shape, ids.shape)
indices = coeff.argsort()[:, ::-1]
print(indices.shape)
coeff_idx = ids[indices]


The shapes print as expected, but the program fails on the last line with this error:

(11396, 45582) (11396,)
(11396, 45582)

...
File "pandas/hashtable.pyx", line 359, in pandas.hashtable.Int64HashTable.lookup (pandas/hashtable.c:7427)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Answer

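NumPy arrays accept a multidimensional integer index array (the result takes the shape of the index array), but a pandas Series does not; in the pandas version used here it fails with the same error: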

In [168]: arr = np.arange(5)

In [169]: ser = pd.Series(arr)

In [170]: indices = np.array([[0,1],[4,3],[2,2]])

In [171]: arr[indices]
Out[171]: 
array([[0, 1],
       [4, 3],
       [2, 2]])

In [172]: ser[indices]
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
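The rule at work is that NumPy integer-array ("fancy") indexing returns an array with the same shape as the index array, with each position filled by the looked-up element. A minimal sketch, reusing the arr, ser and indices values from the session above, shows that converting the Series to a plain array first gives exactly the NumPy result:

```python
import numpy as np
import pandas as pd

arr = np.arange(5)
ser = pd.Series(arr)
indices = np.array([[0, 1], [4, 3], [2, 2]])

# Fancy indexing: the result has the shape of `indices`, not of `arr`
result = arr[indices]
print(result.shape)  # (3, 2)

# Converting the Series to an ndarray first gives the same result
# (Series.to_numpy() exists in pandas 0.24 and later; .values also works)
from_series = ser.to_numpy()[indices]
print(np.array_equal(result, from_series))  # True
```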

Therefore, convert ids from a pandas Series to a NumPy array before indexing it with indices:

ids = ids.values  # or ids.to_numpy() in pandas 0.24 and later
coeff_idx = ids[indices]
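One more thing to check once the Series issue is out of the way: the values in indices are column positions of coeff, running from 0 to coeff.shape[1] - 1 (45581 here), so the array being indexed needs one entry per column of coeff. With ids.shape == (11396,) as printed in the question, ids.values[indices] would raise an IndexError instead of the ValueError. The sketch below uses small hypothetical inputs (matrix1, matrix2 and the labels "a", "b", "c" are made up) and assumes ids is meant to label the rows of matrix2:

```python
import numpy as np
import pandas as pd

# Hypothetical small inputs standing in for the question's data
matrix1 = np.array([[1.0, 0.0],
                    [0.0, 1.0]])           # 2 query rows
matrix2 = np.array([[0.2, 0.9],
                    [0.8, 0.1],
                    [0.5, 0.5]])           # 3 candidate rows
# Assumption: ids labels the rows of matrix2, i.e. the columns of coeff
ids = pd.Series(["a", "b", "c"])

coeff = np.dot(matrix1, np.transpose(matrix2))   # shape (2, 3)
indices = coeff.argsort()[:, ::-1]               # column positions, highest coeff first

# Every value in indices is a column position of coeff,
# so the lookup array needs exactly coeff.shape[1] entries
assert len(ids) == coeff.shape[1]

coeff_idx = ids.to_numpy()[indices]      # same shape as indices: (2, 3)
print(coeff_idx)                         # row 0 -> b, c, a; row 1 -> a, c, b

# A lookup array that is too short fails with IndexError, not ValueError
too_short = np.array(["x", "y"])         # 2 entries, but indices reaches 2
try:
    too_short[indices]
except IndexError as exc:
    print("IndexError:", exc)
```

In the real data this means ids should have 45582 entries (one per row of matrix2) for the lookup to succeed.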