Bob - 1 year ago 154

Python Question

I'm running the following code to sort the rows of a matrix and then pick the corresponding elements from another vector:

```
import pandas as pd
import numpy as np

# compute ids

coeff = np.dot(matrix1, np.transpose(matrix2))
print coeff.shape, ids.shape

indices = coeff.argsort()[:, ::-1]
print indices.shape

coeff_idx = ids[indices]
```

But I get the following error when the program reaches the last line:

```
(11396, 45582) (11396,)
(11396, 45582)
...
  File "pandas/hashtable.pyx", line 359, in pandas.hashtable.Int64HashTable.lookup (pandas/hashtable.c:7427)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
```

Answer

NumPy arrays can be indexed with a multi-dimensional integer array, but a pandas Series accepts only one-dimensional indexers:

```
In [168]: arr = np.arange(5)
In [169]: ser = pd.Series(arr)
In [170]: indices = np.array([[0,1],[4,3],[2,2]])
In [171]: arr[indices]
Out[171]:
array([[0, 1],
       [4, 3],
       [2, 2]])
In [172]: ser[indices]
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
```

Therefore, change `ids` from a Pandas Series to a NumPy array before trying to index it with `indices`:

```
ids = ids.values
coeff_idx = ids[indices]
```
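
For a self-contained check, here is a minimal sketch of the same pipeline on small data. The values of `matrix1`, `matrix2` and `ids` are made up for illustration, and `ids_arr` is a name introduced here. It also assumes `ids` has one entry per column of `coeff`: `indices` holds column positions, so `ids` needs at least that many entries for the lookup to stay in bounds.

```python
import numpy as np
import pandas as pd

# Hypothetical small inputs standing in for matrix1, matrix2 and ids.
matrix1 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
matrix2 = np.array([[1.0, 2.0], [3.0, 1.0], [0.0, 5.0]])
ids = pd.Series([10, 20, 30])  # assumed: one id per column of coeff

coeff = np.dot(matrix1, np.transpose(matrix2))  # shape (3, 3)

# Column positions of each row, ordered from largest to smallest coefficient.
indices = coeff.argsort()[:, ::-1]

# Convert the Series to a plain 1-D NumPy array, then index with the 2-D array.
ids_arr = ids.values
coeff_idx = ids_arr[indices]  # same shape as indices

print(coeff_idx)
```

On pandas 0.24 or newer, `ids.to_numpy()` should work in place of `ids.values`.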