Asher11 - 1 year ago 50

Python Question

A bit embarrassing to ask given the extensive NumPy documentation, but I am stuck on a simple task: getting all the records for which a mask is true in a nested NumPy array (the equivalent of `dataframe.loc[cond]` in `pandas`).

    import numpy as np

    a1 = np.array([1, 2, 3])
    a2 = np.array(['a', 'b', 'c'])
    a3 = np.array(['luca', 'paolo', 'francesco'])
    a4 = np.array([True, False, False], dtype='bool')

    combination = np.array([a1, a2, a3, a4])
    print(combination)

    # slice for a4 == True
    combination[combination[3] == 'True']

But the result is not what I want. In fact, from `combination`:

    [['1' '2' '3']
     ['a' 'b' 'c']
     ['luca' 'paolo' 'francesco']
     ['True' 'False' 'False']]

indexing with `combination[combination[3] == 'True']` yields:

    array([['1', '2', '3']],
          dtype='<U11')

when in reality I want:

    [['1']
     ['a']
     ['luca']
     ['True']]

Any ideas on what I am doing wrong?

P.S.: No, I can't do it in pandas, because my RAM usage explodes when converting this to a `pandas.DataFrame`.

Answer

I believe you're simply missing the indexes of the other dimension:

```
combination[combination[3] == 'True']
```

should be

```
combination[:, combination[3] == 'True']
```

Note the colon.

This yields a new ndarray that keeps every entry along the first dimension and, along the second, only the positions where the mask is true (here just index 0).
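For reference, here is a minimal sketch of the fix end to end. It assumes the array looks exactly like the printed `combination` in the question, so the values are written directly as strings; the names `mask` and `selected` are just illustrative.

```python
import numpy as np

# Same contents as the printed `combination` in the question.
# Everything is a string because the stacked arrays share one dtype.
combination = np.array([
    ['1', '2', '3'],
    ['a', 'b', 'c'],
    ['luca', 'paolo', 'francesco'],
    ['True', 'False', 'False'],
])

# One boolean per column (record): True where the last row holds 'True'.
mask = combination[3] == 'True'

# ':' keeps every row (field); the mask selects along the second axis.
selected = combination[:, mask]
print(selected.shape)
print(selected)

# The original boolean array from the question can be used as the mask
# directly, which avoids comparing against the string 'True'.
a4 = np.array([True, False, False], dtype='bool')
same = combination[:, a4]
```

Without the colon, the mask is applied to the first axis (the four fields) rather than to the three records, which is why the original expression did not return the expected column.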