Asher11 Asher11 - 2 months ago 12
Python Question

applying a mask on a nested numpy array - numpy - python

It's a bit embarrassing to ask given how extensive the NumPy documentation is, but I'm stuck on a simple task: getting all the records for which a mask is true in a nested NumPy array (the equivalent of dataframe.loc[cond] in pandas):

import numpy as np
a1 = np.array([1,2,3])
a2 = np.array(['a','b','c'])
a3 = np.array(['luca','paolo','francesco'])
a4 = np.array([True, False,False], dtype='bool')

combination = np.array([a1,a2,a3,a4])
print(combination)

# slice for a4 == True
combination[combination[3] == 'True']


But the result is not what I want.

In fact, from combination:

[['1' '2' '3']
 ['a' 'b' 'c']
 ['luca' 'paolo' 'francesco']
 ['True' 'False' 'False']]


combination[combination[3] == 'True'] yields:

array([['1', '2', '3']],
dtype='<U11')


when in reality I want:

[['1']
 ['a']
 ['luca']
 ['True']]


Any ideas on what I am doing wrong?

P.S.: No, I can't do it in pandas, because converting this to a pandas.DataFrame makes my RAM usage explode.

Answer

I believe you're simply missing the indexes of the other dimension:

combination[combination[3] == 'True']

should be

combination[:, combination[3] == 'True']

Note the colon.

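The colon selects everything along the first dimension (all four rows), while the boolean mask keeps only the matching positions along the second dimension (here, just column 0). Without the colon, the mask is applied to the first dimension instead, which is why you got row 0 back. The result is a new ndarray.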
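To make the difference concrete, here is a minimal sketch of the indexing described above. It assumes the array holds the string values printed in the question (written out directly here rather than built from a1 to a4); the names mask, a4 and selected are only illustrative.

```python
import numpy as np

# Same contents as the question's `combination`, written out as strings,
# since the printed array in the question holds strings throughout.
combination = np.array([
    ['1', '2', '3'],
    ['a', 'b', 'c'],
    ['luca', 'paolo', 'francesco'],
    ['True', 'False', 'False'],
])

# Boolean mask with one entry per record (column), built from row 3.
mask = combination[3] == 'True'

# ':' keeps every row (axis 0); the mask filters the columns (axis 1).
selected = combination[:, mask]
print(selected.shape)  # (4, 1)
print(selected)

# If the original boolean array is still around, it can be used directly,
# which avoids comparing against the string 'True'.
a4 = np.array([True, False, False], dtype=bool)
selected_from_bool = combination[:, a4]
```

The mask has three entries, one per column, so it belongs on the second axis. On the first axis its length does not match the four rows, and as far as I know recent NumPy versions raise an IndexError for such a mismatch instead of silently returning row 0.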
