SUNDONG - 2 months ago
Python Question

Get pandas subseries by values when each value is an ndarray

I want to select a subseries by value when each element of the Series is an ndarray.

This one works.

import pandas as pd

sa = pd.Series([1, 2, 35, 2], index=list('abcd'))
sa[sa == 2]


Results

b    2
d    2
dtype: int64


Why does the code below not work, and what should I change? It raises ValueError: Lengths must match to compare

import numpy as np

sa2 = pd.Series([np.array(['out']), np.array(['2f-right', '2f']), np.array(['out', '2f']), np.array(['out'])], index=list('abcd'))
ar = np.array(['out'])
sa2[sa2 == ar]  # raises ValueError: Lengths must match to compare

Answer

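The comparison operator treats ar as an array to compare element-wise against the whole Series, so it expects ar to have the same length as the Series (4) rather than matching it against each stored array. To compare each element with ar individually, use apply with a lambda: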

In [211]:
sa2[sa2.apply(lambda x: (x == ar).all())]

Out[211]:
a    [out]
d    [out]
dtype: object

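Here the lambda compares each stored array against ar, and all() collapses that element-wise comparison into a single boolean, so apply returns a boolean mask that selects the matching rows.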
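One caveat with (x == ar).all() is that NumPy broadcasts ar against x, so an element made only of repeated 'out' values would also match. If exact equality of shape and contents is what you want, a stricter variant is to use np.array_equal inside the lambda. The sketch below reuses the data from the question; the `repeated` array is a hypothetical edge case added purely for illustration and is not part of the original question.

```python
import numpy as np
import pandas as pd

sa2 = pd.Series(
    [np.array(['out']), np.array(['2f-right', '2f']),
     np.array(['out', '2f']), np.array(['out'])],
    index=list('abcd'),
)
ar = np.array(['out'])

# Build the mask one element at a time. np.array_equal is True only when
# both the shape and every value match, so no broadcasting takes place.
mask = sa2.apply(lambda x: np.array_equal(x, ar))
result = sa2[mask]
print(result.index.tolist())  # ['a', 'd']

# Hypothetical edge case (not in the question): repeated 'out' values.
repeated = np.array(['out', 'out'])
print((repeated == ar).all())        # True, because ar is broadcast to length 2
print(np.array_equal(repeated, ar))  # False, because the shapes differ
```

For the Series in the question both approaches select rows a and d; they only differ on elements like `repeated`.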