snowleopard - 5 months ago
Python Question

Sorting values in numpy structured array based on field name value

I have the following structured array:

import numpy as np

x = np.rec.array([(22,2,200.,2000.), (44,2,400.,4000.), (55,5,500.,5000.), (33,3,400.,3000.)],
dtype={'names':['subcase','id', 'vonmises','maxprincipal'], 'formats':['i4','i4','f4','f4']})

I am trying to get the max vonmises for each id.

For example, the max vonmises for id 2 would be 400. I also want the corresponding subcase and maxprincipal.

Here is what I have done so far:

print(repr(x[['subcase','id','vonmises']][(x['id']==2) & (x['vonmises']==max(x['vonmises'][x['id']==2]))]))

Here is the output:

array([(44, 2, 400.0)],
dtype=(numpy.record, [('subcase', '<i4'), ('id', '<i4'), ('vonmises', '<f4')]))

The issue I am having now is that I want this to work for all ids in the array, not just id=2.

That is, I want to get the following output:

array([(44, 2, 400.0),(55, 5, 500.0),(33, 3, 400.0)],
dtype=(numpy.record, [('subcase', '<i4'), ('id', '<i4'), ('vonmises', '<f4')]))

Is there a nice way to accomplish this without specifying each individual id?


I do not know why you are using this format, but here is a workaround with pandas:

import pandas as pd

df  = pd.DataFrame(x)
df_ = df.groupby('id')['vonmises'].max().reset_index()

In [213]: df_.merge(df, on=['id','vonmises'])[['id','vonmises','subcase']]

array([[   2.,  400.,   44.],
       [   3.,  400.,   33.],
       [   5.,  500.,   55.]], dtype=float32)
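
If you would rather stay in NumPy and keep the structured dtype, the same per-id maximum can probably be obtained by sorting. The following is only a sketch of one possible approach, not the method used above: it sorts the rows by id and then by vonmises with np.lexsort, and keeps the last row of each id group. The names order, sorted_ids, is_last and result are chosen here for illustration.

```python
import numpy as np

x = np.rec.array(
    [(22, 2, 200., 2000.), (44, 2, 400., 4000.),
     (55, 5, 500., 5000.), (33, 3, 400., 3000.)],
    dtype={'names': ['subcase', 'id', 'vonmises', 'maxprincipal'],
           'formats': ['i4', 'i4', 'f4', 'f4']})

# np.lexsort treats the LAST key as the primary one:
# rows are ordered by id first, then by vonmises within each id.
order = np.lexsort((x['vonmises'], x['id']))
sorted_ids = x['id'][order]

# A row is the last of its id group when the next id differs;
# the final row always closes a group.
is_last = np.concatenate((sorted_ids[1:] != sorted_ids[:-1], [True]))

# The last row of each group carries that id's largest vonmises,
# together with its subcase and maxprincipal.
result = x[order[is_last]]

print(repr(result[['subcase', 'id', 'vonmises']]))
```

The rows come back ordered by id (2, 3, 5) rather than in their original order. If two rows share the maximum vonmises for an id, this sketch keeps only one of them, whereas the merge above would return both.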