jean jean - 5 days ago 6
Python Question

Group list elements using pandas in python

I have a python list as follows:

my_list = [[25, 1, 0.65],
           [25, 3, 0.63],
           [25, 2, 0.62],
           [50, 3, 0.65],
           [50, 2, 0.63],
           [50, 1, 0.62]]


I want to group the third-column values by the second column, ordered within each group by the first column (25, then 50):

1 --> [0.65, 0.62] <--25, 50
2 --> [0.62, 0.63] <--25, 50
3 --> [0.63, 0.65] <--25, 50


So the expected result is as follows:

Result = [[0.65, 0.62],[0.62, 0.63],[0.63, 0.65]]

I tried as follows:
import pandas as pd

df = pd.DataFrame(my_list,columns=['a','b','c'])
res = df.groupby(['b', 'c']).get_group('c')
print(res)

This raises:

ValueError: must supply a tuple to get_group with multiple grouping keys


How to do it guys?

Answer

Here is a pandas solution: sort the data frame by the first column, group by the second column, and convert the third column of each group to a list. If you prefer the result as a list of lists, call the tolist() method afterwards:

df = pd.DataFrame(my_list, columns=list('ABC'))

s = df.sort_values('A').groupby('B').C.apply(list)

#B
#1    [0.65, 0.62]
#2    [0.62, 0.63]
#3    [0.63, 0.65]
#Name: C, dtype: object

The above gives a pandas Series indexed by B, where each element is a list.
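
For reference, the sort/groupby/apply steps above can be put together as a self-contained sketch. The helper name group_third_by_second is hypothetical (not part of pandas), and kind='mergesort' is an added assumption so that rows with equal A keep their original order:

```python
import pandas as pd

my_list = [[25, 1, 0.65],
           [25, 3, 0.63],
           [25, 2, 0.62],
           [50, 3, 0.65],
           [50, 2, 0.63],
           [50, 1, 0.62]]


def group_third_by_second(rows):
    """Hypothetical helper: collect column C per value of B, ordered by A."""
    df = pd.DataFrame(rows, columns=list('ABC'))
    # Sort by A first so each group's list follows the 25, 50 order;
    # mergesort is a stable sort, so ties in A keep their input order.
    ordered = df.sort_values('A', kind='mergesort')
    # groupby sorts the group keys (B) by default, giving rows 1, 2, 3.
    return ordered.groupby('B')['C'].apply(list)


s = group_third_by_second(my_list)
result = s.tolist()
print(result)
```

Each group's list follows the sorted order of A because groupby keeps rows within a group in the order they appear in the sorted frame.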


To get a list of lists:

s.tolist()
# [[0.65000000000000002, 0.62], [0.62, 0.63], [0.63, 0.65000000000000002]]

To get a numpy array of lists:

s.values
# array([[0.65000000000000002, 0.62], [0.62, 0.63],
#        [0.63, 0.65000000000000002]], dtype=object)

s.values[0]
# [0.65000000000000002, 0.62]          # here each element in the array is still a list

To get a 2D array or a matrix, you can transform the data frame in a different way, i.e., pivot the original data frame to wide format (B as rows, A as columns) and then convert it to a 2D array:

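# keyword arguments and .values work on current pandas, where as_matrix() no longer exists
df.pivot(index='B', columns='A', values='C').values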
# array([[ 0.65,  0.62],
#        [ 0.62,  0.63],
#        [ 0.63,  0.65]])

Or:

import numpy as np

np.array(s.tolist())
# array([[ 0.65,  0.62],
#        [ 0.62,  0.63],
#        [ 0.63,  0.65]])
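
As a quick check, here is a minimal sketch that runs both routes on the sample data and compares the results. The variable names (wide, arr_pivot, arr_lists) are only for illustration. It assumes every (B, A) pair occurs exactly once, as in the question: pivot raises a ValueError on duplicate pairs, and the list route only gives a regular 2D array when all groups have the same length.

```python
import numpy as np
import pandas as pd

my_list = [[25, 1, 0.65],
           [25, 3, 0.63],
           [25, 2, 0.62],
           [50, 3, 0.65],
           [50, 2, 0.63],
           [50, 1, 0.62]]

df = pd.DataFrame(my_list, columns=list('ABC'))

# Route 1: pivot to wide format (rows = B, columns = A), then take the array.
wide = df.pivot(index='B', columns='A', values='C')
arr_pivot = wide.values

# Route 2: build the per-group lists, then stack them into a 2D array.
s = df.sort_values('A').groupby('B')['C'].apply(list)
arr_lists = np.array(s.tolist())

# Both routes should agree element for element on this data.
same = np.array_equal(arr_pivot, arr_lists)
print(arr_pivot.shape, same)
```

The pivot route also keeps the A values as column labels in wide, which can be handy if you need to know which column is 25 and which is 50.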