fast tooth - 3 months ago
Python Question

pandas groupby and join lists

I have a dataframe df with two columns. I want to group by one column and join the lists that belong to the same group. For example:

column_a, column_b
1, [1,2,3]
1, [2,5]
2, [5,6]


After the process:

column_a, column_b
1, [1,2,3,2,5]
2, [5,6]


I want to keep all the duplicates. I have the following questions:


  • The dtypes of the dataframe are object(s). convert_objects() doesn't convert column_b to list automatically. How can I do this?

  • What does the function in df.groupby(...).apply(lambda x: ...) apply to? What is the form of x? A list?

  • What is the solution to my main problem?



Thanks in advance.

Answer

object dtype is a catch-all dtype that basically means not int, float, bool, datetime, or timedelta. So the column is storing the values as lists. convert_objects only tries to convert a column to one of those dtypes.
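To check what an object column actually holds, it can help to look at the type of each cell rather than the column dtype. The sketch below assumes the cells are real Python lists; the `raw` frame is a hypothetical case where the lists arrived as strings (for example from a CSV file), in which case `ast.literal_eval` is one way to parse them.

```python
import ast

import pandas as pd

# Frame from the question, built with real Python lists in column_b.
df = pd.DataFrame({
    "column_a": [1, 1, 2],
    "column_b": [[1, 2, 3], [2, 5], [5, 6]],
})

# The column dtype is object, but each cell is a list.
column_dtype = df["column_b"].dtype
cell_types = df["column_b"].map(type).unique()

# Hypothetical variant: the lists were read in as strings.
raw = pd.DataFrame({
    "column_a": [1, 1, 2],
    "column_b": ["[1,2,3]", "[2,5]", "[5,6]"],
})
# Parse each string into a list; literal_eval only accepts Python literals.
raw["column_b"] = raw["column_b"].apply(ast.literal_eval)
```

After parsing, `raw["column_b"]` still has object dtype, since pandas has no dedicated list dtype.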

You want:

In [63]: df
Out[63]: 
   a          b    c
0  1  [1, 2, 3]  foo
1  1     [2, 5]  bar
2  2     [5, 6]  baz


In [64]: df.groupby('a').agg({'b': 'sum', 'c': lambda x: ' '.join(x)})
Out[64]: 
         c                b
a                          
1  foo bar  [1, 2, 3, 2, 5]
2      baz           [5, 6]

This groups the data frame by the values in column a. Read more about [groupby](http://pandas.pydata.org/pandas-docs/stable/groupby.html).
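On the question of what `x` is inside `apply`: as a rough sketch, each group of a grouped data frame is itself a DataFrame, and each group of a single selected column is a Series, not a list. The variable names below are illustrative, and the exact columns passed to the function can vary by pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 2],
    "b": [[1, 2, 3], [2, 5], [5, 6]],
    "c": ["foo", "bar", "baz"],
})

# Iterating over the groupby yields (key, sub-frame) pairs.
group_types = {int(key): type(group).__name__ for key, group in df.groupby("a")}
group_sizes = {int(key): len(group) for key, group in df.groupby("a")}

# After selecting one column, the function receives a Series per group.
arg_types = df.groupby("a")["b"].apply(lambda x: type(x).__name__)
```

Here `group_types` maps every key to `"DataFrame"`, while `arg_types` holds `"Series"` for every group.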

The 'sum' applied to column b is doing a regular list sum (concatenation), just like [1, 2, 3] + [2, 5].
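The concatenation can also be written out explicitly. The sketch below is an alternative to the 'sum' aggregation, not the method used above: it flattens each group's lists with `itertools.chain`, which keeps duplicates and avoids building intermediate lists.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 2],
    "b": [[1, 2, 3], [2, 5], [5, 6]],
    "c": ["foo", "bar", "baz"],
})

# Plain Python: summing lists with an empty-list start concatenates them.
concatenated = sum([[1, 2, 3], [2, 5]], [])

# Per group: chain the lists in column b together into one list.
joined = df.groupby("a")["b"].apply(
    lambda lists: list(chain.from_iterable(lists))
)
```

`joined` is a Series indexed by the values of column a, with one combined list per group.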