morfara morfara - 5 months ago 12
Python Question

Group by a dataframe in Pandas with common values across columns

I have a dataframe in Pandas for example:

Col1 Col2 Col3
A B 10/10/2016
B A 09/12/2016
A C 08/11/2016


I would like for each unique entry in Col1 and Col2 to get their Col3 values. The problem is that a user can either be in Col1 or Col2, so a groupby Col1 will miss the B - A line.

Any idea on how to achieve it?

Many thanks in advance!

Answer

You could:

df.set_index(['Col3']).stack().reset_index(-1, drop=True)

to get:

Col3
10/10/2016    A
10/10/2016    B
09/12/2016    B
09/12/2016    A
08/11/2016    A
08/11/2016    C

which you could further process, for instance using (assuming the above was assigned to result:

result.groupby(result.values).apply(lambda x: x.index.tolist())

A    [10/10/2016, 09/12/2016, 08/11/2016]
B                [10/10/2016, 09/12/2016]
C                            [08/11/2016]

Alternatively, you could use pd.melt:

pd.melt(df, value_vars=['Col1', 'Col2'], id_vars='Col3')

         Col3 variable value
0  10/10/2016     Col1     A
1  09/12/2016     Col1     B
2  08/11/2016     Col1     A
3  10/10/2016     Col2     B
4  09/12/2016     Col2     A
5  08/11/2016     Col2     C