Stefano Potter Stefano Potter - 3 months ago 6
Python Question

Combining columns based on group

I have a dataframe like this:

POLY_KEY_I SP1 SP2
0 FS01080100SM001 POAPRA TOXRYD
1 FS01080100SM001 NaN NaN
2 FS01080100SM001 OXRYD SYMOCC
3 FS01080100SM001 EUPESU POAPRA
4 FS01080100SM001 BOUGRA KOEPYR
5 FS01080100SM002 POAPRA EUPESU
6 FS01080100SM002 POAPRA NaN
7 FS01080100SM002 POAPRA KOEPYR


And I want to groupby
POLY_KEY_I
and then combine
SP1
and
SP2
based on this.

My desired output would be something like:

POLY_KEY_I SP
0 FS01080100SM001 POAPRA
1 FS01080100SM001 TOXRYD
2 FS01080100SM001 NaN
3 FS01080100SM001 NaN
4 FS01080100SM001 OXRYD
5 FS01080100SM001 SYMOCC
6 FS01080100SM001 EUPESU
7 FS01080100SM001 POAPRA
8 FS01080100SM001 BOUGRA
9 FS01080100SM001 KOEPYR
10 FS01080100SM002 POAPRA
11 FS01080100SM002 EUPESU
12 FS01080100SM002 POAPRA
13 FS01080100SM002 NaN
14 FS01080100SM002 POAPRA
15 FS01080100SM002 KOEPYR

Answer

You can use melt to reshape from wide to long, like this:

In [10]: pd.melt(df, id_vars='POLY_KEY_I', value_name='SP')
Out[10]: 
         POLY_KEY_I variable      SP
0   FS01080100SM001      SP1  POAPRA
1   FS01080100SM001      SP1     NaN
2   FS01080100SM001      SP1   OXRYD
3   FS01080100SM001      SP1  EUPESU
4   FS01080100SM001      SP1  BOUGRA
5   FS01080100SM002      SP1  POAPRA
6   FS01080100SM002      SP1  POAPRA
7   FS01080100SM002      SP1  POAPRA
8   FS01080100SM001      SP2  TOXRYD
9   FS01080100SM001      SP2     NaN
10  FS01080100SM001      SP2  SYMOCC
11  FS01080100SM001      SP2  POAPRA
12  FS01080100SM001      SP2  KOEPYR
13  FS01080100SM002      SP2  EUPESU
14  FS01080100SM002      SP2     NaN
15  FS01080100SM002      SP2  KOEPYR