duckertito duckertito - 19 days ago 6
Python Question

How to merge rows and convert them into columns

I have a dataframe as follows:

ID START END SEQ
1 11 12 1
1 14 15 3
1 13 14 2
2 10 14 1
3 11 15 1
3 16 17 2


I need to transform it into this DataFrame:

ID START_1 END_1 SEQ_1 START_2 END_2 SEQ_2 START_3 END_3 SEQ_3
1 11 12 1 13 14 2 14 15 3
2 10 14 1 NA NA NA NA NA NA
3 11 15 1 16 17 2 NA NA NA


The problem is that the number of rows with the same ID is unknown apriori, which means that the maximum number of columns
START_X
,
END_X
,
SEQ_X
should not be defined manually.
Is there any automated way to do this transformation, taking into account that columns should be ordered by
SEQ
?
Should I use
group_by
or which should be the approach?

Answer

You can use groupby with unstack, then sort_index and last remove MultiIndex from columns by list comprehension:

df['SEQ1'] = df.SEQ
df = df.groupby(['ID','SEQ1']).mean().unstack()
df = df.sort_index(axis=1, level=1)
df.columns = ['_'.join((col[0], str(col[1]))) for col in df.columns]
print (df)
    START_1  END_1  SEQ_1  START_2  END_2  SEQ_2  START_3  END_3  SEQ_3
ID                                                                     
1      11.0   12.0    1.0     13.0   14.0    2.0     14.0   15.0    3.0
2      10.0   14.0    1.0      NaN    NaN    NaN      NaN    NaN    NaN
3      11.0   15.0    1.0     16.0   17.0    2.0      NaN    NaN    NaN

Another solution with pivot_table, aggfunc='mean' is by default:

df['SEQ1'] = df.SEQ
df = df.pivot_table(index= ['ID','SEQ1']).unstack()
df = df.sort_index(axis=1, level=1)
df.columns = ['_'.join((col[0], str(col[1]))) for col in df.columns]
print (df)
    END_1  SEQ_1  START_1  END_2  SEQ_2  START_2  END_3  SEQ_3  START_3
ID                                                                     
1    12.0    1.0     11.0   14.0    2.0     13.0   15.0    3.0     14.0
2    14.0    1.0     10.0    NaN    NaN      NaN    NaN    NaN      NaN
3    15.0    1.0     11.0   17.0    2.0     16.0    NaN    NaN      NaN