dleal dleal - 2 months ago 31
Python Question

reshape a pandas DataFrame by transposing two columns and repeating another

The title might be a bit confusing, this is what I want to do:

I would like to convert this dataframe

pd.DataFrame({'name':['A','B','C'],'date1':[1999,2000,2001],'date2':[2011,2012,2013]})

date1 date2 name
0 1999 2011 A
1 2000 2012 B
2 2001 2013 C


Into the following:

dates name
0 1999 A
1 2011 A
2 2000 B
3 2012 B
4 2001 C
5 2013 C


I've been trying to do pivot tables and transposing, but with no luck.

Answer Source

You can use melt, remove column by drop and last sort_values:

print (pd.melt(df, id_vars='name', value_name='dates')
         .drop('variable', axis=1)
         .sort_values('name')[['dates','name']])
   dates name
0   1999    A
3   2011    A
1   2000    B
4   2012    B
2   2001    C
5   2013    C

Another solution with unstack and sort_index:

print (df.set_index('name')
         .unstack()
         .reset_index(drop=True, level=0)
         .sort_index()
         .reset_index(name='dates')[['dates','name']])

   dates name
0   1999    A
1   2011    A
2   2000    B
3   2012    B
4   2001    C
5   2013    C

Solution with lreshape and sort_values:

print (pd.lreshape(df, {'dates':['date1', 'date2']}).sort_values('name')[['dates','name']])
   dates name
0   1999    A
3   2011    A
1   2000    B
4   2012    B
2   2001    C
5   2013    C

Numpy solution with numpy.repeat and flattening by numpy.ravel:

df2 = pd.DataFrame({
        "name": np.repeat(df.name, 2),
        "dates": df[['date1','date2']].values.ravel()})
print (df2)
   dates name
0   1999    A
0   2011    A
1   2000    B
1   2012    B
2   2001    C
2   2013    C

EDIT:

lreshape is now undocumented, but is possible in future will by removed (with pd.wide_to_long too).

Possible solution is merging all 3 functions to one - maybe melt, but now it is not implementated. Maybe in some new version of pandas. Then my answer will be updated.