gussilago gussilago - 2 months ago 7
Python Question

Create or modify DataFrame using another DataFrame

I currently have a Pandas DataFrame looking like this:

DATESTAMP price name pct_chg
0 2006-01-02 62.987301 a 0.000000
1 2006-01-03 61.990700 a -0.015822
2 2006-01-04 62.987301 a 0.016077
3 2006-01-05 62.987301 a 0.000000
4 2006-01-06 61.990700 a -0.015822
6 2006-01-04 100.1 b 0.000000
7 2006-01-05 100.5 b -0.015822
8 2006-01-06 100.7 b 0.016077
9 2006-01-07 100.8 b 0.016090


The problem is that different items (specified with unique column
name
) have different time of origination as well as being alive for different amount of time


  • Above item
    a
    starts at
    2016-01-02
    and ends at
    2016-01-06

  • Above item
    b
    starts at
    2006-01-04
    and ends at
    2006-01-07
    .



I would like to summarize the column
pct_chg
in a new DataFrame, having
DATESTAMP
as index and columns as of
name
. I would also like the new DataFrame to have the index in such a manner that it starts with the "oldest" existing date-record (in this case
2006-01-02
) and ends at the "newest" (in this case
2006-01-07
).

The result would look like

a b
2006-01-02 0.000000 NaN
2006-01-03 -0.015822 NaN
2006-01-04 0.016077 0.000000
2006-01-05 0.000000 -0.015822
2006-01-06 -0.015822 0.016077
2006-01-07 NaN 0.016090

Answer

You can use set_index with unstack:

print (df.set_index(['DATESTAMP','name'])['pct_chg'].unstack())
name               a         b
DATESTAMP                     
2006-01-02  0.000000       NaN
2006-01-03 -0.015822       NaN
2006-01-04  0.016077  0.000000
2006-01-05  0.000000 -0.015822
2006-01-06 -0.015822  0.016077
2006-01-07       NaN  0.016090

Another solution with pivot:

print (df.pivot(index='DATESTAMP', columns='name', values='pct_chg'))
name               a         b
DATESTAMP                     
2006-01-02  0.000000       NaN
2006-01-03 -0.015822       NaN
2006-01-04  0.016077  0.000000
2006-01-05  0.000000 -0.015822
2006-01-06 -0.015822  0.016077
2006-01-07       NaN  0.016090