Kyle Kyle - 2 months ago 17
Python Question

How best to use pandas.DataFrame.pivot?

I am trying to translate a dataframe from rows of key, value to a table with keys as the columns and values as cells. For example:

Input dataframe with key, value:

>>>df = pd.DataFrame([['TIME', 'VAL1', 'VAL2', 'VAL3',
'TIME', 'VAL1', 'VAL2', 'VAL3'],
["00:00:01",1,2,3,"00:00:02", 1,2,3]]).T

0 1
0 TIME 00:00:01
1 VAL1 1
2 VAL2 2
3 VAL3 3
4 TIME 00:00:02
5 VAL1 1
6 VAL2 2
7 VAL3 3


I want it to look like:

TIME VAL1 VAL2 VAL3
00:00:01 1 2 3
00:00:02 1 2 3


I can almost get what I want with pivot:

>>>df.pivot(columns=0, values=1)
TIME VAL1 VAL2 VAL3
0 00:00:01 None None None
1 None 1 None None
2 None None 2 None
3 None None None 3
4 00:00:02 None None None
5 None 1 None None
6 None None 2 None
7 None None None 3


And I can merge the rows to get what I want:

>>> df.pivot(columns=0, values=1).ffill().drop_duplicates(subset='TIME',
keep='last').set_index('TIME')
TIME VAL1 VAL2 VAL3
00:00:01 1 2 3
00:00:02 1 2 3


But this seems like a rather awkward way to do it that would waste a lot of memory for a large data set. Is there a simpler method?

I tired looking at
pd.DataFrame.from_items()
and
pd.DataFrame.from_records()
but was not having success.

Answer

You need an "ID" variable that indicates which rows go together. In your desired output, you are implicitly assuming that every block of 4 rows should become a single row, but pandas won't assume that, because in general pivoting should be able to group together nonconsecutive rows. Each set of rows that you want to become a single row in the new DataFrame must have some shared value.

If your data really is just chunks of four rows, you can create the ID variable like this:

df['ID'] = np.arange(len(df))//4

You can see that the ID variable now marks which rows should be grouped:

>>> df
      0         1  ID
0  TIME  00:00:01   0
1  VAL1         1   0
2  VAL2         2   0
3  VAL3         3   0
4  TIME  00:00:02   1
5  VAL1         1   1
6  VAL2         2   1
7  VAL3         3   1

Then use this new column as the "index" of the pivot.

>>> df.pivot(index="ID", columns=0, values=1)
0       TIME VAL1 VAL2 VAL3
ID                         
0   00:00:01    1    2    3
1   00:00:02    1    2    3