user1717931 user1717931 - 4 months ago 9
Python Question

retrieve normal data frame after pivoting a data frame

I have a pandas data frame which is a result of pivoting. It has multiple indices. I want to get a "normal" data frame out of this pivoted df...so that I can do some normal operations on the new df.

Here is an example: My pivoted data frame looks like this:

feature_value
feature_type f1 f2 f3 f4 f5
time name
2016-05-10 Clay 0 1 30 4 40
2016-05-10 John 0 4 10 4 66
2016-05-10 Mary 0 1 40 4 46
2016-05-10 Boby 2 0 30 4 59
2016-05-10 Lucy 5 8 20 4 41


The following is what I want as a new df:

time name f1 f2 f3 f4 f5
2016-05-10 Clay 0 1 30 4 40
2016-05-10 John 0 4 10 4 66
2016-05-10 Mary 0 1 40 4 46
2016-05-10 Boby 2 0 30 4 59
2016-05-10 Lucy 5 8 20 4 41


How can I do this?

The pivoted_df.to_dict() looks like this:

{('feature_value', 'f1'): {(Timestamp('2016-05-10'), 'Clay'): 0, (Timestamp('2016-05-10'), 'John'): 0, (Timestamp('2016-05-10'), 'Mary'): 0, (Timestamp('2016-05-10'), 'Boby'): 2, (Timestamp('2016-05-10'), 'Lucy'): 5}, ('feature_value', 'f2'): {(Timestamp('2016-05-10'), 'Clay'): 1, (Timestamp('2016-05-10'), 'John'): 4, (Timestamp('2016-05-10'), 'Mary'): 1, (Timestamp('2016-05-10'), 'Boby'): 0, (Timestamp('2016-05-10'), 'Lucy'): 8}, ('feature_value', 'f3'): {(Timestamp('2016-05-10'), 'Clay'): 30, (Timestamp('2016-05-10'), 'John'): 10, (Timestamp('2016-05-10'), 'Mary'): 40, (Timestamp('2016-05-10'), 'Boby'): 30, (Timestamp('2016-05-10'), 'Lucy'): 20}, ('feature_value', 'f4'): {(Timestamp('2016-05-10'), 'Clay'): 4, (Timestamp('2016-05-10'), 'John'): 4, (Timestamp('2016-05-10'), 'Mary'): 4, (Timestamp('2016-05-10'), 'Boby'): 4, (Timestamp('2016-05-10'), 'Lucy'): 4}, ('feature_value', 'f5'): {(Timestamp('2016-05-10'), 'Clay'): 40, (Timestamp('2016-05-10'), 'John'): 66, (Timestamp('2016-05-10'), 'Mary'): 46, (Timestamp('2016-05-10'), 'Boby'): 59, (Timestamp('2016-05-10'), 'Lucy'): 41}}

Answer

When you call pivot_table, make sure you specify the values parameter:

df.pivot_table(index=['time', 'name'], columns=['feature_type'], 
               values='feature_value')

without the values='feature_value', you will get a MultiIndex column index with (possibly) a single outer level such as 'feature_value'.

df.pivot_table(index=['time', 'name'], ...) would also return a DataFrame with a MultiIndex row index with time and name levels. To make those index levels regular columns, call reset_index():

result = df.pivot_table(index=['time', 'name'], 
                        columns=['feature_type'],
                        values='feature_value').reset_index()

For example with,

import numpy as np
import pandas as pd
np.random.seed(2016)

N = 10
df = pd.DataFrame(
    {'time': np.random.choice(pd.date_range('2016-05-10', '2016-05-12'), size=N),
     'name': np.random.choice(['Clay', 'John', 'Mary', 'Boby', 'Lucy'], size=N),
     'feature_type': np.random.choice(['f{}'.format(i) for i in range(1,6)], size=N),
     'feature_value': np.random.randint(100, size=N)})

orig = df.pivot_table(index=['time', 'name'], columns=['feature_type'])
print(orig)

alt = df.pivot_table(index=['time', 'name'], 
                     columns=['feature_type'],
                     values='feature_value').reset_index()
alt.columns.name = None
print(alt)

orig looks like this:

                feature_value                        
feature_type               f1    f2    f3    f4    f5
time       name                                      
2016-05-10 John           NaN  50.0   NaN   NaN  91.0
           Lucy           NaN   NaN   NaN  28.0   NaN
           Mary           NaN   NaN  19.0   NaN  27.0
2016-05-11 Clay           2.0   NaN   NaN   NaN   NaN
           Lucy          24.0   NaN   NaN   NaN   NaN
2016-05-12 Boby           NaN  16.0   NaN   NaN   NaN
           John           NaN   NaN   NaN   NaN  62.0
           Mary           NaN   NaN   NaN  84.0   NaN

while alt looks like

        time  name    f1    f2    f3    f4    f5
0 2016-05-10  John   NaN  50.0   NaN   NaN  91.0
1 2016-05-10  Lucy   NaN   NaN   NaN  28.0   NaN
2 2016-05-10  Mary   NaN   NaN  19.0   NaN  27.0
3 2016-05-11  Clay   2.0   NaN   NaN   NaN   NaN
4 2016-05-11  Lucy  24.0   NaN   NaN   NaN   NaN
5 2016-05-12  Boby   NaN  16.0   NaN   NaN   NaN
6 2016-05-12  John   NaN   NaN   NaN   NaN  62.0
7 2016-05-12  Mary   NaN   NaN   NaN  84.0   NaN
Comments