PaweÅ‚ Rumian - 7 months ago 90

Python Question

I'm trying to do a pivot of a table containing strings as results.

`import pandas as pd`

df1 = pd.DataFrame({'index' : range(8),

'variable1' : ["A","A","B","B","A","B","B","A"],

'variable2' : ["a","b","a","b","a","b","a","b"],

'variable3' : ["x","x","x","y","y","y","x","y"],

'result': ["on","off","off","on","on","off","off","on"]})

df1.pivot_table(values='result',rows='index',cols=['variable1','variable2','variable3'])

But I get:

`DataError: No numeric types to aggregate`

This works as intended when I change result values to numbers:

`df2 = pd.DataFrame({'index' : range(8),`

'variable1' : ["A","A","B","B","A","B","B","A"],

'variable2' : ["a","b","a","b","a","b","a","b"],

'variable3' : ["x","x","x","y","y","y","x","y"],

'result': [1,0,0,1,1,0,0,1]})

df2.pivot_table(values='result',rows='index',cols=['variable1','variable2','variable3'])

And I get what I need:

`variable1 A B`

variable2 a b a b

variable3 x y x y x y

index

0 1 NaN NaN NaN NaN NaN

1 NaN NaN 0 NaN NaN NaN

2 NaN NaN NaN NaN 0 NaN

3 NaN NaN NaN NaN NaN 1

4 NaN 1 NaN NaN NaN NaN

5 NaN NaN NaN NaN NaN 0

6 NaN NaN NaN NaN 0 NaN

7 NaN NaN NaN 1 NaN NaN

I know I can map the strings to numerical values and then reverse the operation, but maybe there is a more elegant solution?

Answer

My original reply was based on Pandas 0.14.1, and since then, many things changed in the pivot_table function (rows --> index, cols --> columns... )

Additionally, it appears that the original lambda trick I posted no longer works on Pandas 0.18. You have to provide a reducing function (even if it is min, max or mean). But even that seemed improper - because we are not reducing the data set, just transforming it.... So I looked harder at unstack...

```
import pandas as pd
df1 = pd.DataFrame({'index' : range(8),
'variable1' : ["A","A","B","B","A","B","B","A"],
'variable2' : ["a","b","a","b","a","b","a","b"],
'variable3' : ["x","x","x","y","y","y","x","y"],
'result': ["on","off","off","on","on","off","off","on"]})
# these are the columns to end up in the multi-index columns.
unstack_cols = ['variable1', 'variable2', 'variable3']
```

First, set an index on the data using the index + the columns you want to stack, then call unstack using the level arg.

```
df1.set_index(['index'] + unstack_cols).unstack(level=unstack_cols)
```

Resulting dataframe is below.