user113531 user113531 - 6 months ago 14
Python Question

Creating 2d Matrix from dataframe entries

I have a 2d pandas dataframe with several columns. I want to create a 2d matrix with the count of the number of items in each group for the two columns

C
and
Hr
.

After applying the function
df.groupby(['C', 'Hr']).count()


I obtained the following dataframe:

C Hr A B C D
US 22 2 2 2 2
1 3 3 3 3
JP 14 2 2 2 2
19 4 4 4 4
20 2 2 2 2


How am I able to obtain a 2d matrix/dataframe with C and Hr on each of the axis instead? There is no need to retain the any other columns names, just the count which is the same for all columns.

Example output:

Hr US JP ...
1 2 3 ...
2 3 1 ...
3 3 4 ...


Or with the axis flipped works as well

Answer

You can use pivot_table with aggregating len and if you need fill NaN to 0 use parameter fill_value=0:

print df.pivot_table(index='Hr', columns='C', values='A',  aggfunc=len, fill_value=0)

Sample:

import pandas as pd

df = pd.DataFrame({'Hr': [1] * 10 + [2] * 6 + [3] * 4,
                   'A': np.arange(20),
                   'C': ['US'] * 10 + ['JP'] * 10 })

print df
     A   C  Hr
0    0  US   1
1    1  US   1
2    2  US   1
3    3  US   1
4    4  US   1
5    5  US   1
6    6  US   1
7    7  US   1
8    8  US   1
9    9  US   1
10  10  JP   2
11  11  JP   2
12  12  JP   2
13  13  JP   2
14  14  JP   2
15  15  JP   2
16  16  JP   3
17  17  JP   3
18  18  JP   3
19  19  JP   3

print df.pivot_table(index='Hr', columns='C', values='A',  aggfunc=len, fill_value=0)
C   JP  US
Hr        
1    0  10
2    6   0
3    4   0

If you need reset_index and remove columns names, use rename_axis (new in pandas 0.18.0):

print df.pivot_table(index='Hr', columns='C', values='A',  aggfunc=len, fill_value=0)
        .reset_index()
        .rename_axis(None, axis=1)

   Hr  JP  US
0   1   0  10
1   2   6   0
2   3   4   0
Comments