Ratchainant Thammasudjarit - 29 days ago
Python Question

Unique combinations of values in selected columns of a pandas DataFrame, with counts

I have my data in a pandas DataFrame as follows:

import pandas as pd

df1 = pd.DataFrame({'A':['yes','yes','yes','yes','no','no','yes','yes','yes','no'],
                    'B':['yes','no','no','no','yes','yes','no','yes','yes','no']})


So my data looks like this:

----------------------------
index A B
0 yes yes
1 yes no
2 yes no
3 yes no
4 no yes
5 no yes
6 yes no
7 yes yes
8 yes yes
9 no no
-----------------------------


I would like to transform it into another DataFrame. The expected output can be built with the following Python code:

output = pd.DataFrame({'A':['no','no','yes','yes'],'B':['no','yes','no','yes'],'count':[1,2,4,3]})


So my expected output looks like this:

--------------------------------------------
index A B count
--------------------------------------------
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3
--------------------------------------------


I can already find all the combinations and count them with the following command:
mytable = df1.groupby(['A','B']).size()


However, the result is a Series in which each combination appears as a single index entry. I would like to split the values of each combination into separate columns and add one more column holding the count. Is that possible? Any suggestions would be appreciated. Thank you in advance.

Answer

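You can group by columns 'A' and 'B', call size() to count the rows in each group, call reset_index() to turn the group keys back into ordinary columns, and then rename the generated column (named 0 by default) to 'count':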

In [26]:

df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
Out[26]:
     A    B  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3
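
As a possible shortcut, the rename step can usually be folded into reset_index, since Series.reset_index accepts a name argument for the values column. The sketch below is a minimal variant of the answer above, not a different method; it rebuilds df1 from the question so it runs on its own, and the variable name counts is only illustrative.

```python
import pandas as pd

# Same data as in the question.
df1 = pd.DataFrame({
    'A': ['yes', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no'],
    'B': ['yes', 'no', 'no', 'no', 'yes', 'yes', 'no', 'yes', 'yes', 'no'],
})

# size() returns a Series indexed by the (A, B) group keys.
# reset_index(name='count') moves those keys back into columns and
# names the values column 'count' in one step.
counts = df1.groupby(['A', 'B']).size().reset_index(name='count')

print(counts)
```

This should produce the same four rows as Out[26], with columns A, B and count.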