Antonio López Ruiz Antonio López Ruiz - 1 year ago 90
Python Question

Pivot with Pandas Python to get booleans

I have the following csv:

From which I created a dataframe with the following code:

import pandas as pd
df = pd.read_csv("C:/Users/Antonio/Desktop/nearBPO/facturas.csv", encoding = "ISO-8859-1")
df_du = df.iloc[:,[0,5]]
dfv = df_du.groupby('UUID')['Desc'].apply(list)
df2 = dfv.reset_index()

*Note: I'm taking the csv locally.

Which after the code looks like this:

0 0019A60D-78F8-E341-8D3E-9786201FE017 [TRANSPORTACION DE PASAJEROS]
1 003B8B8F-7017-E441-8C84-8C0EA577E29D [SERVICIO POR HORA]
2 00536BC1-1B10-4146-A59B-36613090EF10 [CONSUMO Y RENTA DE SALA DE JUNTAS]
4 006C5F2E-CAE0-4498-9288-0241C1949D8A [C Meg XT Clas CH, Com Whop Q CH, C Meg XT Cla...
5 0075D1FC-996D-4784-9755-2F4598D16163 [Consumo]

I would like to make a dataframe which had each element of the 'Desc' column as a column and each UUID as a row where i would have a 1 (or True) if the UUID had the corresponding 'Desc' in it.

Example of what I want:

UUID Transportacion de pasajeros Servicio por hora
0019A60D-78F8-E341-8D3E-9786201FE017 1 0
003B8B8F-7017-E441-8C84-8C0EA577E29D 0 1

What I was trying was to make was a matrix of 0 with an if to make the 1's. Afterwards I would merge it and pivot it. However, since the some 'Desc' are the same, I didn't know how big I should do it. And it seems to come along with many other flaws in the merge part.

Answer Source

You can use

pd.concat([df2['UUID'], df2['Desc'].str.join('___').str.get_dummies('___')], axis=1)

It returns something like this:

                                   UUID  SERVICIO POR HORA  \
0  0019A60D-78F8-E341-8D3E-9786201FE017                  0   
1  003B8B8F-7017-E441-8C84-8C0EA577E29D                  1   

0                            1  
1                            0  
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download