Riley Hun - 1 year ago
Python Question

Python Pandas: Is There a Faster Way to Split and Recombine a DataFrame based on criteria?

I want to group this DataFrame by the column "ContactID", and if a group's "PaymentType" column doesn't include a particular value, I want to remove that entire group from the DataFrame.

I have something like this:

UniqueID = data.drop_duplicates('ContactID')['ContactID'].tolist()
OnlyRefinance=[]
for i in UniqueID:
    splits = data[data['ContactID']==i].reset_index(drop=True)
    if any(splits['PaymentType']==160):
        OnlyRefinance.append(splits)
OnlyRefinance = pd.concat(OnlyRefinance)


This works but it's VERY slow and I was wondering if there was a faster way to accomplish this.

Answer Source

Another option is to use groupby.filter:

data.groupby("ContactID").filter(lambda g: (g.PaymentType == 160).any())

This keeps only the rows that belong to ContactID groups in which at least one PaymentType equals 160.
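
To see the filter in action, here is a minimal sketch on a small made-up frame. Only the column names "ContactID" and "PaymentType" and the value 160 come from the question; the sample rows and the variable names are assumptions for illustration. It also shows two vectorized variants (isin and transform) that avoid calling a Python lambda per group. These are not part of the original answer, and whether they are faster on your data is something you would need to time yourself.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    "ContactID":   [1, 1, 2, 2, 3],
    "PaymentType": [160, 100, 100, 120, 160],
})

# Approach from the answer: keep every group that contains at least one 160.
filtered = data.groupby("ContactID").filter(
    lambda g: (g["PaymentType"] == 160).any()
)

# Variant 1 (assumption, not from the answer): find the ContactIDs that
# have a 160 somewhere, then select all rows with those IDs.
ids_with_160 = data.loc[data["PaymentType"] == 160, "ContactID"].unique()
filtered_isin = data[data["ContactID"].isin(ids_with_160)]

# Variant 2 (assumption, not from the answer): compute a per-group flag
# and broadcast it back to every row with transform.
has_160 = (data["PaymentType"] == 160).groupby(data["ContactID"]).transform("any")
filtered_transform = data[has_160]

print(filtered)
```

All three results keep the rows for ContactID 1 and 3 and drop ContactID 2. Note that they preserve the original row index, whereas the loop in the question resets the index within each group; append .reset_index(drop=True) if you want a fresh index.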
