AVD AVD - 2 months ago 14
Python Question

Pandas python: Create function to merge two dataframes based on defined list of columns

In the script that I am writing, I want to frequently repeat the same piece of code, where I create a "numerator" dataframe with one group by, and then a "denominator" dataframe with a different group by. I then merge the two together so that I have the numerator and denominator in one place. I am trying to create a function where all I have to pass to it is the list of fields I want included in the numerator and denominator.

Here is the function:

def calcfractions(self, df, numlist, denomlist):
    print("test 1")
    numlist.append(denomlist)
    selectlist = numlist
    selectlist.append("TeamID")
    selectlist.append("PlayerID")

    print("test 2")
    numdf = df[selectlist].groupby(numlist).agg({"PlayerID": "count"})
    denomdf = df[selectlist].groupby(denomlist).agg({"PlayerID": "count"})

    print("test 3")
    mergeddf = pd.merge(numdf, denomdf, on=denomlist)

    print("test 4")
    return mergeddf


Here is the script I'm trying to use it in:

def team_pr(self, df1):
    numlist = ['PlayerLevel']
    denomlist = ['TeamName', 'Year']

    mergeddf = self.calcfractions(df1, numlist, denomlist)
    print(mergeddf.head(2))


However, when I run this, it only gets as far as printing "test 2" in calcfractions; something fails after that point. I think it might have to do with appending denomlist to numlist. Any thoughts?

EDIT: The script doesn't "fail", there is no error. It just ends.

Answer

After building my own dataframe with dummy values and working through this, I hit a ValueError: setting an array element with a sequence. It is raised because you are appending a list to a list, which nests denomlist inside numlist as a single element:

numlist = ['PlayerLevel']
denomlist = ['TeamName', 'Year']
numlist.append(denomlist)  # as you suspected, this is the problem

print(numlist)
# prints: ['PlayerLevel', ['TeamName', 'Year']]

Try this instead:

numlist += denomlist
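
The difference between append and += can be checked in isolation. The snippet below is only a small illustration using the list values from the question; the names nested, flat and combined are made up for the example.

```python
numlist = ["PlayerLevel"]
denomlist = ["TeamName", "Year"]

# append adds its argument as ONE element, so the result is a nested list.
nested = list(numlist)      # copy, so numlist itself is not modified
nested.append(denomlist)

# += (like extend) adds each element of the right-hand list individually.
flat = list(numlist)
flat += denomlist

# + builds a brand-new list and leaves both inputs untouched.
combined = numlist + denomlist

print(nested)
print(flat)
print(combined)
```

Note that += modifies the list in place, so inside a function it also changes the list the caller passed in. Using numlist + denomlist avoids that side effect.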

Since you see no error and the script simply ends, is this snippet wrapped in a try: ... except: clause somewhere that swallows the exception? In any case, if this doesn't solve your problem, please post a small sample of your dataframe.
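
One more thing to watch in the original function: selectlist = numlist does not copy the list, it only gives the same list a second name, so the later appends of "TeamID" and "PlayerID" also end up in the group-by keys. Below is a rough sketch of one way the function could be written to avoid both problems. It is not the asker's actual code: it is a plain function rather than a method, the names calc_fractions, count_col, numerator, denominator and fraction are invented for the example, the sample dataframe is made up, and the final division is an assumption about what the fractions are meant to be.

```python
import pandas as pd


def calc_fractions(df, numcols, denomcols, count_col="PlayerID"):
    """Count rows per numerator group and per denominator group, then merge."""
    denom_keys = list(denomcols)
    # Build a NEW list so the caller's lists are never modified.
    num_keys = denom_keys + list(numcols)

    # reset_index turns the group keys back into ordinary columns,
    # which keeps the merge on column names straightforward.
    numdf = df.groupby(num_keys)[count_col].count().reset_index(name="numerator")
    denomdf = df.groupby(denom_keys)[count_col].count().reset_index(name="denominator")

    merged = numdf.merge(denomdf, on=denom_keys)
    # Assumed final step: share of each numerator group within its denominator group.
    merged["fraction"] = merged["numerator"] / merged["denominator"]
    return merged


# Made-up sample data, only for illustration.
df = pd.DataFrame({
    "TeamName": ["A", "A", "A", "B"],
    "Year": [2016, 2016, 2016, 2016],
    "PlayerLevel": ["pro", "pro", "rookie", "pro"],
    "PlayerID": [1, 2, 3, 4],
})

numlist = ["PlayerLevel"]
denomlist = ["TeamName", "Year"]
result = calc_fractions(df, numlist, denomlist)
print(result)
```

Because the key lists are built with + instead of append, numlist and denomlist are unchanged after the call and can be reused for the next calculation.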
