user2725109 - 1 month ago

Bug in pandas.DataFrame.merge?

The following:

import pandas as pd

q = pd.DataFrame([[1,2],[3,4]])
r = pd.DataFrame([[1,2],[5,6]], columns=['a','b'])
pd.merge(q, r, left_on=q.columns, right_on=r.columns, how='left')


raises an error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()


The following doesn't:

import pandas as pd

q = pd.DataFrame([[1,2],[3,4]])
r = pd.DataFrame([[1,2],[5,6]], columns=['a','b'])
pd.merge(q, r, left_on=q.columns.tolist(), right_on=r.columns.tolist(), how='left')


Is this a bug?

Answer

It depends on what pandas considers "array-like". It may also be a bug in the documentation.

pandas checks the types of the left_on and right_on parameters (see the _maybe_make_list function in the pandas source). Since neither is a tuple or list (q.columns is a RangeIndex and r.columns is an Index), each one is wrapped in a one-element list, so the check effectively becomes:

[q.columns] == [r.columns]

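instead of treating each Index as a list of column labels. To compare those one-element lists, Python calls bool() on q.columns == r.columns, which is an element-wise boolean array, and that is what raises the error.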
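The wrap-then-compare behaviour described above can be reproduced with NumPy alone. This is a minimal sketch: maybe_make_list and keys_equal are hypothetical stand-ins that mirror the description in this answer, not the actual pandas code, and plain NumPy arrays stand in for q.columns and r.columns.

```python
import numpy as np


def maybe_make_list(obj):
    """Hypothetical stand-in for the pandas helper: wrap anything that is
    not already a tuple or list in a one-element list."""
    if obj is not None and not isinstance(obj, (tuple, list)):
        return [obj]
    return obj


def keys_equal(left_on, right_on):
    """Normalise both key specs, then compare them with == as described above."""
    return maybe_make_list(left_on) == maybe_make_list(right_on)


# Plain lists pass through unchanged, so this is an ordinary list comparison.
print(keys_equal([0, 1], ["a", "b"]))  # False

# Arrays get wrapped: comparing [array] == [array] makes Python call bool()
# on the element-wise result of array == array, which NumPy refuses to do.
try:
    keys_equal(np.array([0, 1]), np.array([2, 3]))
except ValueError as exc:
    print("ValueError:", exc)
```

Because list comparison calls bool() on each element-wise result, any array-like key with more than one element triggers the same ValueError, while plain lists compare normally.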

The documentation describes left_on as "label or list, or array-like". I couldn't find a definition of array-like in the pandas documentation, but in this case it seems to be limited to tuple or list.
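One way to avoid the issue, whether or not a given pandas version still raises it, is to convert array-like keys to plain lists before calling merge, which is what the working .tolist() example does by hand. A minimal sketch follows; as_key_list is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def as_key_list(keys):
    """Hypothetical helper: turn an Index, Series or ndarray of labels into a
    plain list so merge() receives a list rather than an array-like."""
    if keys is None or isinstance(keys, (list, tuple)):
        return keys
    if hasattr(keys, "tolist"):
        return keys.tolist()
    return keys  # a single label such as "a" is left unchanged


q = pd.DataFrame([[1, 2], [3, 4]])
r = pd.DataFrame([[1, 2], [5, 6]], columns=["a", "b"])

result = pd.merge(
    q,
    r,
    left_on=as_key_list(q.columns),   # RangeIndex -> [0, 1]
    right_on=as_key_list(r.columns),  # Index -> ["a", "b"]
    how="left",
)
print(result)
```

Only the key arguments are normalised here; the first left row (1, 2) finds a match in r, while the second row (3, 4) does not and gets NaN in columns a and b.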
