piRSquared - 1 month ago
Python Question

Why does pandas apply calculate twice

I'm using the apply method on a pandas DataFrame object. When my DataFrame has a single column, the applied function appears to be called twice. Why does this happen, and can I stop it?

Code:

import pandas as pd

def mul2(x):
    # The print is a side effect that reveals every call to the function.
    print('hello')
    return 2 * x

df = pd.DataFrame({'a': [1, 2, 0.67, 1.34]})

print(df.apply(mul2))


Output:

hello
hello
      a
0  2.00
1  4.00
2  1.34
3  2.68


I'm printing 'hello' from within the function being applied, and I know the function is called twice because 'hello' is printed twice. What's more, with two columns 'hello' is printed 3 times. And when I call apply on just the column, 'hello' is printed 4 times.

Code:

print(df.a.apply(mul2))


Output:

hello
hello
hello
hello
0    2.00
1    4.00
2    1.34
3    2.68
Name: a, dtype: float64

Answer

This is probably related to this issue. With groupby, the applied function is called one extra time to see whether certain optimizations can be done, and I'd guess something similar is going on here. It doesn't look like there's any way around it at the moment (although I could be wrong about the source of the behavior you're seeing). Is there a reason you need it not to make that extra call?
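To check how many calls your own pandas version makes, a counter is easier to inspect than printed output. The sketch below is only an illustration: call_count, count_calls and the two_col frame are names made up for this example, and the counts you see may differ between pandas versions, since the extra call is an implementation detail. It also shows that, for plain arithmetic like this, apply can be skipped altogether.

```python
import pandas as pd

call_count = 0  # hypothetical counter, not part of pandas


def mul2(x):
    # Count invocations instead of printing, so the side effect is measurable.
    global call_count
    call_count += 1
    return 2 * x


def count_calls(frame):
    """Return how many times DataFrame.apply invoked mul2 on `frame`."""
    global call_count
    call_count = 0
    frame.apply(mul2)
    return call_count


one_col = pd.DataFrame({'a': [1, 2, 0.67, 1.34]})
two_col = pd.DataFrame({'a': [1, 2, 0.67, 1.34], 'b': [5, 6, 7, 8]})

# One call per column, plus the extra call if this pandas version makes it.
print(count_calls(one_col))
print(count_calls(two_col))

# Multiplication is already vectorized, so for this particular function
# no Python-level callback (and no repeated side effect) is needed.
doubled = one_col * 2
```

If the function really must run exactly once per column, the safest assumption is to keep it free of side effects, or to move the side effect outside of apply.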

Also, calling it four times when you apply on the column is normal. When you select one column, you get a Series, not a DataFrame, and apply on a Series applies the function to each element. Since your column has four elements, the function is called four times.
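The difference is easy to see by recording what the function receives each time. This is a rough sketch rather than anything from the pandas docs: received, frame_args and series_args are hypothetical names used only here. DataFrame.apply hands the function a whole column (a Series), while Series.apply hands it one element at a time.

```python
import pandas as pd

received = []  # hypothetical log: True if the argument was a whole Series


def mul2(x):
    received.append(isinstance(x, pd.Series))
    return 2 * x


df = pd.DataFrame({'a': [1, 2, 0.67, 1.34]})

# DataFrame.apply: the function is called with each column as a Series.
del received[:]
df.apply(mul2)
frame_args = list(received)

# Series.apply: the function is called once per element of the column.
del received[:]
df['a'].apply(mul2)
series_args = list(received)

print(frame_args)   # every entry is True (the argument was a Series)
print(series_args)  # one False entry per element in the column
```

So the four calls on df.a are one per row, not repeated work on the same data.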