Charles Morris - 16 days ago
Python Question

Why do I need lambda to apply functions to a Pandas Dataframe?

I have a Pandas data frame and am attempting to pass a function over the entries in one column using the apply() function.

My function is of the form:

def foo(Y):
    # accepts a pandas data frame
    # carries out some search on the text in each row of the dataframe
    # groups successful searches
    # returns a new column as a pandas series
    ...


My dataframe is of the form:

  Info  WN  RN
0   XX  YY  ZZ
1   AA  BB  CC
2   JJ  KK  LL


I attempt to execute:

df['SR'] = (df['Info'].apply(foo(x)))


My error is as follows:

  File "<ipython-input-11-ae54015436d8>", line 1, in <module>
    df['SR'] = (df['Info'].apply(foo(x)))
NameError: name 'x' is not defined


But if I use:

df['SR'] = (df['Info'].apply(lambda x:foo(x)))


It works fine.

I understand how lambda works (or at least I thought I did), but I don't understand why I need it here.

Why do I need lambda to successfully pass the function over the data frame? Shouldn't the apply() function do that by definition?

Or am I effectively doing it the other way around, i.e. passing my data frame into the function and returning some output, rather than iteratively applying the function to the data frame (if that makes sense)?

Can anyone offer any insight?

My sincere thanks!

Answer

The lambda is unnecessary. apply() expects a function object, so pass foo itself rather than the result of calling it:

df['SR'] = df['Info'].apply(foo)

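This works because apply() calls foo on each entry of the column for you. Writing foo(x) instead makes Python call foo immediately, before apply() ever runs, and no x exists at that point, hence the NameError. The lambda version only worked because lambda x: foo(x) is itself a function, one that does nothing but forward its argument to foo.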
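
To make the difference concrete, here is a minimal sketch. The body of foo is a hypothetical stand-in (it just lowercases the text), since the real search logic isn't shown in the question, and the frame is rebuilt from the sample data above. Note that with Series.apply, foo receives one entry of the column at a time.

```python
import pandas as pd


def foo(text):
    # Hypothetical stand-in for the asker's search function.
    # Series.apply hands it one entry of the column (here a string).
    return text.lower()


df = pd.DataFrame({
    "Info": ["XX", "AA", "JJ"],
    "WN": ["YY", "BB", "KK"],
    "RN": ["ZZ", "CC", "LL"],
})

# Pass the function object itself; apply() calls it once per entry.
direct = df["Info"].apply(foo)

# The lambda is only a wrapper that forwards its argument to foo.
wrapped = df["Info"].apply(lambda x: foo(x))

# Both spellings produce the same Series.
assert direct.equals(wrapped)

# foo(x) is evaluated before apply() is called, so the undefined
# name x raises NameError and apply() never runs.
error_message = None
try:
    df["Info"].apply(foo(x))
except NameError as err:
    error_message = str(err)

df["SR"] = direct
```

A lambda does become useful once foo needs extra arguments, although Series.apply also accepts them directly through its args parameter and keyword arguments.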