bencampbell_14 - 1 month ago
Python Question

Use apply() with Pandas Series

I have the code below:

import numpy as np
import pandas as pd
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])


               b         d         e
Utah    0.479210  0.161892 -1.315375
Ohio   -0.572543  0.080203 -0.446178
Texas   0.052954  0.043417  0.365056
Oregon  1.462631  0.244453  2.207720

f = lambda x: x.max() - x.min()
frame.apply(f)

This results in:

b    2.035174
d    0.201035
e    3.523095
dtype: float64

I'm trying to learn how to apply the lambda to a specific column, so I tried applying it to the 'd' column only. This is what I did:

frame['d'].apply(f)

It results in an error, though:
AttributeError: 'float' object has no attribute 'max'



I tried to debug it. It seems that frame['d'] is of type Series, each value in this Series is a float, and a float doesn't have a min/max attribute.

I thought I was just missing something simple here, but my limited knowledge of Python and Pandas is giving me a hard time. How do I apply the lambda to column 'd' only?


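The problem is that .apply on a Series works elementwise, whereas on a DataFrame it works column by column (each column is passed in as a Series) or, with axis=1, row by row. If you really want to use .apply this way, you can select the column as a one-column DataFrame like this: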

In [9]: frame.loc[:,['d']]
               d
Utah    2.259488
Ohio    0.458926
Texas  -0.072635
Oregon  0.470217

In [10]: type(frame.loc[:,['d']])
Out[10]: pandas.core.frame.DataFrame

This returns a DataFrame, so you can then simply do:

In [11]: frame.loc[:,['d']].apply(lambda x: x.max()-x.min())
d    2.332124
dtype: float64
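
To make the elementwise versus column-wise distinction concrete, here is a minimal sketch. The frame uses small hand-written values (made up for illustration, not the random data above) so the results are easy to check by eye.

```python
import pandas as pd

# Hypothetical hand-written values, chosen so the arithmetic is easy to verify.
frame = pd.DataFrame(
    {"b": [1.0, 4.0, 2.0], "d": [10.0, 30.0, 20.0]},
    index=["Utah", "Ohio", "Texas"],
)

# Series.apply calls the function once per element, so x is a single float here.
elementwise = frame["d"].apply(lambda x: x * 2)

# DataFrame.apply calls the function once per column, so x is a whole Series here.
columnwise = frame[["d"]].apply(lambda x: x.max() - x.min())

print(type(frame["d"]).__name__)    # Series
print(type(frame[["d"]]).__name__)  # DataFrame
print(elementwise.tolist())         # [20.0, 60.0, 40.0]
print(columnwise["d"])              # 20.0
```

The only difference between the two selections is the extra pair of brackets: frame["d"] gives a Series, while frame[["d"]] gives a one-column DataFrame.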

Note that for brevity you can simply use frame[['d']]. However, this makes more sense:

In [12]: frame.d.max() - frame.d.min()
Out[12]: 2.3321235565383334
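
If you would rather keep the lambda from the question, one option is to skip .apply and hand the whole Series to the function, either by calling it directly or through Series.pipe. A small sketch, again with made-up values rather than the random data above:

```python
import pandas as pd

# Hypothetical hand-written values for illustration.
frame = pd.DataFrame(
    {"b": [1.0, 4.0, 2.0], "d": [10.0, 30.0, 20.0]},
    index=["Utah", "Ohio", "Texas"],
)

f = lambda x: x.max() - x.min()

# Call the function on the Series itself instead of on each element.
direct = f(frame["d"])

# Series.pipe passes the whole Series to the function, which suits method chains.
piped = frame["d"].pipe(f)

print(direct, piped)  # 20.0 20.0
```

Both forms pass the Series as x, so x.max() and x.min() are the Series methods and no AttributeError is raised.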

ETA: In fact, even for the whole DataFrame you really don't need apply in this case, and it will certainly be slower than the following:

In [19]: frame.max() - frame.min()
b    3.337040
d    2.332124
e    2.224037
dtype: float64
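
To check the speed claim on your own machine, a rough timing sketch along these lines may help. The frame size and repeat count are arbitrary assumptions, and the actual numbers will vary with hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical size; a larger frame makes any gap easier to see.
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.standard_normal((1000, 50)))

via_apply = big.apply(lambda x: x.max() - x.min())
vectorized = big.max() - big.min()

# Both approaches should produce the same per-column ranges.
assert np.allclose(via_apply, vectorized)

# Time each approach over the same number of repetitions.
t_apply = timeit.timeit(lambda: big.apply(lambda x: x.max() - x.min()), number=20)
t_vec = timeit.timeit(lambda: big.max() - big.min(), number=20)
print(f"apply: {t_apply:.4f}s  vectorized: {t_vec:.4f}s")
```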