RDJ - 5 months ago
Python Question

Pandas: How to apply a function to different columns

Let's say this is my function:

def function(x):
    return x.str.lower()

And this is my DataFrame (df):

         A    B         C    D
0  1.67430  BAR   0.34380  FOO
1  2.16323  FOO  -2.04643  BAR
2  0.19911  BAR  -0.45805  FOO
3  0.91864  BAR  -0.00718  BAR
4  1.33683  FOO   0.53429  FOO
5  0.97684  BAR  -0.77363  BAR

I want to apply the function to just columns B and D. (Applying it to the full DataFrame isn't the answer, as that produces NaN values in the numeric columns.)

This is my basic idea:
df.apply(function, axis=1)

But I cannot fathom how to select distinct columns to apply the function to. I've tried all manner of indexing by numeric position, name, etc.

I've spent quite a bit of time reading around this. This isn't a direct duplicate of any of these:

How to apply a function to two columns of Pandas dataframe

Pandas: How to use apply function to multiple columns

Pandas: apply different functions to different columns

Python Pandas: Using 'apply' to apply 1 function to multiple columns


Just subselect the columns from the df. By omitting the axis param we operate column-wise rather than row-wise, which will be significantly faster as you have more rows than columns here.

This will run your func against each column:

In [186]:
df[['B', 'D']].apply(function)

     B    D
0  bar  foo
1  foo  bar
2  bar  foo
3  bar  bar
4  foo  foo
5  bar  bar
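
Note that apply returns a new frame, so to keep the lowercased values the result has to be assigned back. A minimal sketch, assuming the columns are named A to D (the names are inferred from the output above, and df is rebuilt by hand from the question's data):

```python
import pandas as pd

# Rebuild the question's DataFrame; the column names A-D are an assumption.
df = pd.DataFrame({
    "A": [1.67430, 2.16323, 0.19911, 0.91864, 1.33683, 0.97684],
    "B": ["BAR", "FOO", "BAR", "BAR", "FOO", "BAR"],
    "C": [0.34380, -2.04643, -0.45805, -0.00718, 0.53429, -0.77363],
    "D": ["FOO", "BAR", "FOO", "BAR", "FOO", "BAR"],
})

def function(x):
    # x is one whole column (a Series) when apply runs column-wise
    return x.str.lower()

# Apply to the two string columns only and write the result back
df[["B", "D"]] = df[["B", "D"]].apply(function)
print(df)
```

The numeric columns A and C are never passed to the function, so they stay as they were.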

You can also filter the df to just get the string dtype columns:

In [189]:
df.select_dtypes(include=['object']).apply(function)

     B    D
0  bar  foo
1  foo  bar
2  bar  foo
3  bar  bar
4  foo  foo
5  bar  bar
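
The dtype filter can also be used to pick up the column names, so nothing is hard-coded. A small sketch on a shortened stand-in frame; the column names A to D and the explicit object dtype are assumptions made for the example:

```python
import pandas as pd

# Small stand-in frame; object dtype is set explicitly (an assumption of this
# sketch) so that select_dtypes(include=['object']) matches the text columns.
df = pd.DataFrame({
    "A": [1.67430, 2.16323, 0.19911],
    "B": pd.Series(["BAR", "FOO", "BAR"], dtype=object),
    "C": [0.34380, -2.04643, -0.45805],
    "D": pd.Series(["FOO", "BAR", "FOO"], dtype=object),
})

# Names of the object-dtype columns, found without hard-coding them
str_cols = df.select_dtypes(include=["object"]).columns

# Lowercase only those columns; the numeric columns are left untouched
df[str_cols] = df[str_cols].apply(lambda col: col.str.lower())
print(df)
```

Be aware that object dtype can hold things other than strings, so this only works cleanly when the object columns really are text.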


Timings, row-wise (axis=1) versus column-wise (default axis):

In [194]:    
%timeit df.select_dtypes(include=['object']).apply(function, axis=1)
%timeit df.select_dtypes(include=['object']).apply(function)

100 loops, best of 3: 3.42 ms per loop
100 loops, best of 3: 2.37 ms per loop

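However, for significantly larger dfs (many more rows), the column-wise method (no axis param) will scale much better, because the function is called once per column instead of once per row.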
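
If you want to check the scaling yourself outside IPython, something like the following should do. It is only a rough sketch: the frame `big` is a hypothetical stand-in built by repeating the question's string columns, and the numbers will vary by machine and pandas version.

```python
import timeit
import pandas as pd

def function(x):
    return x.str.lower()

# Hypothetical larger frame: the two string columns repeated to 1,800 rows
big = pd.DataFrame({
    "B": ["BAR", "FOO", "BAR", "BAR", "FOO", "BAR"] * 300,
    "D": ["FOO", "BAR", "FOO", "BAR", "FOO", "BAR"] * 300,
})

# Row-wise: function is called once per row
row_wise = timeit.timeit(lambda: big.apply(function, axis=1), number=3)
# Column-wise: function is called once per column
col_wise = timeit.timeit(lambda: big.apply(function), number=3)

print(f"row-wise:    {row_wise:.3f} s for 3 runs")
print(f"column-wise: {col_wise:.3f} s for 3 runs")
```

Both calls produce the same lowercased values; only the number of function calls differs.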