Kris Harper - 5 months ago
Python Question

Can I apply a vectorized function to a pandas dataframe?

I am pretty new to pandas, and I'm trying to figure out the best way to do some things.

Right now I am trying to call a function on every row of a DataFrame. If I pass in three numpy arrays to this function, it's very fast, but using apply on the DataFrame is very slow.

My guess is that numpy is using vectorized functions in the first case, and not in the second. Is there a way to get pandas to use that optimization? Basically, in pseudocode I think apply is doing something like
for row in frame: func(row['a'], row['b'], row['c'])
but I want it to do
func(col['a'], col['b'], col['c'])

Here is an example of what I am trying to do.

import numpy as np
import pandas as pd
from scipy.stats import beta

count = 100000

# If I start with a given dataframe and use apply, it's very slow

df = pd.DataFrame(np.random.uniform(0, 1, size=(count, 3)), columns=['a', 'b', 'c'])
df.apply(lambda frame: beta.cdf(frame['a'], frame['b'], frame['c']), axis=1)

# However, if I split out each column into a numpy array, this is very fast.

a = df['a'].to_numpy()
b = df['b'].to_numpy()
c = df['c'].to_numpy()

beta.cdf(a, b, c)

# But at this point I've lost the context of the dataframe.
# I would like to keep the results in a new column for further processing


It's not clear why you're trying to use apply. You can just do beta.cdf(df.a, df.b, df.c).
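
To keep the result alongside the original data, the vectorized call can be assigned straight back to the frame. The following is a minimal sketch of that idea, not a definitive recipe: the column name 'cdf' and the fixed random seed are assumptions made for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd
from scipy.stats import beta

count = 100000

# Seeded generator so the run is reproducible (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(0, 1, size=(count, 3)), columns=['a', 'b', 'c'])

# One vectorized call over whole columns. beta.cdf returns a numpy array
# in the same row order as df, so it can be stored as a new column.
# 'cdf' is a hypothetical column name chosen for this example.
df['cdf'] = beta.cdf(df['a'], df['b'], df['c'])

print(df.head())
```

Because the columns are passed whole, the work happens in one array-level call instead of one Python-level call per row, which is what apply with axis=1 does.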