Kris Harper - 3 months ago 36

Python Question

I am pretty new to `pandas` and `numpy`.

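Right now I am trying to call a function on every row of a `dataframe`. With `numpy` arrays this is very fast, but with `apply` on the `dataframe` it is very slow.

My guess is that `numpy` vectorizes the call, while `pandas` `apply` does something like `for row in frame: func(row['a'], row['b'], row['c'])` instead of `func(col['a'], col['b'], col['c'])`.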

Here is an example of what I am trying to do.

```python
import numpy as np
import pandas as pd
from scipy.stats import beta

count = 100000

# If I start with a given dataframe and use apply, it's very slow
df = pd.DataFrame(np.random.uniform(0, 1, size=(count, 3)), columns=['a', 'b', 'c'])
df.apply(lambda frame: beta.cdf(frame['a'], frame['b'], frame['c']), axis=1)

# However, if I split out each column into a numpy array, this is very fast.
# (to_numpy() replaces as_matrix(), which was removed in pandas 1.0)
a = df['a'].to_numpy()
b = df['b'].to_numpy()
c = df['c'].to_numpy()
beta.cdf(a, b, c)

# But at this point I've lost the context of the dataframe.
# I would like to keep the results in a new column for further processing
```

Answer

It's not clear why you're trying to use `apply`. You can just do `beta.cdf(df.a, df.b, df.c)`.
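To keep the result alongside the original data, as the question asks, one option is to assign the vectorized call to a new column. This is a minimal sketch, assuming `numpy`, `pandas`, and `scipy` are installed; the column name `cdf` and the seeded random generator are illustrative choices, not part of the original question.

```python
import numpy as np
import pandas as pd
from scipy.stats import beta

count = 100000

# Seeded generator so the example is reproducible (an illustrative choice).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(0, 1, size=(count, 3)), columns=['a', 'b', 'c'])

# beta.cdf accepts array-like inputs, so whole columns can be passed at once.
# The result is a numpy array with one value per row; assigning it to a new
# column (hypothetical name 'cdf') keeps it aligned with the dataframe.
df['cdf'] = beta.cdf(df['a'], df['b'], df['c'])

print(df.head())
```

The rows keep their original order, so `df['cdf']` can be used for further processing like any other column. The speedup most likely comes from making one call over whole arrays, where `apply` with `axis=1` makes one Python-level call per row.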