Pythus Pythus - 1 year ago 97
Python Question

Apply Numpy function over entire Dataframe

I am applying the function below over a dataframe df1 such as the following:

AA AB AC AD
2005-01-02 23:55:00 "EQUITY" "EQUITY" "EQUITY" "EQUITY"
2005-01-03 00:00:00 32.32 19.5299 32.32 31.0455
2005-01-04 00:00:00 31.9075 19.4487 31.9075 30.3755
2005-01-05 00:00:00 31.6151 19.5799 31.6151 29.971
2005-01-06 00:00:00 31.1426 19.7174 31.1426 29.9647

def func(x):
    for index, price in x.iteritems():
        x[index] = price / np.sum(x,axis=1)
    return x[index]

df3=func(df1.ix[1:])


However, I only get a single column returned as opposed to 3

2005-01-03 0.955843
2005-01-04 0.955233
2005-01-05 0.955098
2005-01-06 0.955773
2005-01-07 0.955877
2005-01-10 0.95606
2005-01-11 0.95578
2005-01-12 0.955621


I am guessing I am missing something in the formula to make it apply to the entire dataframe. Also how could I return the first index that has strings in its row?

Answer Source

You need to do it the following way:

import numpy as np
import pandas as pd
def func(row):
    return row/np.sum(row)  # each value divided by the total of its row
df2 = pd.concat([df[:1], df[1:].apply(func, axis=1)], axis=0)

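It has two steps:

  1. df[:1] extracts the first row, which contains the strings, while df[1:] is the rest of the DataFrame. The function is applied only to the numeric rows, and pd.concat then puts the string row back on top, which answers the second part of your question.
  2. To operate row by row, use the apply() method with axis=1; func then receives one row at a time and returns it divided by its sum.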