cooke cooke - 2 years ago 229
Python Question

Optimize pandas unstack().apply().stack()

I have a large pandas DataFrame with a MultiIndex of
['Date', 'Int1', 'Int2'] and a single column of floating-point numbers.
Currently I apply a normalization like this:

def some_matrix_math(matrix):
    # do some matrix math to normalize
    return matrix

data.unstack().apply(some_matrix_math, axis=1).stack()

I am applying the normalization across each ('Date', 'Int1') row and would then like to put the DataFrame back to having an index of ['Date', 'Int1', 'Int2'].

The above code works but is very slow on large data sets. Is there a faster way to do the same thing?

Answer Source

In my function I convert the incoming row (a pandas Series) to an ndarray:

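import pandas as pd

def some_matrix_math(matrix):
    values = matrix.values  # underlying NumPy ndarray for this row
    # do some matrix math on values to normalize
    # wrap the result so apply() keeps the original column labels
    return pd.Series(values, index=matrix.index)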

I then use NumPy functions and arrays instead of pandas Series, and things run on the order of 100x faster.
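
Going one step further, the per-row apply can often be dropped altogether by running the math on the whole unstacked 2-D array at once. This is only a minimal sketch, assuming the normalization can be written with axis-wise NumPy operations. `normalize_rows` and `normalize_frame` are hypothetical names, and the row-sum scaling is just a stand-in for the real matrix math. It also assumes every ('Date', 'Int1') row has a value for every 'Int2', so that unstack() introduces no NaN:

```python
import numpy as np
import pandas as pd


def normalize_rows(values):
    # Placeholder normalization (an assumption, not the original math):
    # scale each row so that it sums to 1.
    return values / values.sum(axis=1, keepdims=True)


def normalize_frame(data):
    # Rows become (Date, Int1); columns become (column name, Int2).
    wide = data.unstack()
    # One vectorized call on the full 2-D ndarray instead of one call per row.
    result = normalize_rows(wide.to_numpy())
    # Reattach the labels, then move Int2 back into the index.
    out = pd.DataFrame(result, index=wide.index, columns=wide.columns)
    return out.stack()
```

Usage would be `normalized = normalize_frame(data)`, which should give back a frame indexed by ['Date', 'Int1', 'Int2'].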
