cooke - 2 years ago 229

Python Question

I have a large pandas DataFrame that has a MultiIndex of ['Date', 'Int1', 'Int2'] and a single column of floating-point numbers.

Currently, I am applying some normalization with:

```
def some_matrix_math(matrix):
    # do some matrix math to normalize
    return matrix

# data: the DataFrame indexed by ['Date', 'Int1', 'Int2']
data.unstack().apply(some_matrix_math, axis=1).stack()
```

I am applying the normalization to each ('Date', 'Int1') pair, and then I would like to put the DataFrame back to having an index of ['Date', 'Int1', 'Int2'].

The code above works but is very slow on large datasets. Is there a faster way to do the same thing?
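For reference, a minimal runnable reproduction of the unstack/apply/stack round trip described in the question. The toy data and the row-sum normalization inside `some_matrix_math` are hypothetical stand-ins, since the real matrix math is not shown:

```python
import pandas as pd

# Toy data (hypothetical): MultiIndex ['Date', 'Int1', 'Int2'], one float column
index = pd.MultiIndex.from_product(
    [["2016-01-01", "2016-01-02"], [1, 2], [10, 20, 30]],
    names=["Date", "Int1", "Int2"],
)
data = pd.DataFrame({"value": [float(i) for i in range(1, 13)]}, index=index)

def some_matrix_math(row):
    # Hypothetical normalization: scale each ('Date', 'Int1') row to sum to 1
    return row / row.sum()

# unstack() moves 'Int2' into the columns, so each row is one ('Date', 'Int1') pair;
# apply(..., axis=1) calls the function once per row; stack() restores the 3-level index
result = data.unstack().apply(some_matrix_math, axis=1).stack()
print(result)
```

The per-row Python call made by `apply(..., axis=1)` is the likely source of the slowdown on large frames, since it runs once for every ('Date', 'Int1') pair.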


Answer

In my function, I convert the incoming row (a pandas Series) to a NumPy ndarray:

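```
import pandas as pd

def some_matrix_math(matrix):
    ndarray = matrix.values  # raw NumPy array; avoids pandas overhead
    # do some matrix math to normalize, producing `result`
    result = ndarray  # placeholder for the normalized values
    # wrap as a Series so apply(..., axis=1) expands it back into columns
    return pd.Series(result, index=matrix.index)
```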

I then use NumPy functions on the array instead of pandas Series methods, and things run on the order of 100x faster.
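Taking the same idea one step further (a swapped-in technique, not the answerer's code), the sketch below hands the entire unstacked matrix to NumPy in a single operation instead of calling a Python function once per row. It assumes hypothetical toy data and a hypothetical row-sum normalization; your own math would need to work row-wise on a 2-D array for this to apply.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): MultiIndex ['Date', 'Int1', 'Int2'], one float column
index = pd.MultiIndex.from_product(
    [["2016-01-01", "2016-01-02"], [1, 2], [10, 20, 30]],
    names=["Date", "Int1", "Int2"],
)
data = pd.DataFrame({"value": np.arange(1.0, 13.0)}, index=index)

# One row per ('Date', 'Int1') pair, one column per 'Int2' value
wide = data.unstack()
arr = wide.to_numpy()

# Hypothetical normalization on the whole 2-D array at once;
# keepdims=True keeps the row sums as a column so broadcasting divides row-wise
normed = arr / arr.sum(axis=1, keepdims=True)

# Rebuild the wide frame with the original labels, then restore the 3-level index
result = pd.DataFrame(normed, index=wide.index, columns=wide.columns).stack()
```

This removes the per-row `apply` loop entirely; how much it helps in practice depends on how many ('Date', 'Int1') rows the unstacked frame has.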
