Python Question

How to speed up a DataFrame operation where each row uses the result of previous rows

Hello, I have some data from an accelerometer whose values I'm trying to smooth. The problem is that my data frame contains approximately 1,000,000 rows, so running the smoothing function below takes several minutes (I'm running it in Jupyter):

def smoothing(df, alpha, length):
    df['x'][0] = df['x'][0] * alpha

    for i in range(1, length):
        df['x'][i] = df['x'][i-1] + alpha * (df['x'][i] - df['x'][i-1])

    return df

My question is therefore whether there is any way to speed up this computation using vectorization, pandas.apply, or similar. Please note that I've tried these approaches myself, but without any luck, as I fail to produce the correct result. The part I'm struggling with is using the result of the previous rows: I'm unsure how to, e.g., use .shift() to get the same behaviour as in the smoothing function.
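For reference, the update in the loop, y[i] = y[i-1] + alpha * (x[i] - y[i-1]), is exactly the recurrence that pandas' built-in exponentially weighted mean applies when adjust=False. A minimal sketch (the only difference from smoothing() is the seed: ewm starts from y[0] = x[0], while smoothing() starts from x[0] * alpha):

```python
import pandas as pd

x_list = [21, 42, 49, 8, 0, -57, -137, -135, -177, -181]
df = pd.DataFrame(x_list, columns=['x'])

# With adjust=False, ewm().mean() computes
#   y[0] = x[0]
#   y[i] = y[i-1] + alpha * (x[i] - y[i-1])
# which matches the loop body in smoothing(), differing
# only in how the first value is seeded.
smoothed = df['x'].ewm(alpha=0.2, adjust=False).mean()
```

Because ewm runs in compiled code, this avoids the per-row Python loop entirely.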

Here is some sample data:

import pandas as pd

x_list = [21, 42, 49, 8, 0, -57, -137, -135, -177, -181]
data = pd.DataFrame(x_list, columns=['x'])
smoothing(data, 0.2, len(x_list))

Expected result:

(shown as an image in the original post)

Answer Source

You can use apply with the help of a global variable that stores the previously calculated value to get your desired output, i.e.

alpha = 0.2
store = 0

def m(x):
    global store
    if x == data['x'][0]:              # first row: seed the running value
        store = alpha * x
    else:
        store = store + alpha * (x - store)
    return store

data['x'].apply(m)
0     4.200000
1    11.760000
2    19.208000
3    16.966400
4    13.573120
5    -0.541504
6   -27.833203
7   -49.266563
8   -74.813250
9   -96.050600
Name: x, dtype: float64
1000 loops, best of 3: 478 µs per loop

n = pd.concat([data['x']]*10000).reset_index(drop=True)  # in the function, the condition should then be n[0] instead of data['x'][0]
1 loop, best of 3: 2.18 s per loop
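If you'd rather avoid the global variable, the same running update can be threaded through the series with the standard library's itertools.accumulate. A minimal sketch (same recurrence and same alpha = 0.2 seed as the answer's m(), so it reproduces the output shown above):

```python
from itertools import accumulate
import pandas as pd

x_list = [21, 42, 49, 8, 0, -57, -137, -135, -177, -181]
data = pd.DataFrame(x_list, columns=['x'])
alpha = 0.2

def step(prev, cur):
    # same update as the loop: carry the previous smoothed value forward
    return prev + alpha * (cur - prev)

values = data['x'].tolist()
seed = alpha * values[0]               # first row seeded with alpha * x[0]
smoothed = list(accumulate(values[1:], step, initial=seed))
```

This keeps the sequential dependency explicit without any mutable global state, at the cost of leaving pandas for a plain Python list.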