
Python Question

Hello, I have some data from an accelerometer whose values I am trying to smooth. The problem I'm having is that my data frame contains approximately 1,000,000 rows, so running the *smoothing* function below takes several minutes (I'm running it in Jupyter).

```
def smoothing(df, alpha, length):
    df['x'][0] = df['x'][0] * alpha
    for i in range(1, length):
        df['x'][i] = df['x'][i-1] + alpha * (df['x'][i] - df['x'][i-1])
    return df
```

My question is therefore whether there is any way to speed up this computation using vectorization, `pandas.apply`, or similar. Please note that I've tried these approaches myself, but without any luck, as I fail to produce the correct result. The part I'm struggling with is getting the result of the previous rows, and I'm unsure how to use e.g. `.shift()` to get the same functionality as in the loop.
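For what it's worth, the reason a single `.shift()` cannot reproduce the loop is that each output depends on the *previous output*, not the previous input. A shift-based formula (a sketch of the mismatch, not a fix) silently substitutes the raw `x[i-1]` where the loop uses the already-smoothed value:

```python
import pandas as pd

x = pd.Series([21, 42, 49])
alpha = 0.02

# This computes x[i-1] + alpha*(x[i] - x[i-1]), using the ORIGINAL
# previous value, so it does NOT match the recursive loop, which
# uses the previously smoothed value instead.
wrong = x.shift() + alpha * (x - x.shift())
```

Here `wrong.iloc[1]` is `21 + 0.02*(42 - 21) = 21.42`, whereas the loop (after seeding with `alpha * x[0] = 0.42`) produces `0.42 + 0.02*(42 - 0.42) = 1.2516` at that position.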

Here is some sample data:

```
x_list = [21, 42, 49, 8, 0, -57, -137, -135, -177, -181]
data = pd.DataFrame(x_list, columns=['x'])
smoothing(data, 0.02, len(x_list))
```
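As an aside, much of the cost in `smoothing` comes from per-element `df['x'][i]` indexing on a DataFrame. Running the identical recurrence over a plain NumPy array (a sketch, with `smooth_array` as a hypothetical helper name, not from the question) is typically far faster even without vectorizing the recursion:

```python
import numpy as np
import pandas as pd

def smooth_array(values, alpha):
    # Same recurrence as smoothing(), but on a contiguous NumPy array
    out = np.asarray(values, dtype=float).copy()
    out[0] = out[0] * alpha
    for i in range(1, len(out)):
        out[i] = out[i - 1] + alpha * (out[i] - out[i - 1])
    return out

x_list = [21, 42, 49, 8, 0, -57, -137, -135, -177, -181]
data = pd.DataFrame(x_list, columns=['x'])
data['x'] = smooth_array(data['x'].to_numpy(), 0.02)
```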

Expected result:


Answer Source

You can use `apply` with the help of a global variable that stores the calculated running value to get your desired output, i.e.

```
alpha = 0.2  # the alpha implied by the output shown below
store = 0

def m(x):
    global store
    # note: comparing by value, so this misfires if the first
    # value occurs again later in the series
    if x == data['x'][0]:
        store = alpha * x
    else:
        store = store + alpha * (x - store)
    return store

data['x'].apply(m)
```

Output:

```
0     4.200000
1    11.760000
2    19.208000
3    16.966400
4    13.573120
5    -0.541504
6   -27.833203
7   -49.266563
8   -74.813250
9   -96.050600
Name: x, dtype: float64
```

```
%%timeit
data['x'].apply(m)
1000 loops, best of 3: 478 µs per loop

n = pd.concat([data['x']]*10000).reset_index(drop=True)
# in the function, the condition should be n[0] instead of data['x'][0]
n.apply(m)
1 loop, best of 3: 2.18 s per loop
```
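A fully vectorized alternative (my own suggestion, not part of the original answer) is `Series.ewm`: with `adjust=False`, `ewm(alpha=alpha).mean()` computes exactly the recurrence `s[i] = s[i-1] + alpha*(x[i] - s[i-1])`, except that it seeds with `s[0] = x[0]` rather than `alpha * x[0]`. Prepending a single zero and discarding that helper row reproduces the answer's seeding exactly:

```python
import pandas as pd

alpha = 0.2  # the alpha implied by the answer's output
x_list = [21, 42, 49, 8, 0, -57, -137, -135, -177, -181]
data = pd.DataFrame(x_list, columns=['x'])

# Prepend a zero so the first EWM step yields alpha * x[0],
# matching the loop's seeding, then drop that helper row.
padded = pd.concat([pd.Series([0.0]), data['x']], ignore_index=True)
smoothed = padded.ewm(alpha=alpha, adjust=False).mean().iloc[1:].reset_index(drop=True)
```

Since `ewm` runs in compiled code, this avoids the Python-level per-row work of both the original loop and the `apply` solution.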
