Hugo - 6 months ago 21

Python Question

I was using a function that calculates wind speed from its longitudinal and latitudinal components:

`def wind_speed(u, v):`

`    return np.sqrt(u ** 2 + v ** 2)`

and calling it to calculate a new pandas column from two existing ones:

`df['wspeed'] = map(wind_speed, df['lonwind'], df['latwind'])`

Since I switched from Python 2.7 to Python 3.5, the function no longer works. Could the change be the cause?

For a function that takes a single argument (one column):

`def celsius(T):`

`    return round(T - 273, 1)`

I am now using:

`df['temp'] = df['t2m'].map(celsius)`

And it works fine.

Could you help me?

Answer

I would stick to existing NumPy/SciPy functions where possible, as they are fast and well optimized. Here `numpy.hypot` does the job:

```
df['wspeed'] = np.hypot(df.latwind, df.lonwind)
```
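As for why the original call stopped working: in Python 3 the built-in `map` returns a lazy iterator instead of a list, so it can no longer be used directly where a list of values is expected. The sketch below illustrates this with plain lists standing in for the two columns; the sample values and the use of `math.sqrt` are assumptions made to keep the example dependency-free.

```python
import math

def wind_speed(u, v):
    # Scalar version, called once per pair of values
    return math.sqrt(u ** 2 + v ** 2)

lonwind = [1, 2, 3]  # hypothetical sample values
latwind = [4, 5, 6]

lazy = map(wind_speed, lonwind, latwind)
# In Python 3 this is an iterator object, not a list
is_list = isinstance(lazy, list)

# Wrapping the call in list() materialises the values, as Python 2 did
speeds = list(map(wind_speed, lonwind, latwind))
```

With a DataFrame, the equivalent would be `df['wspeed'] = list(map(wind_speed, df['lonwind'], df['latwind']))`, though the vectorized call above avoids the Python-level loop entirely.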

**Timing** against a 300K-row DataFrame:

```
In [47]: df = pd.concat([df] * 10**5, ignore_index=True)
In [48]: df.shape
Out[48]: (300000, 2)
In [49]: %paste
def wind_speed(u, v):
    return np.sqrt(u ** 2 + v ** 2)
## -- End pasted text --
In [50]: %timeit list(map(wind_speed, df['lonwind'], df['latwind']))
1 loop, best of 3: 922 ms per loop
In [51]: %timeit np.hypot(df.latwind, df.lonwind)
100 loops, best of 3: 4.08 ms per loop
```

**Conclusion:** the vectorized approach was roughly 225 times faster (922 ms vs. 4.08 ms).
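To reproduce the comparison outside IPython, something like the following sketch could be used. It assumes synthetic random data (the frame size, seed, and repeat count are arbitrary choices, not the ones used above), so the absolute timings will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

def wind_speed(u, v):
    return np.sqrt(u ** 2 + v ** 2)

# Synthetic data: an assumption, since the original frame is not shown in full
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "lonwind": rng.normal(size=30_000),
    "latwind": rng.normal(size=30_000),
})

# Both approaches should produce the same values
looped = np.array(list(map(wind_speed, df["lonwind"], df["latwind"])))
vectorized = np.hypot(df["latwind"], df["lonwind"])

# Time each approach over a few repetitions
repeats = 3
t_loop = timeit.timeit(
    lambda: list(map(wind_speed, df["lonwind"], df["latwind"])), number=repeats
)
t_vec = timeit.timeit(
    lambda: np.hypot(df["latwind"], df["lonwind"]), number=repeats
)
print(f"map: {t_loop / repeats:.4f} s per call")
print(f"np.hypot: {t_vec / repeats:.6f} s per call")
```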

If you have to write your own function, use vectorized math (operating on whole columns rather than on scalars):

```
import numpy as np

def wind_speed(u, v):
    # vectorized: operates on whole columns rather than on scalars
    return np.sqrt(u * u + v * v)

df['wspeed'] = wind_speed(df['lonwind'], df['latwind'])
```

Demo:

```
In [39]: df['wspeed'] = wind_speed(df['lonwind'] , df['latwind'])
In [40]: df
Out[40]:
   latwind  lonwind    wspeed
0        4        1  4.123106
1        5        2  5.385165
2        6        3  6.708204
```

The same vectorized approach works for the `celsius()` function:

```
def celsius(T):
    # using the vectorized function np.round()
    return np.round(T - 273, 1)
```
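Applying it to a column then needs no `map` at all. A minimal sketch, assuming a column `t2m` holding temperatures in kelvin (the sample values are made up for illustration):

```python
import numpy as np
import pandas as pd

def celsius(T):
    # np.round works on the whole column at once
    return np.round(T - 273, 1)

# Hypothetical temperatures in kelvin
df = pd.DataFrame({"t2m": [273.0, 283.0, 293.15]})

# The function receives the entire Series and returns a Series
df["temp"] = celsius(df["t2m"])
```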