Hugo - 5 months ago
Python Question

How do I calculate a pandas column with multiple columns as arguments?

I was using a function that calculates wind speed from its longitudinal and latitudinal components:

def wind_speed(u, v):
    return np.sqrt(u ** 2 + v ** 2)


and calling it to calculate a new pandas column from two existing ones:

df['wspeed'] = map(wind_speed, df['lonwind'], df['latwind'])


Since I switched from Python 2.7 to Python 3.5, this call no longer works. Could the version change be the cause?

For a function that takes a single argument (one column):

def celsius(T):
    return round(T - 273, 1)


I am now using:

df['temp'] = df['t2m'].map(celsius)


And it works fine.

Could you help me?

Answer

I would stick to existing NumPy/SciPy functions where possible, as they are fast and well optimized. Here numpy.hypot does the job:

df['wspeed'] = np.hypot(df.latwind, df.lonwind)
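On the Python 2.7 vs. 3.5 question: in Python 3 the built-in map() returns a lazy iterator rather than a list, so its result has to be materialized (for example with list()) before it is assigned to a column. A minimal sketch of both variants, assuming a small frame built from the three demo rows shown in this answer (the column name wspeed_map is only for illustration):

```python
import numpy as np
import pandas as pd

# Sample frame using the three demo rows; column names follow the question.
df = pd.DataFrame({"latwind": [4, 5, 6], "lonwind": [1, 2, 3]})


def wind_speed(u, v):
    return np.sqrt(u ** 2 + v ** 2)


# Python 3: map() yields a lazy iterator, so wrap it in list() before assigning.
lazy = map(wind_speed, df["lonwind"], df["latwind"])
df["wspeed_map"] = list(lazy)  # illustrative column name, not from the question

# Vectorized alternative: one call on whole columns, no Python-level loop.
df["wspeed"] = np.hypot(df["latwind"], df["lonwind"])
```

Both columns should hold the same values; the list(map(...)) form is only there to show what changed between Python versions.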

Timing against a 300K-row DataFrame:

In [47]: df = pd.concat([df] * 10**5, ignore_index=True)

In [48]: df.shape
Out[48]: (300000, 2)

In [49]: %paste
def wind_speed(u, v):
    return np.sqrt(u ** 2 + v ** 2)

## -- End pasted text --

In [50]: %timeit list(map(wind_speed, df['lonwind'], df['latwind']))
1 loop, best of 3: 922 ms per loop

In [51]: %timeit np.hypot(df.latwind, df.lonwind)
100 loops, best of 3: 4.08 ms per loop

Conclusion: the vectorized approach was roughly 225 times faster (922 ms vs. 4.08 ms).
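The comparison above can be reproduced outside IPython with the standard timeit module. This is only a sketch: it assumes the three demo rows repeated to 30,000 rows (smaller than the 300K used above), the helper names with_map and with_hypot are made up here, and the absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed data: the three demo rows repeated 10**4 times (30,000 rows).
base = pd.DataFrame({"latwind": [4, 5, 6], "lonwind": [1, 2, 3]})
df = pd.concat([base] * 10**4, ignore_index=True)


def wind_speed(u, v):
    return np.sqrt(u ** 2 + v ** 2)


def with_map():
    # Element-by-element: one Python-level call per row.
    return list(map(wind_speed, df["lonwind"], df["latwind"]))


def with_hypot():
    # Vectorized: a single NumPy call on whole columns.
    return np.hypot(df["latwind"], df["lonwind"])


# Best of 3 single runs for each variant, in seconds.
t_map = min(timeit.repeat(with_map, number=1, repeat=3))
t_hypot = min(timeit.repeat(with_hypot, number=1, repeat=3))
print(f"map: {t_map * 1e3:.2f} ms, hypot: {t_hypot * 1e3:.2f} ms")
```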

If you have to write your own function, try to use vectorized math (operating on whole vectors / columns instead of scalars):

def wind_speed(u, v):
    # vectorized approach: math on whole columns instead of scalars
    return np.sqrt(u * u + v * v)

df['wspeed'] = wind_speed(df['lonwind'], df['latwind'])

Demo:

In [39]: df['wspeed'] = wind_speed(df['lonwind'] , df['latwind'])

In [40]: df
Out[40]:
   latwind  lonwind    wspeed
0        4        1  4.123106
1        5        2  5.385165
2        6        3  6.708204

The same vectorized approach works for the celsius() function:

def celsius(T):
    # using vectorized function: np.round()
    return np.round(T - 273, 1)
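
With that version the whole column can be passed in directly, so .map() is no longer needed. A short usage sketch, assuming a few made-up kelvin values in the t2m column from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical temperatures in kelvin; 't2m' and 'temp' are the column names from the question.
df = pd.DataFrame({"t2m": [273.0, 283.2, 300.26]})


def celsius(T):
    # np.round works element-wise, so T can be a whole Series
    return np.round(T - 273, 1)


# Pass the column itself instead of mapping over its elements.
df["temp"] = celsius(df["t2m"])
```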