Dervin Thunk - 1 month ago
Python Question

Correct pattern for computing a minimum with dask?

Is this the correct way to call compute()?

def call_minmax_duration(data):
    mmin = dd.DataFrame.min(data).compute()
    mmax = dd.DataFrame.max(data).compute()
    return mmin, mmax

Answer

Two things.

First, your data variable should be a dask.dataframe object, such as one created by dd.from_pandas(...) or dd.read_csv(...).

Second, it's probably better to compute both results at once; that way, shared intermediates only need to be computed once.

Example

import dask.dataframe as dd
df = dd.read_csv('2016-*-*.csv')

dd.compute(df.mycolumn.min(), df.mycolumn.max())