Dervin Thunk - 1 year ago
Python Question

Correct pattern for computing a minimum with dask?

Is this the correct way to compute the minimum and maximum?


def call_minmax_duration(data):
    mmin = dd.DataFrame.min(data).compute()
    mmax = dd.DataFrame.max(data).compute()
    return mmin, mmax

Answer

Two things.

First, your data variable should be a dask.dataframe object, such as one created by dd.from_pandas(...) or dd.read_csv(...).

Second, it's probably better to compute both results at once. That way, shared intermediates only need to be computed once.


import dask.dataframe as dd
df = dd.read_csv('2016-*-*.csv')

# A single compute call evaluates both reductions and returns a (min, max) tuple
mmin, mmax = dd.compute(df.mycolumn.min(), df.mycolumn.max())
