I am using
dmaster = dd.from_pandas(master, npartitions=4)
dmaster = dmaster.assign(my_value=dmaster.original.apply(lambda x: helper(x, slave), name='my_value'))
20 Python processes
Of course, I understand the specifics depend on what exactly I am doing, but maybe the patterns above already show that something is horribly wrong?
This is pretty spot on. Identifying performance issues is tricky, especially when parallel computing comes into play. Here are some things that come to mind.
helper could be doing something odd
Generally, a good way to pin down these problems is to create a minimal, complete, verifiable example that others can reproduce and play with easily. Often, when creating such an example, you find the solution to your problem anyway. But if this doesn't happen, at least you can then pass the buck on to the library maintainer. Until such an example is created, most library maintainers don't bother to spend their time; there are almost always too many details specific to the problem at hand to warrant free service.
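As a first step toward such a minimal example, it may help to time helper on its own, serially and without dask, to see whether the function itself is the slow part. The sketch below is only an illustration: the body of helper and the contents of slave are hypothetical stand-ins (the real ones are not shown in the question), and time_helper is a name made up for this example.

```python
import time


def helper(x, slave):
    # Hypothetical stand-in for the real helper: a linear scan over `slave`
    # that counts how many entries equal x.
    return sum(1 for s in slave if s == x)


def time_helper(values, slave):
    """Call helper serially on each value; return (seconds per call, results)."""
    start = time.perf_counter()
    results = [helper(x, slave) for x in values]
    elapsed = time.perf_counter() - start
    return elapsed / len(values), results


if __name__ == "__main__":
    slave = list(range(1000))   # assumed toy data, not the real `slave`
    values = [1, 5, 2000]       # assumed toy inputs, not the real `original` column
    per_call, results = time_helper(values, slave)
    print(f"{per_call:.6f} s per call, results={results}")
```

If the per-call time multiplied by the number of rows already accounts for most of the runtime, the problem is probably in helper rather than in how the work is distributed.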