Cryo Cryo - 3 months ago 27
Python Question

Python multiprocessing pool.map raises IndexError

I've developed a utility using Python/Cython that sorts CSV files and generates stats for a client, but invoking pool.map seems to raise an exception before my mapped function has a chance to execute. Sorting a small number of files works as expected, but once the number of files grows to, say, 10, I get the IndexError below after calling pool.map. Does anyone recognize this error? Any help is greatly appreciated.

While the code is under NDA, the use-case is fairly simple:

Code Sample:

import multiprocessing

def sort_files(csv_files):
    pool_size = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes=pool_size)
    sorted_dicts = pool.map(sort_file, csv_files, 1)
    return sorted_dicts

def sort_file(csv_file):
    print 'sorting %s...' % csv_file
    # sort code


Output:

  File "generic.pyx", line 17, in generic.sort_files (/users/cyounker/.pyxbld/temp.linux-x86_64-2.7/pyrex/generic.c:1723)
    sorted_dicts = pool.map(sort_file, csv_files, 1)
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 227, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 528, in get
    raise self._value
IndexError: list index out of range

Answer

The IndexError is raised somewhere inside sort_file(), i.e. in a worker subprocess, and is then re-raised by the parent process. That is why the traceback ends at "raise self._value" in pool.py instead of pointing into your own code. In Python 2.7, multiprocessing apparently makes no attempt to tell us where the error really comes from (e.g. on which line it occurred) or even which argument to sort_file() caused it. I hate multiprocessing even more :-(
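One way to recover the missing information is to wrap the mapped function so that the worker formats its own traceback and includes the offending argument in the exception message. The following is only a sketch under stated assumptions: sort_file() here is a hypothetical stand-in for the real sort code (it fails on purpose for a name without a dot), and sort_file_traced() is a helper name invented for this example, not part of multiprocessing.

```python
import multiprocessing
import traceback


def sort_file(csv_file):
    # Hypothetical stand-in for the real sort code. A name without a dot
    # makes the [1] lookup fail with IndexError, mimicking the question.
    return csv_file.split(".")[1]


def sort_file_traced(csv_file):
    # Hypothetical wrapper: catch the error inside the worker, where the
    # traceback still exists, and fold it into the message sent to the parent.
    try:
        return sort_file(csv_file)
    except Exception:
        raise RuntimeError(
            "sort_file(%r) failed:\n%s" % (csv_file, traceback.format_exc())
        )


def sort_files(csv_files):
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    try:
        # Map the wrapper instead of sort_file itself.
        return pool.map(sort_file_traced, csv_files, 1)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    try:
        sort_files(["a.csv", "b.csv", "broken"])
    except RuntimeError as exc:
        # The message now names the argument and the failing line.
        print(exc)
```

Another quick check, if the files are small enough, is to replace pool.map with the builtin map temporarily: the function then runs in the parent process and the ordinary traceback shows the failing line directly.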
