Chris Curvey - 6 months ago
Python Question

How do I use an initializer to set up my multiprocessing pool?

I'm trying to use the multiprocessing Pool object. I'd like each process to open a database connection when it starts, then use that connection to process the data that is passed in, rather than opening and closing the connection for each bit of data. This seems like what the initializer is for, but I can't wrap my head around how the worker and the initializer communicate. So I have something like this:

    import psycopg2
    from multiprocessing import Pool

    def get_cursor():
        return psycopg2.connect(...).cursor()

    def process_data(data):
        # here I'd like to have the cursor so that I can do things with the data
        pass

    if __name__ == "__main__":
        pool = Pool(initializer=get_cursor, initargs=())
        pool.map(process_data, get_some_data_iterator())


How do I (or can I) get the cursor from get_cursor() into process_data()?

Answer

Quoting the docs:

If initializer is not None then each worker process will call initializer(*initargs) when it starts.

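So the initializer's return value is discarded, and you can't retrieve anything from it directly. Creating the cursor in the parent and passing it to the workers as an argument won't work either: Pool.map pickles its arguments, and a psycopg2 cursor can't be pickled. Why don't you have the initializer store the cursor in a module-level global instead? Each worker process then has its own cursor:

    import psycopg2
    from multiprocessing import Pool

    def get_cursor():
        return psycopg2.connect(...).cursor()

    def init_worker():
        # Runs once in each worker process. The return value is ignored,
        # so keep the cursor in a global that process_data() can see.
        global cursor
        cursor = get_cursor()

    def process_data(data):
        # the global cursor set by init_worker() is available here
        pass

    if __name__ == "__main__":
        pool = Pool(initializer=init_worker)
        pool.map(process_data, get_some_data_iterator())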
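
A minimal runnable sketch of keeping a per-worker cursor in a module-level global, with the standard-library sqlite3 module standing in for psycopg2 so that it runs without a database server. The names init_worker and _cursor and the ":memory:" database are illustrative assumptions, not part of the original code:

```python
import sqlite3
from multiprocessing import Pool

# Each worker process has its own copy of this module-level name;
# init_worker() fills it in once per process.
_cursor = None


def init_worker(db_path):
    """Run once in every worker: open a connection and keep its cursor."""
    global _cursor
    # The initializer's return value is discarded, so the cursor is stored
    # in a global where process_data() can reach it.
    _cursor = sqlite3.connect(db_path).cursor()


def process_data(value):
    """Use the worker's own cursor; here it simply doubles the value in SQL."""
    _cursor.execute("SELECT ? * 2", (value,))
    return _cursor.fetchone()[0]


if __name__ == "__main__":
    # initargs is unpacked into init_worker(*initargs) in each worker.
    with Pool(processes=2, initializer=init_worker, initargs=(":memory:",)) as pool:
        print(pool.map(process_data, range(5)))
```

Each worker opens one connection no matter how many items it processes. The same shape should carry over to psycopg2, with the connection parameters passed through initargs.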