cdleary - 3 months ago
Python Question

How should I log while using multiprocessing in Python?

Right now I have a central module in a framework that spawns multiple processes using the Python 2.6 multiprocessing module. Because it uses multiprocessing, there is a module-level, multiprocessing-aware logger, LOG = multiprocessing.get_logger(). Per the docs, this logger has process-shared locks so that you don't garble things up in sys.stderr (or whatever filehandle) by having multiple processes writing to it simultaneously.
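For concreteness, a minimal sketch of the setup being described (the handler wiring and the work function are my own illustration, not taken from the question):

    import logging
    import multiprocessing

    # The central module's shared logger, as described above.
    LOG = multiprocessing.get_logger()
    LOG.addHandler(logging.StreamHandler())   # writes to sys.stderr
    LOG.setLevel(logging.INFO)

    def work():
        # The question's premise: access to this logger's handlers is guarded
        # by a process-shared lock, so concurrent workers don't garble lines.
        LOG.info("working in pid %d", multiprocessing.current_process().pid)

    if __name__ == "__main__":
        procs = [multiprocessing.Process(target=work) for _ in range(3)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()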

The issue I have now is that the other modules in the framework are not multiprocessing-aware. The way I see it, I need to make all dependencies on this central module use multiprocessing-aware logging. That's annoying within the framework, let alone for all clients of the framework. Are there alternatives I'm not thinking of?

Answer

The only way to deal with this non-intrusively is to:

  1. Spawn each worker process such that its log goes to a different file descriptor (to disk or to a pipe). Ideally, all log entries should be timestamped.
  2. Your controller process can then do one of the following:
    • If using disk files: Coalesce the log files at the end of the run, sorted by timestamp (see the first sketch after this list).
    • If using pipes (recommended): Coalesce log entries on-the-fly from all pipes into a central log file. (E.g., periodically select from the pipes' file descriptors, perform a merge-sort on the available log entries, and flush to the centralized log. Repeat. See the second sketch after this list.)
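
A minimal sketch of the disk-file variant, assuming names of my own choosing (worker_main, merge_logs, and the file names are illustrative, not part of the answer): each worker writes timestamped records through its own FileHandler, and the controller merge-sorts the per-worker files by their leading timestamp after the run.

    import heapq
    import logging
    import multiprocessing

    def worker_main(worker_id):
        # Each worker logs to its own file, so no two processes share a handle.
        logger = logging.getLogger("worker")
        handler = logging.FileHandler("worker-%d.log" % worker_id)
        # The default asctime ("2009-06-13 12:00:00,123") is fixed-width, so it
        # sorts lexicographically; the merge step below relies on that.
        handler.setFormatter(logging.Formatter("%(asctime)s pid=%(process)d %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        for i in range(3):
            logger.info("doing work item %d", i)

    def merge_logs(paths, out_path):
        # Each per-worker file is already in time order, so a k-way merge on
        # the raw lines (which start with the timestamp) yields one sorted log.
        files = [open(p) for p in paths]
        try:
            with open(out_path, "w") as out:
                for line in heapq.merge(*files):
                    out.write(line)
        finally:
            for f in files:
                f.close()

    if __name__ == "__main__":
        workers = [multiprocessing.Process(target=worker_main, args=(i,))
                   for i in range(4)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        merge_logs(["worker-%d.log" % i for i in range(4)], "combined.log")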
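
And a sketch of the pipe variant, under the assumptions that the fork start method is in use (as on POSIX with Python 2.x) and that, for brevity, the controller appends complete lines as they arrive rather than merge-sorting buffered entries by timestamp: each worker logs to the write end of an os.pipe(), and the controller select()s on the read ends and streams whole lines into the central log.

    import logging
    import multiprocessing
    import os
    import select

    def worker_main(write_fd):
        # Log records go to this worker's private write end of the pipe.
        stream = os.fdopen(write_fd, "w", 1)          # line-buffered
        handler = logging.StreamHandler(stream)
        handler.setFormatter(logging.Formatter("%(asctime)s pid=%(process)d %(message)s"))
        logger = logging.getLogger("worker")
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        for i in range(3):
            logger.info("doing work item %d", i)
        stream.close()

    if __name__ == "__main__":
        # Assumes fd inheritance via fork; the spawn start method would not
        # carry these raw file descriptors into the children.
        readers, workers = [], []
        for _ in range(4):
            r, w = os.pipe()
            p = multiprocessing.Process(target=worker_main, args=(w,))
            p.start()
            os.close(w)              # parent keeps only the read end
            readers.append(r)
            workers.append(p)

        buffers = dict((fd, b"") for fd in readers)
        with open("combined.log", "wb") as out:
            open_fds = set(readers)
            while open_fds:
                ready, _, _ = select.select(list(open_fds), [], [])
                for fd in ready:
                    chunk = os.read(fd, 4096)
                    if not chunk:                     # EOF: the worker exited
                        if buffers[fd]:
                            out.write(buffers[fd] + b"\n")
                        os.close(fd)
                        open_fds.discard(fd)
                    else:
                        # Write only complete lines; keep any partial line
                        # buffered so workers never interleave mid-line.
                        buffers[fd] += chunk
                        lines = buffers[fd].split(b"\n")
                        buffers[fd] = lines.pop()
                        for line in lines:
                            out.write(line + b"\n")
        for p in workers:
            p.join()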