cdleary cdleary - 9 months ago 50
Python Question

How should I log while using multiprocessing in Python?

Right now I have a central module in a framework that spawns multiple processes using the Python 2.6

module. Because it uses
, there is module-level multiprocessing-aware log,
LOG = multiprocessing.get_logger()
. Per the docs, this logger has process-shared locks so that you don't garble things up in
(or whatever filehandle) by having multiple processes writing to it simultaneously.

The issue I have now is that the other modules in the framework are not multiprocessing-aware. The way I see it, I need to make all dependencies on this central module use multiprocessing-aware logging. That's annoying within the framework, let alone for all clients of the framework. Are there alternatives I'm not thinking of?


The only way to deal with this non-intrusively is to:

  1. Spawn each worker process such that its log goes to a different file descriptor (to disk or to pipe.) Ideally, all log entries should be timestamped.
  2. Your controller process can then do one of the following:
    • If using disk files: Coalesce the log files at the end of the run, sorted by timestamp
    • If using pipes (recommended): Coalesce log entries on-the-fly from all pipes, into a central log file. (E.g., Periodically select from the pipes' file descriptors, perform merge-sort on the available log entries, and flush to centralized log. Repeat.)