q0987 q0987 - 26 days ago 8
Python Question

Why does multiprocessing.Manager create an extra process?

#!/usr/bin/env python

import multiprocessing
import sys
import time

def sleeping_worker(state):
    s_time = state['sleep_time']
    print('sleeping_worker: start to sleep for {0} seconds'.format(s_time))
    time.sleep(s_time)


def main():
    # only the main process is visible here
    manager = multiprocessing.Manager()
    # the main process and worker processA are visible

    time_to_sleep = 10
    state = manager.dict(sleep_time=time_to_sleep)

    while True:
        state = manager.dict(sleep_time=time_to_sleep)
        worker = multiprocessing.Process(target=sleeping_worker, args=(state, ))
        worker.start()
        # the main process, worker processA and worker processB are visible

        worker.join()
        # the main process and worker processA are visible
        print('main: return from sleeping_worker')


if __name__ == "__main__":
    main()


"""
xx 22897 25004 0 10:00 pts/0 00:00:00 python ./testThread.py
xx 22898 22897 0 10:00 pts/0 00:00:00 python ./testThread.py
xx 22960 22897 0 10:00 pts/0 00:00:00 python ./testThread.py

the main process: 22897
the worker processA: 22898
the worker processB: 22960
"""


I use multiprocessing.Process to create a worker process, but I see two worker processes being created. Worker processA runs for as long as the main process does. Worker processB runs for only 10 seconds, then terminates and is started again.

Based on my observation, worker processA is created right after the call to manager = multiprocessing.Manager(), which I did NOT expect at all. Worker processB is created when worker.start() is called, which is expected.

Since the call to the sleeping_worker function is expensive in the real code, I would like to eliminate worker processA completely. Is that possible? Ideally, I expect to see only two processes (i.e. main and worker processB).

Answer

multiprocessing.Manager works by creating a "server" process, which is responsible for housing all of your shared data. Your main process and worker processes then communicate with the managed objects in the server process through proxies. As stated in the docs:

Managers provide a way to create data which can be shared between different processes, including sharing over a network between processes running on different machines. A manager object controls a server process which manages shared objects. Other processes can access the shared objects by using proxies.
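The proxy behaviour described in the docs can be seen in a small sketch. The names bump and demo are made up for illustration and are not part of the question's code; the sketch assumes that a worker writing through the proxy is a fair stand-in for the real workload.

```python
import multiprocessing
from multiprocessing.managers import DictProxy


def bump(shared):
    # Runs in a worker process. The read and the write both go through
    # the proxy to the manager's server process, which holds the real dict.
    shared['count'] = shared['count'] + 1


def demo():
    # Hypothetical helper: returns (is the object a proxy?, value seen by main).
    with multiprocessing.Manager() as manager:
        shared = manager.dict(count=0)
        is_proxy = isinstance(shared, DictProxy)
        worker = multiprocessing.Process(target=bump, args=(shared,))
        worker.start()
        worker.join()
        # The main process reads the updated value back through its own proxy.
        return is_proxy, shared['count']


if __name__ == "__main__":
    print(demo())
```

What manager.dict() hands back is a proxy object, not a plain dict, which is why the worker's update is visible to the main process even though the two do not share memory.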

There's no way to use a Manager without spawning the server process; the server process is a core part of the Manager's functionality.
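One way to confirm where the extra process comes from is to count child processes around the Manager() call. This is only a rough sketch: the function name count_children_around_manager is hypothetical, and it assumes no other child processes are being started or reaped at the same time.

```python
import multiprocessing


def count_children_around_manager():
    # Hypothetical helper: returns the number of live child processes
    # before the manager starts, while it runs, and after it shuts down.
    before = len(multiprocessing.active_children())
    manager = multiprocessing.Manager()  # starts the server process
    try:
        during = len(multiprocessing.active_children())
        state = manager.dict(sleep_time=10)  # a proxy; the data lives in the server
        value = state['sleep_time']          # fetched from the server via the proxy
    finally:
        manager.shutdown()                   # stops the server process
    after = len(multiprocessing.active_children())
    return before, during, after, value


if __name__ == "__main__":
    print(count_children_around_manager())
```

The count should rise by one while the manager is alive and drop back once manager.shutdown() has been called, which matches worker processA in the question.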