Razieh Razieh - 21 days ago 6
Python Question

multiprocessing timing inconsistency

I have about 100 processes. Each process contains 10 inputs (logic expressions), and the task of each process is to find the fastest heuristic algorithm for solving each of the logic inputs (I have about 5 heuristic algorithms).

When I run each process separately, the results are different from when I run all of the processes in parallel (using python p1.py & python p2.py &…..). For example, when I run the processes separately, input 1 (in p1) finds the first heuristic algorithm to be the fastest method, but when run in parallel the same input finds the 5th heuristic algorithm to be faster!

Could the reason be that the CPU switches between the parallel processes and messes up the timing, so that it cannot report the right time each heuristic algorithm spends solving the input?

What is the solution? Would cutting the number of processes in half reduce the false results? (I run my program on a server.)

Answer

The operating system has to schedule all your processes on a much smaller number of CPUs. To do so, it runs one process on each CPU for a short time slice. After that, it switches that process out so that the other processes can run, giving every process its fair share of running time. Thus each process has to wait for a running slot on a CPU. Those waiting times depend on how many other processes are waiting to run and are almost unpredictable.

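If you use wall-clock time for your measurements, the waiting times will pollute them. For a more precise measurement, you can ask the operating system how much CPU time the process actually used. The function time.process_time() does that: it returns the user plus system CPU time of the current process and does not count time spent sleeping or waiting to be scheduled.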
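As a minimal sketch of that idea, the snippet below times each heuristic with time.process_time() and picks the one that used the least CPU time. The names cpu_time_of, fastest_heuristic, heuristic_a and heuristic_b are hypothetical stand-ins, not taken from the question; replace the two dummy heuristics with your real ones.

```python
import time


def cpu_time_of(func, *args):
    """Call func(*args) and return (result, CPU seconds used by this process)."""
    start = time.process_time()
    result = func(*args)
    return result, time.process_time() - start


def fastest_heuristic(heuristics, expression):
    """Return (name of the heuristic with the lowest CPU time, all timings)."""
    timings = {}
    for heuristic in heuristics:
        _, cpu_seconds = cpu_time_of(heuristic, expression)
        timings[heuristic.__name__] = cpu_seconds
    return min(timings, key=timings.get), timings


# Hypothetical placeholders: they ignore the expression and just burn CPU.
def heuristic_a(expression):
    return sum(i * i for i in range(200_000))


def heuristic_b(expression):
    return sum(i * i for i in range(2_000_000))


if __name__ == "__main__":
    best, timings = fastest_heuristic([heuristic_a, heuristic_b], "a and b")
    print("fastest:", best)
    print(timings)
```

CPU time removes the scheduling waits from the measurement, but it can still vary somewhat between runs, so repeating each measurement a few times and taking the minimum is probably worthwhile.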

Switching between processes costs time. Multiple processes accessing the same resources (files, hard disk, CPU caches, memory, ...) also cost time. For CPU-bound processes, having orders of magnitude more running processes than CPUs will slow down the execution. You'll get better results by starting slightly fewer processes than the number of CPUs. The spare CPUs remain available for work needed by the operating system or by other, unrelated programs.
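One way to keep the number of running processes just below the number of CPUs is to push all jobs through a fixed-size pool instead of launching every script at once. This is only a sketch under assumptions: worker_count and run_job are hypothetical names, and run_job stands in for whatever one of your p1.py ... scripts does.

```python
import os
from multiprocessing import Pool


def worker_count(cpus=None, spare=1):
    """Number of workers to start: CPUs minus a few spare ones, at least 1."""
    if cpus is None:
        cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, cpus - spare)


def run_job(job_id):
    # Hypothetical placeholder for the work of one of the ~100 processes.
    return job_id, sum(i * i for i in range(100_000))


def main():
    # Only worker_count() jobs run at the same time; the rest wait in a queue.
    with Pool(processes=worker_count()) as pool:
        results = list(pool.imap_unordered(run_job, range(100)))
    print("finished", len(results), "jobs")


if __name__ == "__main__":
    main()
```

Because at most worker_count() jobs compete for the CPUs at any moment, each job is switched out far less often than when all 100 are started together.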