Mahin Ra - 1 year ago
Python Question

MultiProcessing Lock() doesn't work

I have a program that reads some input text files and writes all of them into a list called ListOutput, which is shared memory between the two processes used in my program (I used two processes so my program runs faster!). I also have a shared memory variable called processedFiles, which stores the names of the input files already read by either process, so the current process does not read them again!

  1. To make sure that the two processes do not check processedFiles at the same time (for example, at the beginning both may conclude at the same moment that processedFiles is empty, so they both read the same file), I added a Lock around the part that checks processedFiles, so one process must finish and release the locked part before the other process can enter it.

    My problem is that the Lock() function seems not to work: when I print the current process name inside the locked part, it shows that both processes are inside the locked part. I cannot figure out what is wrong with my code. (See the output below.)

  2. Since my main program is not only about reading the input text files and putting them into the list, but has to do a very complicated operation on the input files before putting them into the list, should I use
    instead of
    and why?


import glob
from multiprocessing import Process, Manager
from threading import *
import timeit
import os


def print_content(ProcessName,processedFiles,ListOutput,lock):

    for file in glob.glob("*.txt"):
        newfile=0

        # the check of processedFiles is meant to happen under the lock
        lock.acquire()

        print "\n Current Process:",ProcessName

        if file not in processedFiles:
            print "\n", file, " not in ", processedFiles," for ",ProcessName
            processedFiles.append(file)
            newfile=1#it is a new file

        lock.release()

        #if it is a new file
        if newfile==1:
            f = open(file,"r")
            lines = f.readlines()
            ListOutput.append(lines)
            f.close()

# Create two processes as follows
try:
    manager = Manager()

    processedFiles = manager.list()
    ListOutput = manager.list()

    lock = Lock()
    p1 = Process(target=print_content, args=("Procees-1",processedFiles,ListOutput,lock))
    p2 = Process(target=print_content, args=("Process-2",processedFiles,ListOutput,lock))

    p1.start()
    p2.start()

    p1.join()
    p2.join()

    print "ListOutput",ListOutput

except:
    print "Error: unable to start process"

I have 4 input files called 1.txt (contains "my car"), 2.txt (contains "your car"), 3.txt (contains "my book"), and 4.txt (contains "your book").
The output changes between runs. This is the output from one of the runs:

Current Process: Procees-1

Current Process: Process-2

1.txt not in [] for Procees-1

Current Process: Procees-1

2.txt not in
Current Process: Process-2
['1.txt'] for Procees-1

2.txt not in ['1.txt', '2.txt'] for Process-2

Current Process: Procees-1

3.txt not in ['1.txt', '2.txt', '2.txt'] for Procees-1

Current Process: Process-2

Current Process: Process-2

4.txt not in
Current Process: Procees-1
['1.txt', '2.txt', '2.txt', '3.txt'] for Process-2

4.txt not in ['1.txt', '2.txt', '2.txt', '3.txt', '4.txt'] for Procees-1
ListOutput [['my car'], ['your car'], ['your car'], ['my book'], ['your book'], ['your book']]

Answer

Bingo! Thanks for including the imports. Now the problem is obvious ;-)

You need to use a multiprocessing.Lock to get a lock that works across processes. The Lock you're actually using is implicitly obtained along with a mountain of other stuff via your

from threading import *

A threading.Lock is useless for your purpose: it has no effect whatsoever across processes; it only provides exclusion among threads within a single process.
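To make that concrete, here is a minimal sketch of the check-then-append step guarded by a multiprocessing.Lock. It is written in Python 3 syntax rather than the Python 2 of your code, and the names claim_file, worker and run are mine, made up for illustration. The file "processing" is replaced by a stand-in, so treat it as an outline under those assumptions, not a drop-in replacement.

```python
import multiprocessing
from multiprocessing import Manager, Process


def claim_file(name, processed_files, lock):
    """Return True only for the first caller that claims `name`."""
    # The membership test and the append happen under one lock,
    # so no other process can slip in between the two steps.
    with lock:
        if name in processed_files:
            return False
        processed_files.append(name)
        return True


def worker(names, processed_files, list_output, lock):
    for name in names:
        if claim_file(name, processed_files, lock):
            # Stand-in for opening the file and doing the real work.
            list_output.append(name.upper())


def run(names):
    with Manager() as manager:
        processed_files = manager.list()
        list_output = manager.list()
        # A multiprocessing lock, not a threading one.
        lock = multiprocessing.Lock()
        procs = [
            Process(target=worker,
                    args=(names, processed_files, list_output, lock))
            for _ in range(2)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # Copy the proxies into plain lists before the manager shuts down.
        return list(processed_files), list(list_output)


if __name__ == "__main__":
    print(run(["1.txt", "2.txt", "3.txt", "4.txt"]))
```

With a lock that really is shared between the processes, each name should be claimed exactly once, although the order of the entries in the output list can still vary from run to run.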

It's usually a Bad Idea to use import *, and this is one reason why. Even if you changed your second import to

from multiprocessing import Process, Manager, Lock

it wouldn't do you any good, because from threading import * would overwrite the Lock you really want.
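If you want to see the shadowing for yourself, a small check along these lines should do it (Python 3 syntax; the variable name shadowed is just for illustration). It imports Lock from multiprocessing first, then does the star import, and looks at which object the bare name ends up bound to.

```python
import multiprocessing
import threading

from multiprocessing import Lock  # Lock now names multiprocessing's lock factory
from threading import *           # threading exports Lock too, so the name is rebound

# After both imports the bare name points at threading's Lock.
shadowed = Lock is threading.Lock
print(shadowed)  # True

# The star import only pulls in names listed in threading.__all__.
print("Lock" in threading.__all__)  # True
```

Writing multiprocessing.Lock() and threading.Lock() with the module prefix avoids the clash altogether, whatever order the imports come in.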