haeney - 1 year ago - 76
Python Question

How can I prevent values from overlapping in Python multiprocessing?

I'm experimenting with Python multiprocessing, and I want to use a Lock to keep the values of the variable 'es_id' from overlapping.

In theory, while one process holds the lock, another process can't touch 'es_id', so the values shouldn't overlap; in practice, though, my results show that es_id often does overlap.

How can I keep the id values from overlapping?

Part of my code is:

def saveDB(imgName, imgType, imgStar, imgPull, imgTag, lock): # lock=Lock() in main
    imgName = NameFormat(imgName) # name/subname > name:subname
    i = 0
    while i < len(imgName):
        lock.acquire() # since global es_id
        global es_id

        print "getIMG.pt:save information about %s" % (imgName[i])
        cmd = "curl -XPUT http://localhost:9200/kimhk/imgName/" + str(es_id) + " -d '{" +\
            '"image_name":"' + imgName[i] + '", ' +\
            '"image_type":"' + imgType[i] + '", ' +\
            '"image_star":"' + imgStar[i] + '", ' +\
            '"image_pull":"' + imgPull[i] + '", ' +\
            '"image_Tag":"' + ",".join(imgTag[i]) + '"' +\
            "}'"
        try:
            subprocess.call(cmd, shell=True)
        except subprocess.CalledProcessError as e:
            print e.output
        i += 1
        es_id += 1
        lock.release()


...

#main
if __name__ == "__main__":
    lock = Lock()
    exPg, proc_num = option()

    procs = []
    pages = [[] for i in range(proc_num)]
    i = 1

    # Use multiprocessing to get HTML data quickly
    if proc_num >= exPg: # if there are fewer pages than processes, no need to distribute pages across processes
        while i <= exPg:
            page = i
            proc = Process(target=getExplore, args=(page, lock,))
            procs.append(proc)
            proc.start()
            i += 1
    else:
        while i <= exPg: # distribute the pages across the processes
            page = i
            index = (i - 1) % proc_num # if proc_num=4 -> 0 1 2 3
            pages[index].append(page)
            i += 1
        i = 0
        while i < proc_num:
            proc = Process(target=getExplore, args=(pages[i], lock,))
            procs.append(proc)
            proc.start()
            i += 1

    for proc in procs:
        proc.join()
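As an aside, the round-robin distribution done by the two while-loops above can be written more compactly as a standalone helper; a sketch (the name distribute is hypothetical, not from the original code):

    def distribute(exPg, proc_num):
        """Round-robin assignment of pages 1..exPg to proc_num buckets,
        equivalent to the index=(i-1)%proc_num loop above."""
        pages = [[] for _ in range(proc_num)]
        for page in range(1, exPg + 1):
            pages[(page - 1) % proc_num].append(page)
        return pages

    print(distribute(10, 4))  # [[1, 5, 9], [2, 6, 10], [3, 7], [4, 8]]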


Execution result screen:

[screenshot showing duplicate es_id values]

The result is the output of subprocess.call(cmd, shell=True). I use XPUT to add data to Elasticsearch, and es_id is the id of each document. I want these ids to increase sequentially without overlapping, because when two documents get the same id, the later one overwrites the earlier one.

I know that with XPOST I wouldn't need the lock, because Elasticsearch generates an id automatically, but later I need to read all the data back sequentially (like reading a file one line at a time).

If you know how to access all the data sequentially after using XPOST, could you tell me?
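(One possible approach, assuming you add your own sequence or timestamp field, e.g. a hypothetical "seq" field, to each document when indexing: read the documents back sorted on that field with the search URI API. A sketch, not tested against this cluster:)

    curl -XGET 'http://localhost:9200/kimhk/imgName/_search?sort=seq:asc&size=100'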

Answer Source

It looks like you are trying to guard a global variable with a lock, but global variables are separate instances in each process. What you need is a shared memory value. Here's a working example; it has been tested on Python 2.7 and 3.6:

from __future__ import print_function
import multiprocessing as mp

def process(counter):
    # Increment the counter 3 times.
    # Hold the counter's lock for read/modify/write operations.
    # Keep holding it so the value doesn't change before printing,
    # and keep prints from multiple processes from trying to write
    # to a line at the same time.
    for _ in range(3):
        with counter.get_lock():
            counter.value += 1
            print(mp.current_process().name,counter.value)

def main():
    counter = mp.Value('i') # shared integer
    processes = [mp.Process(target=process,args=(counter,)) for i in range(3)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

if __name__ == '__main__':
    main()

Output:

Process-2 1
Process-2 2
Process-1 3
Process-3 4
Process-2 5
Process-1 6
Process-3 7
Process-1 8
Process-3 9
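Applied to the original saveDB, the same idea means replacing the global es_id with a shared counter passed to each worker. A minimal sketch (save_docs and the per-process document count are placeholders for the real indexing code, which would issue the curl call where the comment sits):

    import multiprocessing as mp

    def save_docs(counter, n_docs):
        for _ in range(n_docs):
            # Claim a unique es_id by incrementing the shared counter
            # under its built-in lock.
            with counter.get_lock():
                es_id = counter.value
                counter.value += 1
            # ... index the document with str(es_id) here ...

    if __name__ == '__main__':
        counter = mp.Value('i', 0)
        procs = [mp.Process(target=save_docs, args=(counter, 5)) for _ in range(3)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(counter.value)  # 15: ids 0..14 were each handed out exactly once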