
Limit Python Threads : Resource Temporarily Unavailable

I use Python's threading.Thread to spawn a thread for every file name found by os.walk(); each thread runs a small utility on that file and collects its output. I tried to limit the number of threads with:

ThreadLimiter = threading.BoundedSemaphore(3)

ThreadLimiter.acquire()

at the start of the run method, and

ThreadLimiter.release()

at the end of the run method.

But I still get the error below when I run the program. Any suggestions for fixing this?

bash: fork: retry: Resource temporarily unavailable
bash: fork: retry: Resource temporarily unavailable
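If you want to keep threading.Thread and the semaphore, one way to make the limit effective is to acquire the semaphore in the main thread *before* each thread is started, and release it in the thread when its work is done. The main loop then blocks until a slot is free, so only about three threads exist at a time instead of one per file. This is a minimal sketch, not the asker's code: `run_limited` and its `cmd` parameter are hypothetical names, and the command you pass stands in for the small utility from the question.

```python
import subprocess
import threading

def run_limited(filenames, cmd, limit=3):
    """Run `cmd + [filename]` for each file, with at most `limit` workers busy.

    Returns a list of (filename, returncode, stdout) tuples in completion order.
    """
    limiter = threading.BoundedSemaphore(limit)
    results = []
    lock = threading.Lock()  # protects `results`

    def worker(filename):
        try:
            p = subprocess.Popen(cmd + [filename], stdout=subprocess.PIPE,
                                 stderr=subprocess.PIPE, universal_newlines=True)
            out, _err = p.communicate()
            with lock:
                results.append((filename, p.returncode, out))
        finally:
            # Free the slot even if the utility could not be started.
            limiter.release()

    threads = []
    for filename in filenames:
        # Block here, in the main thread, until a running worker finishes;
        # only then is the next Thread created and started.
        limiter.acquire()
        t = threading.Thread(target=worker, args=(filename,))
        t.start()
        threads.append(t)

    # Wait for the last workers to finish before returning.
    for t in threads:
        t.join()
    return results
```

For the md5sum case this would be called as `run_limited(paths, ['md5sum'])`, where `paths` is any iterable of file names, such as a generator built on os.walk(). Because results are appended as threads finish, they come back in completion order rather than input order.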


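The semaphore only limits how many threads execute the body of run() at once. A Thread is still created and started for every file, and on Linux each thread counts against the per-user process limit, so with enough files bash most likely can no longer fork. Use a thread pool and save yourself a lot of work: it creates a fixed number of worker threads and hands each one the next file name. Here I md5sum files: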

import os
import multiprocessing.pool
import subprocess as subp

def walker(path):
    """Walk the file system returning file names"""
    for dirpath, dirs, files in os.walk(path):
        for fn in files:
            yield os.path.join(dirpath, fn)

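def worker(filename):
    """Get the md5 sum of a file by running the md5sum utility."""
    p = subp.Popen(['md5sum', filename], stdin=subp.PIPE,
            stdout=subp.PIPE, stderr=subp.PIPE,
            universal_newlines=True)  # decode output to str
    out, err = p.communicate()
    return filename, p.returncode, out, err

# Only 3 threads are ever created; imap() hands each file name from
# walker() to the next free thread and yields results in input order.
pool = multiprocessing.pool.ThreadPool(3)

for filename, returncode, out, err in pool.imap(worker, walker('.'), chunksize=1):
    if returncode == 0:
        print(filename, out.strip())
    else:
        print(filename, 'failed:', err.strip())

pool.close()
pool.join()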
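On Python 3, the standard library's concurrent.futures.ThreadPoolExecutor provides the same kind of fixed-size thread pool. The sketch below assumes Python 3.5+ (for subprocess.run); `checksum` and `checksum_tree` are hypothetical helper names, and `cmd` defaults to the same md5sum utility.

```python
import concurrent.futures
import os
import subprocess

def walker(path):
    """Walk the file system, yielding file names."""
    for dirpath, _dirs, files in os.walk(path):
        for fn in files:
            yield os.path.join(dirpath, fn)

def checksum(filename, cmd=('md5sum',)):
    """Run the checksum utility on one file and return its result."""
    p = subprocess.run(list(cmd) + [filename], stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE, universal_newlines=True)
    return filename, p.returncode, p.stdout, p.stderr

def checksum_tree(path, max_workers=3, cmd=('md5sum',)):
    """Checksum every file under `path` using a fixed-size thread pool."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as ex:
        # map() yields results in input order; at most max_workers threads exist.
        return list(ex.map(lambda fn: checksum(fn, cmd), walker(path)))
```

Calling `checksum_tree('.')` returns a list of `(filename, returncode, stdout, stderr)` tuples in walk order. Note that Executor.map() submits a task for every file up front, which costs some memory for very large trees, but it never creates more than `max_workers` threads.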