O. Watson - 1 month ago
Python Question

Easiest way to implement multithreading in this function [Python]

So I have data known as id_list that comes into the function in this format: [(u'SGP-3630', 1202), (u'MTSCR-534', 1244)]. Each entry is two values paired together, and there could be one pair or a hundred pairs.

This is the function:

def ListParser(id_list):
    list_length = len(id_list)
    count = 0
    table = ""

    while count < list_length:
        jira = id_list[count][0]
        stash = id_list[count][1]
        count = count + 1
        table = table + RetrieveFromAPI(stash, jira)

    table = TableFormatter(table)
    table = TableColouriser(table)

    return table


What this function does is go through the list, extract each pair, and pass it to a function called RetrieveFromAPI(), which fetches information from a URL.

Does anyone have an idea how to implement multithreading here? I've had a shot at splitting the pairs into two separate lists and getting the pool to iterate through each list, but it hasn't quite worked.

def ListParser(id_list):
    pool = ThreadPool(4)
    list_length = len(id_list)
    count = 0
    table = ""
    jira_list = list()
    stash_list = list()

    while count < list_length:
        jira_list = jira_list.extend(id_list[count][0])
        print jira_list
        stash_list = stash_list.extend(id_list[count][1])
        print stash_list
        count = count + 1

    table = table + pool.map(RetrieveFromAPI, stash_list, jira_list)
    table = TableFormatter(table)
    table = TableColouriser(table)

    return table


The error I'm getting for this attempt is:
TypeError: 'int' object is not iterable


EDIT 2: Okay so I've managed to get the first list with tuples split up into two different lists, but I'm unsure how to get multithreading working with it.

jira, stash = map(list, zip(*id_list))

Answer

You're working too hard! From help(multiprocessing.pool.ThreadPool):

map(self, func, iterable, chunksize=None)
    Apply `func` to each element in `iterable`, collecting the results
    in a list that is returned.

The second argument is an iterable of the arguments you want to pass to the worker threads, and map hands each element to func as a single argument. You have a list of pairs, and you want the two items of each pair passed to RetrieveFromAPI on every call. id_list is already iterable, so we're close. A small function (in this case implemented as a lambda) bridges the gap by unpacking each pair.
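That also means there is no need to split id_list into two lists first. For reference, here is a minimal sketch of why the splitting attempt in the question most likely raised the TypeError: list.extend expects an iterable, so handing it the integer stash id fails, and extend returns None, so assigning its result back would lose the list anyway. The sample data is copied from the question; the variable names error_message and result are only for illustration.

```python
id_list = [(u'SGP-3630', 1202), (u'MTSCR-534', 1244)]

stash_list = []
try:
    # 1202 is an int, and extend() wants something it can iterate over
    stash_list.extend(id_list[0][1])
except TypeError as err:
    error_message = str(err)

# extend() changes the list in place and returns None,
# so "stash_list = stash_list.extend(...)" would replace the list with None
result = stash_list.extend([1202])

# append() is the call that adds a single item
jira_list = []
for jira, stash in id_list:
    jira_list.append(jira)
```

With append and no reassignment the split would work, but pool.map still takes only one iterable, so passing the pairs directly is simpler.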

I worked up a full mock solution just to make sure it works, so here it goes. As an aside, you can benefit from a fairly large pool size, since the worker threads spend much of their time waiting on I/O.

from multiprocessing.pool import ThreadPool

def RetrieveFromAPI(stash, jira):
    # boring mock of api
    return '{}-{}.'.format(stash, jira)

def TableFormatter(table):
    # mock
    return table

def TableColouriser(table):
    # mock
    return table

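def ListParser(id_list):
    # ThreadPool(0) raises ValueError, so keep at least one worker
    pool = ThreadPool(max(1, min(12, len(id_list))))
    # item[0] is the jira id, item[1] the stash id;
    # map returns the results in the same order as id_list
    table = ''.join(pool.map(lambda item: RetrieveFromAPI(item[1], item[0]),
        id_list, chunksize=1))
    pool.close()
    pool.join()
    table = TableFormatter(table)
    table = TableColouriser(table)
    return table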

id_list = [[0,1,'foo'], [2,3,'bar'], [4,5, 'baz']]

print(ListParser(id_list))
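
To illustrate the aside about pool size, here is a rough sketch that simulates a slow API with time.sleep and compares a pool of one thread against a pool of eight. Everything in it is made up for the demonstration: slow_fetch, timed_run, the 0.1 second delay and the sample ids are assumptions, not part of the original code, and real timings will depend on the actual API.

```python
import time
from multiprocessing.pool import ThreadPool

def slow_fetch(item):
    # hypothetical stand-in for RetrieveFromAPI: the sleep plays the network wait
    time.sleep(0.1)
    return '{}-{}.'.format(item[1], item[0])

def timed_run(pool_size, items):
    # run the same work with a given number of threads and time it
    pool = ThreadPool(pool_size)
    start = time.time()
    results = pool.map(slow_fetch, items, chunksize=1)
    elapsed = time.time() - start
    pool.close()
    pool.join()
    return results, elapsed

items = [('JIRA-{}'.format(n), n) for n in range(8)]

serial_results, serial_time = timed_run(1, items)
pooled_results, pooled_time = timed_run(8, items)

print('1 thread: {:.2f}s, 8 threads: {:.2f}s'.format(serial_time, pooled_time))
```

Both runs return the results in the same order as the input, so the joined table comes out the same either way. Only the waiting overlaps.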