user3406647 - 22 days ago
Python Question

Python 3.5 global variable won't append

I am trying to use a global list that gets appended to when a thread/process finishes a task. My main thread can read from it, but my function cannot append to it. Basically, I'm making requests to find working proxies, saving them to the list, and then printing the list at the end. I have cut out as much as possible.

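import os
import requests
from multiprocessing import Pool

goodProxyList = ["test"]


def testProxy(x):
    global goodProxyList
    pid = os.getpid()
    # Assumption: x is a proxy URL string (the code that built proxies was cut out)
    proxies = {"http": x, "https": x}
    try:
        test = requests.get('http://someurl.com/', proxies=proxies, timeout=10)
        if test.status_code == 200:
            goodProxyList.append(x)
        else:
            print("Something went wrong! :/" + " From PID: " + str(pid))
    except requests.RequestException:
        print("SOMETHING WENT VERY WRONG" + " From PID: " + str(pid))


if __name__ == '__main__':
    ## Setup stuff happens (proxyList is built here)
    p = Pool(2)
    p.map(testProxy, proxyList)
    for i in goodProxyList:
        print(i)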


Even if I change goodProxyList.append(x) to goodProxyList.append("Anything"), the last 2 lines still only output "test". What am I doing wrong?

EDIT:

I have found the answer with help from brianpck. As he says, processes work differently from threads. By changing to a thread pool, it now works perfectly.

#p=Pool(2)
#p.map(testProxy, proxyList)
with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(testProxy, proxyList)

Answer

The issue here is with Pool, not with global.

Appending to a list (a mutable object) inside a function mutates the same list object that the global name refers to. (In fact, you don't even need the global keyword here: it is only required when the function assigns to the name. For a plain lookup, if the function doesn't find the variable in its own scope, it automatically looks in the global scope.) Note one small "gotcha" in the code below: in Python 3, map returns a lazy iterator, so add_to_x is not called until the result is consumed:

x = []

def add_to_x(i):
    x.append(i)

if __name__ == '__main__':
    y = map(add_to_x, [1, 2])
    print(x) # still []
    list(y)
    print(x) # now [1, 2]

The following simple example with Pool does not work though:

from multiprocessing import Pool

x = []

def add_to_x(i):
    x.append(i)

if __name__ == '__main__':
    p = Pool(2)
    list(p.map(add_to_x, [1, 2]))
    print(x) # prints [] !

Why? The answer to "Python multiprocessing global variable updates not returned to parent" is illuminating. Here is the relevant part:

When you use multiprocessing to open a second process, an entirely new instance of Python, with its own global state, is created. That global state is not shared, so changes made by child processes to global variables will be invisible to the parent process.
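One way to see this for yourself is to have each worker report its process ID together with a snapshot of its own copy of the list. This is only a sketch; add_and_report is a hypothetical name, and the exact snapshots depend on how the pool distributes the tasks.

```python
import os
from multiprocessing import Pool

x = []


def add_and_report(i):
    x.append(i)
    # Return the worker's PID and a snapshot of *its* copy of x
    return os.getpid(), list(x)


if __name__ == '__main__':
    with Pool(2) as p:
        reports = p.map(add_and_report, [1, 2])
    for pid, snapshot in reports:
        print("worker", pid, "saw", snapshot)
    # The appends happened in the workers, so the parent's list is unchanged
    print("parent", os.getpid(), "sees", x)
```

The worker PIDs should differ from the parent's, and the parent's x stays empty even though each worker's snapshot is not.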

You could potentially deal with this in many ways. One way would be to change testProxy to is_good_proxy, which will return a boolean. You could then apply the appending logic in the main loop.
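A minimal sketch of that approach follows. The body of is_good_proxy is a stand-in (an assumption, so the example runs offline); a real version would make the requests.get call from the question and return test.status_code == 200. The proxy names are hypothetical.

```python
from multiprocessing import Pool


def is_good_proxy(proxy):
    # Stand-in check (assumption): replace with the real request logic
    return proxy.startswith("good")


if __name__ == '__main__':
    # Hypothetical proxy names, for illustration only
    proxyList = ["good-a", "bad-b", "good-c"]
    with Pool(2) as p:
        # map returns one boolean per proxy, in input order
        results = p.map(is_good_proxy, proxyList)
    # The appending logic runs in the parent, so the list really is updated
    goodProxyList = [proxy for proxy, ok in zip(proxyList, results) if ok]
    print(goodProxyList)
```

Because the workers only return values and never touch shared state, this works the same with a process pool or a thread pool.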
