René René - 4 months ago 15

urlopen/requests.get not working in threads created in imported modules

I have a problem with urlopen (and requests.get).

In my program, if I run it inside a thread (I tested with multiprocessing too) [update: a thread that has been created by an imported module], it won't run until the program ends.

By "won't run" I mean it doesn't even start: the timeout (here 3 seconds) never fires, and no connection is made to the website.

Here is my simplified code:

import threading, urllib2, time

def dlfile(url):
    print 'Before request'
    r = urllib2.urlopen(url, timeout=3)
    print 'After request'
    return r

def dlfiles(*urls):
    threads = [threading.Thread(None, dlfile, None, (url,), {}) for url in urls]
    map(lambda t: t.start(), threads)

def main():
    dlfiles('http://google.com')

main()
time.sleep(10)
print 'End of program'


My output:

Before request
End of program
After request


Unfortunately, the simplified code I posted here on SO works as expected (i.e. "Before request / After request / End of program"), so I can't reproduce the problem with simplified code yet.

I'm still trying to, but in the meantime I'd like to know if anyone has ever encountered this weird behaviour and what could cause it. Note that if I don't use a thread, everything's fine.

Thanks for any help you can provide. I'm kind of lost, and even the interwebs have no idea about this.

UPDATE

Here is how to reproduce the behaviour:

threadtest.py

import threading, urllib2, time

def log(a):
    print(a)

def dlfile(url):
    log('Before request')
    r = urllib2.urlopen(url, timeout=3)
    log('After request')
    return r

def dlfiles(*urls):
    threads = [threading.Thread(None, dlfile, None, (url,), {}) for url in urls]
    map(lambda t: t.start(), threads)

def main():
    dlfiles('http://google.com')

main()
for i in range(5):
    time.sleep(1)
    log('Sleep')
log('End of program')


threadtest-import.py

import threadtest


The outputs are then:

$ python threadtest.py
Before request
After request
Sleep
Sleep
Sleep
Sleep
Sleep
End of program

$ python threadtest-import.py
Before request
Sleep
Sleep
Sleep
Sleep
Sleep
End of program
After request


Now that I've found how to reproduce it: is this behaviour normal or expected?

And how can I get rid of it, i.e. how can a thread created from an imported module make urlopen load as expected?

Answer

I forgot to post the solution, thanks to @user3351750 for his comment.

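The problem is the structure of the files. In threadtest-import.py I import threadtest, and all the work (starting the thread, sleeping) happens at module level, i.e. while the import is still in progress. I don't remember the exact mechanism, but as far as I can tell it is Python 2's global import lock: the main thread holds that lock for the whole duration of the import, so any import the worker thread triggers, directly or indirectly, blocks until the import has finished. urlopen apparently triggers such an import internally (IIRC this has to do with the re module in urllib), which is why the request only goes through once the importing script is done. Sorry for not being clearer.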
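To make the "work happens during the import" point concrete, here is a minimal sketch (written for Python 3, unlike the Python 2 code in the question). It writes a throwaway module to a temporary directory and imports it; the module name, its contents and the helper import_demo_module are all hypothetical and only serve the illustration. It shows that top-level statements run as part of the import itself, whereas code inside a function waits until it is called.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Hypothetical module source: declarations plus one top-level statement.
MODULE_SOURCE = textwrap.dedent("""
    events = []

    def run():
        events.append('run() called')

    events.append('top-level code executed during import')
""")


def import_demo_module(name='import_side_effect_demo'):
    """Write MODULE_SOURCE to a temporary directory and import it."""
    tmpdir = tempfile.mkdtemp()
    with open(os.path.join(tmpdir, name + '.py'), 'w') as handle:
        handle.write(MODULE_SOURCE)
    sys.path.insert(0, tmpdir)
    importlib.invalidate_caches()
    try:
        return importlib.import_module(name)
    finally:
        sys.path.remove(tmpdir)


demo = import_demo_module()
print(demo.events)  # the top-level statement already ran, run() did not
demo.run()
print(demo.events)  # now the work inside run() has happened as well
```

In threadtest.py the top-level statements are the thread start and the sleep loop, so they all execute while the importing script is still inside its import statement.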

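The fix is to put the code of the imported module inside a function, so that importing it only declares things and the work starts once the import has completed. This is good practice for a reason: the Python 2 threading documentation itself warns that, other than in the main module, an import should not have the side effect of spawning a new thread and then waiting for that thread.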

I.e. do this:

import threadtest #do nothing except declarations
threadtest.run() #do the work

Instead of this:

import threadtest #declarations + work

And put the code

main()
for i in range(5):
    time.sleep(1)
    log('Sleep')
log('End of program')

Inside the run function:

def run():
    main()
    for i in range(5):
        time.sleep(1)
        log('Sleep')
    log('End of program')

This way the import has already finished by the time the thread runs, so nothing blocks and everything works as expected.
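For reference, here is a sketch of what the whole restructured module could look like. It is not the exact code I ended up with: fake_fetch is a hypothetical stand-in for urllib2.urlopen(url, timeout=3) so the example runs without network access, the results dictionary and the join() calls replace the sleep loop, and the `if __name__ == '__main__'` guard is an extra so the file still works when run directly.

```python
import threading
import time


def log(message):
    print(message)


def fake_fetch(url):
    # Hypothetical stand-in for urllib2.urlopen(url, timeout=3).
    time.sleep(0.1)
    return 'response for ' + url


def dlfile(url, fetch, results):
    log('Before request')
    results[url] = fetch(url)
    log('After request')


def dlfiles(urls, fetch, results):
    threads = [threading.Thread(target=dlfile, args=(url, fetch, results))
               for url in urls]
    for thread in threads:
        thread.start()
    return threads


def run(fetch=fake_fetch):
    """Do the actual work; nothing here executes at import time."""
    results = {}
    threads = dlfiles(['http://google.com'], fetch, results)
    # Wait for the workers instead of sleeping for a fixed time.
    for thread in threads:
        thread.join()
    log('End of program')
    return results


if __name__ == '__main__':
    # Only runs when the file is executed directly, not when imported.
    run()
```

The importing script then does `import threadtest` followed by `threadtest.run()`, exactly as above.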
