I'm looking for information on thread safety of urllib2 and httplib.
The official documentation (http://docs.python.org/library/urllib2.html and http://docs.python.org/library/httplib.html) lacks any information on this subject; the word "thread" is not even mentioned there.
Ok, they are not thread-safe out of the box.
What's required to make them thread-safe or is there a scenario in which they can be thread-safe?
I'm asking because it seems that urllib2 and httplib are not thread-safe.
urllib2 does not provide serialized access to a global (shared) OpenerDirector object, which is used by urlopen().
httplib does not provide serialized access to HTTPConnection objects (e.g. by using a thread-safe connection pool), so sharing HTTPConnection objects between threads is not safe.
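One way to avoid sharing either object is to give every thread its own opener and its own connections. The following is only a minimal sketch using threading.local; the helper names get_opener and get_connection are made up for illustration, and the import fallback is there only so the sketch also runs on Python 3, where the modules were renamed to urllib.request and http.client.

```python
import threading

try:                                  # Python 2
    import urllib2
    import httplib
except ImportError:                   # Python 3 renamed both modules
    import urllib.request as urllib2
    import http.client as httplib

_local = threading.local()            # attributes are private to each thread


def get_opener():
    """Return an OpenerDirector owned by the calling thread (hypothetical helper)."""
    if not hasattr(_local, "opener"):
        _local.opener = urllib2.build_opener()
    return _local.opener


def get_connection(host):
    """Return an HTTPConnection to `host` owned by the calling thread."""
    if not hasattr(_local, "connections"):
        _local.connections = {}
    if host not in _local.connections:
        # The constructor does not connect; that happens on the first request.
        _local.connections[host] = httplib.HTTPConnection(host)
    return _local.connections[host]
```

A worker thread would then call get_opener().open(url) instead of urlopen(url). This assumes the handlers in each opener do not themselves share state (a shared cookie jar, for example, would need separate care).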
Generally, if a module's documentation does not mention thread-safety, I would assume it is not thread-safe. You can look at the module's source code for verification.
When browsing the source code to determine whether a module is thread-safe, you can start by looking for uses of thread synchronization primitives from the threading or multiprocessing modules, or use of a thread-safe queue such as Queue.Queue.
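That kind of first pass over the source can be automated roughly as below. This is a heuristic sketch, not a proof of anything: the function name find_sync_hints and the list of names in the pattern are my own choices, and matches inside comments or docstrings are false positives.

```python
import inspect
import re

# Names that usually indicate explicit synchronization. The list is a
# heuristic picked for illustration, not an exhaustive inventory.
SYNC_PATTERN = re.compile(
    r"\b(threading|multiprocessing|Lock|RLock|Semaphore|Condition|Queue)\b"
)


def find_sync_hints(module):
    """Return (line_number, line) pairs from `module`'s source that match."""
    # getsource() raises for modules that have no Python source (C extensions).
    source = inspect.getsource(module)
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), 1)
        if SYNC_PATTERN.search(line)
    ]
```

Calling find_sync_hints(urllib2) and getting few or no hits is a hint that the module does no locking of its own; any hits still have to be read in context.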
Here is a relevant source code snippet from urllib2.py (Python 2.7.2):

    _opener = None
    def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
        global _opener
        if _opener is None:
            _opener = build_opener()
        return _opener.open(url, data, timeout)

    def install_opener(opener):
        global _opener
        _opener = opener

There is an obvious race condition when concurrent threads call install_opener() and urlopen().
Also, note that calling urlopen() with a Request object as the url parameter may mutate the Request object (see the source for OpenerDirector.open()), so it is not safe to concurrently call urlopen() with a shared Request object.
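If the opener really has to be replaced while worker threads are running, one option is to keep your own reference behind a lock instead of relying on the module-level _opener. The sketch below is an assumption about how that could look; set_shared_opener, get_shared_opener and open_url are hypothetical names and not part of urllib2.

```python
import threading

try:                                  # Python 2
    import urllib2
except ImportError:                   # Python 3
    import urllib.request as urllib2

_opener_lock = threading.Lock()
_shared_opener = None


def set_shared_opener(opener):
    """Replace the shared opener; can be called from any thread."""
    global _shared_opener
    with _opener_lock:
        _shared_opener = opener


def get_shared_opener():
    """Return the shared opener, building a default one on first use."""
    global _shared_opener
    with _opener_lock:
        # The check and the assignment happen under the same lock, so an
        # opener set concurrently cannot be overwritten by the default one.
        if _shared_opener is None:
            _shared_opener = urllib2.build_opener()
        return _shared_opener


def open_url(url, timeout=10):
    """Open a URL string through the shared opener."""
    # A string is passed, so open() builds its own Request for this call.
    return get_shared_opener().open(url, timeout=timeout)
```

This only serializes access to the opener reference. Whether the handlers inside a particular opener tolerate concurrent use is a separate question.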
urlopen() is thread-safe if the following conditions are met:
- install_opener() is not called from another thread.
- A non-shared Request object, or a string, is used as the url parameter.
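A usage pattern that satisfies both conditions might look like the sketch below: install_opener() is never called, and every call builds its own Request. The names fetch and fetch_all are hypothetical, and the import fallback is only there so the sketch also runs on Python 3.

```python
import threading

try:                                  # Python 2
    import urllib2
except ImportError:                   # Python 3
    import urllib.request as urllib2


def fetch(url, timeout=10):
    """Download `url` using a Request that no other thread can see."""
    request = urllib2.Request(url)    # created per call, never shared
    response = urllib2.urlopen(request, timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()


def fetch_all(urls, fetch_one=fetch):
    """Fetch every URL in its own thread; map url -> body or exception."""
    results = {}

    def worker(url):
        # Each worker writes only the entry for its own URL.
        try:
            results[url] = fetch_one(url)
        except Exception as exc:      # keep the failure next to its URL
            results[url] = exc

    threads = [threading.Thread(target=worker, args=(url,)) for url in urls]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

The fetch_one parameter exists only so the threading part can be exercised without network access, e.g. fetch_all(["a", "b"], fetch_one=str.upper).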