I'm trying to understand coroutines in Python (and in general). I've been reading about the theory, the concept and a few examples, but I'm still struggling. I understand the asynchronous model (I did a bit of Twisted) but not coroutines yet.
One tutorial gives this as a coroutine example (I made a few changes to illustrate my problem):
import os
import urllib.request

async def download_coroutine(url, number):
    """
    A coroutine to download the specified url
    """
    request = urllib.request.urlopen(url)
    filename = os.path.basename(url)
    print("Downloading %s" % url)
    with open(filename, 'wb') as file_handle:
        while True:
            print(number)  # prints numbers to view progress
            chunk = request.read(1024)
            if not chunk:
                print("Finished")
                break
            file_handle.write(chunk)
    msg = 'Finished downloading {filename}'.format(filename=filename)
    return msg

coroutines = [download_coroutine(url, number) for number, url in enumerate(urls)]
completed, pending = await asyncio.wait(coroutines)
yield
Finished
The co in coroutine stands for cooperative. Yielding (to other routines) is what makes a routine a co-routine, because only by yielding while waiting can other coroutines be interleaved. In the new async world of Python 3.5 and up, that is usually achieved by awaiting results from other coroutines.
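To make the interleaving concrete, here is a minimal sketch. The worker function and the shared log list are made up for illustration; asyncio.sleep(0) stands in for any awaited operation that hands control back to the event loop.

```python
import asyncio

async def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}{i}")
        # Awaiting hands control back to the event loop, so the other
        # worker gets a turn before this one continues.
        await asyncio.sleep(0)

async def main():
    log = []
    # gather() schedules both coroutines on the same event loop.
    await asyncio.gather(worker("a", 3, log), worker("b", 3, log))
    return log

order = asyncio.run(main())
print(order)
```

The log should alternate between the two workers, because each one gives up control at every await.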
By that definition, the code you found is not a coroutine. As far as Python is concerned, calling it still produces a coroutine object, because that's the type returned by any function defined with async def.
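The distinction between the function and the object it returns can be checked with the inspect module. This is a small sketch; the answer function is a placeholder invented for the example.

```python
import asyncio
import inspect

async def answer():
    return 42

# The function defined with async def is a coroutine function.
is_function = inspect.iscoroutinefunction(answer)

# Calling it runs none of the body; it only creates a coroutine object.
coro = answer()
is_object = inspect.iscoroutine(coro)

# The body executes only once an event loop drives the coroutine object.
result = asyncio.run(coro)
print(is_function, is_object, result)
```

Note that none of these checks says anything about whether the body ever awaits; that part is up to the author of the function.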
So yes, the tutorial is unhelpful, in that it used entirely synchronous, uncooperative code inside a coroutine function.
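The effect of uncooperative code can be reproduced without any network access. In this sketch, blocking_worker is a made-up stand-in for the tutorial's download function, and time.sleep plays the role of the blocking urllib read.

```python
import asyncio
import time

async def blocking_worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}{i}")
        # A blocking call with no await: the event loop cannot switch
        # to another coroutine until this function returns.
        time.sleep(0.01)

async def main():
    log = []
    await asyncio.gather(
        blocking_worker("a", 3, log),
        blocking_worker("b", 3, log),
    )
    return log

order = asyncio.run(main())
print(order)
```

The first worker runs to completion before the second one starts, which matches the sequential progress numbers seen in the question.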
Instead of urllib, an asynchronous HTTP library would be needed, such as aiohttp:
import os

import aiohttp

async def download_coroutine(url):
    """
    A coroutine to download the specified url
    """
    filename = os.path.basename(url)
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            with open(filename, 'wb') as fd:
                while True:
                    # Yields to the event loop while waiting for network data
                    chunk = await resp.content.read(1024)
                    if not chunk:
                        break
                    fd.write(chunk)
    msg = 'Finished downloading {filename}'.format(filename=filename)
    return msg

This coroutine can yield to other routines when waiting for a connection to be established, and when waiting for more network data, as well as when closing the session again.
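Running several such coroutines together might look like the sketch below. To keep it runnable offline, fake_download is a hypothetical stand-in that awaits asyncio.sleep instead of real network data, and the file names and delays are invented. It uses asyncio.gather rather than passing bare coroutines to asyncio.wait, since Python 3.11 and later no longer accept that.

```python
import asyncio

async def fake_download(name, delay, finished):
    # Stand-in for a network transfer: sleeping yields to the event loop
    # in the same way that awaiting more response data would.
    await asyncio.sleep(delay)
    finished.append(name)
    return f"Finished downloading {name}"

async def main():
    finished = []
    # All three "downloads" are in flight at the same time.
    results = await asyncio.gather(
        fake_download("a.bin", 0.15, finished),
        fake_download("b.bin", 0.05, finished),
        fake_download("c.bin", 0.10, finished),
    )
    return results, finished

results, finished = asyncio.run(main())
print(results)
print(finished)
```

gather returns the results in the order the coroutines were passed in, while the finished list reflects the order in which they actually completed.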
We could further make the file writing asynchronous, but that has portability issues; the aiofiles library uses threads to off-load the blocking calls. Using that library, the code would need updating to:
import aiofiles

# Replaces the plain open() block inside download_coroutine:
async with aiofiles.open(filename, 'wb') as fd:
    while True:
        chunk = await resp.content.read(1024)
        if not chunk:
            break
        await fd.write(chunk)

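The same thread off-loading idea can be sketched with the standard library alone, assuming Python 3.9 or later for asyncio.to_thread. This is not how aiofiles is implemented internally, only an illustration of the technique; write_chunks and the temporary file path are made up for the example.

```python
import asyncio
import os
import tempfile

async def write_chunks(path, chunks):
    with open(path, "wb") as fd:
        for chunk in chunks:
            # Run the blocking write in a worker thread so the event loop
            # stays free to service other coroutines in the meantime.
            await asyncio.to_thread(fd.write, chunk)

# Write two chunks to a throwaway file and read them back.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
asyncio.run(write_chunks(path, [b"ab", b"cd"]))
with open(path, "rb") as fh:
    written = fh.read()
print(written)
```

For small chunks the thread hand-off may cost more than the write itself, so whether this is worthwhile depends on how slow the target disk is.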
Note: the blog post has since been updated to fix these issues.