Dumitru Dumitru - 25 days ago 21
Python Question

How is this a coroutine?

I'm trying to understand the coroutines in Python (and in general). Been reading about the theory, the concept and a few examples, but I'm still struggling. I understand the asynchronous model (did a bit of Twisted) but not coroutines yet.

One tutorial gives this as a coroutine example (I made a few changes to illustrate my problem):

async def download_coroutine(url, number):
    """
    A coroutine to download the specified url
    """
    request = urllib.request.urlopen(url)
    filename = os.path.basename(url)
    print("Downloading %s" % url)

    with open(filename, 'wb') as file_handle:
        while True:
            print(number)  # prints numbers to view progress
            chunk = request.read(1024)
            if not chunk:
                print("Finished")
                break
            file_handle.write(chunk)
    msg = 'Finished downloading {filename}'.format(filename=filename)
    return msg


It is run like this:

coroutines = [download_coroutine(url, number) for number, url in enumerate(urls)]
completed, pending = await asyncio.wait(coroutines)


Looking at generator coroutine examples, I can see a few yield statements. There's nothing like that here, and urllib is synchronous, AFAIK.

Also, since the code is supposed to be asynchronous, I am expecting to see a series of interleaved numbers (1, 4, 5, 1, 2, ..., "Finished", ...). What I'm seeing is a single number repeating, ending in a Finished, and then another one (3, 3, 3, 3, ..., "Finished", 1, 1, 1, 1, ..., "Finished", ...).

At this point I'm tempted to say the tutorial is wrong, and this is a coroutine just because it has async in front.

Answer

The co in coroutine stands for cooperative. Yielding (to other routines) makes a routine a co-routine, really, because only by yielding when waiting can other co-routines be interleaved. In the new async world of Python 3.5 and up, that usually is achieved by await-ing results from other coroutines.

By that definition, the code you found is not a coroutine. As far as Python is concerned, however, it is a coroutine function, because that's what async def creates; calling it returns a coroutine object.
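To see what Python itself checks, here is a minimal sketch using only the standard library. The name no_awaits is made up for illustration; the point is that async def alone decides the type, whether or not the body ever awaits.

```python
import asyncio
import inspect


async def no_awaits():
    # Declared with async def, but never awaits anything.
    return "done"


# The function itself is reported as a coroutine function...
is_coro_function = inspect.iscoroutinefunction(no_awaits)

# ...and calling it runs none of the body; it only builds a coroutine object.
coro = no_awaits()
is_coro_object = asyncio.iscoroutine(coro)

# The body executes only once an event loop drives the coroutine.
result = asyncio.run(coro)

print(is_coro_function, is_coro_object, result)
```

Both checks pass even though nothing in the body ever gives up control.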

So yes, the tutorial is... unhelpful, in that it used entirely synchronous, uncooperative code inside a coroutine function.
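The difference is easy to reproduce without any networking. The sketch below is an illustration, not the tutorial's code: blocking_worker, cooperative_worker and run_pair are hypothetical names, time.sleep stands in for the blocking urllib read, and asyncio.sleep(0) stands in for an awaited network read.

```python
import asyncio
import time


async def blocking_worker(name, log):
    for _ in range(3):
        log.append(name)
        time.sleep(0.01)  # blocks the whole event loop; nothing else can run


async def cooperative_worker(name, log):
    for _ in range(3):
        log.append(name)
        await asyncio.sleep(0)  # hands control back to the event loop


async def run_pair(worker):
    # Run two copies of the same worker concurrently and record the order.
    log = []
    await asyncio.gather(worker("a", log), worker("b", log))
    return log


blocking_log = asyncio.run(run_pair(blocking_worker))
cooperative_log = asyncio.run(run_pair(cooperative_worker))

print(blocking_log)     # ['a', 'a', 'a', 'b', 'b', 'b']
print(cooperative_log)  # ['a', 'b', 'a', 'b', 'a', 'b']
```

The first list matches the output described in the question: one worker runs to completion before the next one starts.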

Instead of urllib, an asynchronous HTTP library such as aiohttp would be needed:

import os

import aiohttp

async def download_coroutine(url):
    """
    A coroutine to download the specified url
    """
    filename = os.path.basename(url)
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            with open(filename, 'wb') as fd:
                while True:
                    chunk = await resp.content.read(1024)
                    if not chunk:
                        break
                    fd.write(chunk)
    msg = 'Finished downloading {filename}'.format(filename=filename)
    return msg

This coroutine can yield to other routines when waiting for a connection to be established, and when waiting for more network data, as well as when closing the session again.
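For completeness, here is one way several such coroutines could be driven concurrently. This is only a sketch: fake_download is a hypothetical stand-in that sleeps instead of touching the network, and the URLs and delays are placeholders. It uses asyncio.gather rather than the asyncio.wait call from the question, because passing bare coroutine objects to asyncio.wait is rejected from Python 3.11 on.

```python
import asyncio


async def fake_download(url, delay):
    # Hypothetical stand-in for download_coroutine: the sleep plays the
    # role of waiting on the network.
    await asyncio.sleep(delay)
    return 'Finished downloading {url}'.format(url=url)


async def main():
    # Placeholder URLs with made-up delays; the slowest is listed first.
    jobs = [
        ("http://example.com/a.bin", 0.03),
        ("http://example.com/b.bin", 0.02),
        ("http://example.com/c.bin", 0.01),
    ]
    # gather() wraps each coroutine in a task, runs them concurrently and
    # returns the results in the order the coroutines were passed in.
    return await asyncio.gather(
        *(fake_download(url, delay) for url, delay in jobs)
    )


messages = asyncio.run(main())
for message in messages:
    print(message)
```

The waits overlap, so the whole run takes roughly as long as the slowest job rather than the sum of all three.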

We could further make the file writing asynchronous, but that has portability issues; the aiofiles library uses threads to offload the blocking calls. Using that library, the code would need updating to:

import aiofiles

async with aiofiles.open(filename, 'wb') as fd:
    while True:
        chunk = await resp.content.read(1024)
        if not chunk:
            break
        await fd.write(chunk)
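The thread-offloading idea can also be sketched with the standard library alone, using loop.run_in_executor. This is an illustration of the general technique, not a description of how aiofiles is implemented; write_chunks and the temporary file name are made up for the example.

```python
import asyncio
import os
import tempfile


async def write_chunks(path, chunks):
    loop = asyncio.get_running_loop()
    with open(path, 'wb') as fd:
        for chunk in chunks:
            # None selects the loop's default thread pool; this coroutine
            # is suspended until the blocking write has finished, so other
            # coroutines can run in the meantime.
            await loop.run_in_executor(None, fd.write, chunk)


path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
asyncio.run(write_chunks(path, [b'spam', b'eggs']))

with open(path, 'rb') as fd:
    written = fd.read()
print(written)
```

For small writes the thread hand-off probably costs more than it saves, so this mainly pays off when the disk is slow or the chunks are large.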

Note: the blog post has since been updated to fix these issues.
