kurtgn kurtgn - 2 months ago 29
Python Question

peewee and peewee-async: why is async slower

I am trying to wrap my head around Tornado and async connections to PostgreSQL. I found a library that can do this at http://peewee-async.readthedocs.io/en/latest/.

I devised a little test to compare traditional Peewee and peewee-async, but somehow the async version is slower.

This is my app:

import peewee
import tornado.web
import logging
import asyncio
import peewee_async
import tornado.gen
import tornado.httpclient
from tornado.platform.asyncio import AsyncIOMainLoop

AsyncIOMainLoop().install()
app = tornado.web.Application(debug=True)
app.listen(port=8888)

# ===========
# Defining Async model
async_db = peewee_async.PooledPostgresqlDatabase(
    'reminderbot',
    user='reminderbot',
    password='reminderbot',
    host='localhost'
)
app.objects = peewee_async.Manager(async_db)
class AsyncHuman(peewee.Model):
    first_name = peewee.CharField()
    messenger_id = peewee.CharField()
    class Meta:
        database = async_db
        db_table = 'chats_human'


# ==========
# Defining Sync model
sync_db = peewee.PostgresqlDatabase(
    'reminderbot',
    user='reminderbot',
    password='reminderbot',
    host='localhost'
)
class SyncHuman(peewee.Model):
    first_name = peewee.CharField()
    messenger_id = peewee.CharField()
    class Meta:
        database = sync_db
        db_table = 'chats_human'

# defining two handlers - async and sync
class AsyncHandler(tornado.web.RequestHandler):

    async def get(self):
        """
        An asynchronous way to create an object and return its ID
        """
        obj = await self.application.objects.create(
            AsyncHuman, messenger_id='12345')
        self.write(
            {'id': obj.id,
             'messenger_id': obj.messenger_id}
        )


class SyncHandler(tornado.web.RequestHandler):

    def get(self):
        """
        A traditional synchronous way
        """
        obj = SyncHuman.create(messenger_id='12345')
        self.write({
            'id': obj.id,
            'messenger_id': obj.messenger_id
        })


app.add_handlers('', [
    (r"/receive_async", AsyncHandler),
    (r"/receive_sync", SyncHandler),
])

# Run loop
loop = asyncio.get_event_loop()
try:
    loop.run_forever()
except KeyboardInterrupt:
    print(" server stopped")


And this is what I get from ApacheBench (ab):

ab -n 100 -c 100 http://127.0.0.1:8888/receive_async

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2    4    1.5      5      7
Processing:   621 1049  256.6   1054   1486
Waiting:      621 1048  256.6   1053   1485
Total:        628 1053  255.3   1058   1492

Percentage of the requests served within a certain time (ms)
  50%   1058
  66%   1196
  75%   1274
  80%   1324
  90%   1409
  95%   1452
  98%   1485
  99%   1492
 100%   1492 (longest request)




ab -n 100 -c 100 http://127.0.0.1:8888/receive_sync
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2    5    1.9      5      8
Processing:     8  476  277.7    479   1052
Waiting:        7  476  277.7    478   1052
Total:         15  481  276.2    483   1060

Percentage of the requests served within a certain time (ms)
  50%    483
  66%    629
  75%    714
  80%    759
  90%    853
  95%    899
  98%   1051
  99%   1060
 100%   1060 (longest request)


Why is sync faster? Where is the bottleneck I'm missing?

Answer

For a long explanation:

http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/

For a short explanation: synchronous Python code is simple, and most of the work happens in the standard library's socket module, which is implemented largely in C. Async Python code is more complex. Each request requires several passes through the main event loop code, which is written in Python (in the asyncio case here) and therefore has a lot of overhead compared to C code.
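To get a feel for that per-operation cost, here is a minimal sketch that runs the same trivial "work" as a plain function call and as a coroutine that yields to the event loop once per call. The names `sync_op`, `async_op`, `run_sync` and `run_async` are made up for illustration, and `asyncio.run` assumes Python 3.7 or newer. It measures only event loop bookkeeping, not a real database driver, so treat the numbers as a rough indication.

```python
import asyncio
import time


def sync_op():
    # Hypothetical stand-in for a very fast blocking operation.
    return 1


async def async_op():
    # Same "work", but each call hands control back to the event loop once.
    await asyncio.sleep(0)
    return 1


def run_sync(n):
    # Call the plain function n times and time it.
    start = time.perf_counter()
    total = sum(sync_op() for _ in range(n))
    return total, time.perf_counter() - start


async def _run_async(n):
    total = 0
    for _ in range(n):
        total += await async_op()
    return total


def run_async(n):
    # Run n coroutine calls through the asyncio event loop and time it.
    start = time.perf_counter()
    total = asyncio.run(_run_async(n))
    return total, time.perf_counter() - start


if __name__ == "__main__":
    n = 20_000
    _, sync_seconds = run_sync(n)
    _, async_seconds = run_async(n)
    print(f"sync:  {sync_seconds:.4f}s for {n} calls")
    print(f"async: {async_seconds:.4f}s for {n} calls")
```

On most machines the async figure should come out noticeably larger, since every call goes through scheduling code written in Python.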

Benchmarks like yours show async's overhead dramatically, because there's no network latency between your application and your database, and you're doing a large number of very small database operations. Since every other aspect of the benchmark is fast, these many executions of the event loop logic account for a large proportion of the total runtime.

Mike Bayer's argument, linked above, is that low-latency scenarios like this are typical for database applications, and therefore database operations shouldn't be run on the event loop.
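One common way to keep blocking database calls off the event loop is to hand them to a thread pool. The sketch below is an assumption about how that could look, not something taken from the linked article: `blocking_query` is a hypothetical stand-in for a synchronous call such as `SyncHuman.create(...)`, and the pool size of 4 is arbitrary.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(messenger_id):
    # Hypothetical stand-in for a synchronous ORM call; sleeps to mimic a round trip.
    time.sleep(0.01)
    return {"id": 1, "messenger_id": messenger_id}


async def handle_request(executor, messenger_id):
    # Run the blocking call in a worker thread so the event loop stays free.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, blocking_query, messenger_id)


async def main():
    with ThreadPoolExecutor(max_workers=4) as executor:
        return await asyncio.gather(
            *(handle_request(executor, str(i)) for i in range(8))
        )


if __name__ == "__main__":
    for row in asyncio.run(main()):
        print(row)
```

The handler code stays async from the framework's point of view, while the database work itself runs as ordinary synchronous code in a thread.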

Async is best for high-latency scenarios, like websockets and web crawlers, where the application spends most of its time waiting for the peer, rather than spending most of its time executing Python.
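The high-latency case can be sketched with simulated slow peers. Here `slow_peer` is a made-up coroutine that just sleeps, and the 50 ms delay is an arbitrary assumption. Waiting on the peers one after another takes roughly the sum of the delays, while `asyncio.gather` overlaps the waits.

```python
import asyncio
import time


async def slow_peer(delay):
    # Hypothetical slow remote peer: the time is spent waiting, not computing.
    await asyncio.sleep(delay)
    return delay


async def sequential(delays):
    # Wait for each peer in turn.
    return [await slow_peer(d) for d in delays]


async def concurrent(delays):
    # Wait for all peers at once on a single event loop.
    return await asyncio.gather(*(slow_peer(d) for d in delays))


def timed(coro):
    # Run a coroutine to completion and return (result, elapsed seconds).
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.05] * 5
    _, seq_seconds = timed(sequential(delays))
    _, con_seconds = timed(concurrent(delays))
    print(f"sequential: {seq_seconds:.3f}s")
    print(f"concurrent: {con_seconds:.3f}s")
```

In this situation the event loop overhead is small next to the time spent waiting, which is where async pays off.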

In conclusion: if your application has a good reason to be async (it deals with slow peers), having an async database driver is a good idea for the sake of consistent code, but expect some overhead.

If you don't need async for another reason, don't do async database calls, because they're a bit slower.