pyInTheSky pyInTheSky - 1 year ago 67
Python Question

SQLAlchemy poor performance on iterating

The following block of code queries a table with ~2000 rows. The loop takes 20s to execute! From my limited knowledge, I don't think I'm doing 2000 queries, just the one, but perhaps I'm not understanding the '.' operator and what it's doing behind the scenes. How can I fix this loop to run more quickly? Is there a way to adjust the top level query s.t. the 2nd for loop is not making a total of 3000 queries (if that's in fact what's going on)?

Here is a test block of code I made to verify that it was in fact this inner loop that was causing the massive time consumption.

block = []
cnt = 0
for blah in dbsession.query(mytable):
tic = time.time()
for a in blah.component:
cnt += 1
print "block: %s seconds cnt: %s" % (sum(block), cnt)

# block: 20.78191 seconds cnt: 3021

Using the suggestion from the selected best-answer, the for loop became:

for blah in dbsession.query(mytable).options( joinedload(mytable.componentA)).options(mytable.componentB)).options(mytable.componentC)

which resulted in the inner loops of each component going from 20-25s each, to 0.25 0.59 and 0.11s respectively. The query itself now takes about 18s ... so my total saved time is about 55s.

Answer Source

Each time you access .component another SQL query is emitted.

You can read more at Relationship Loading Techniques, but to load it all at once you can change your query to the following:

from sqlalchemy.orm import joinedload