tom tom

Avoiding or handling "BadRequestError: The requested query has expired."?

I'm looping over data in App Engine using chained deferred tasks and query cursors (Python 2.7, using db, not ndb). For example:

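    from google.appengine.ext import db, deferred

    import models  # the app's model definitions (Asset, etc.)

    # `version` and `dont_retry` are defined elsewhere (not shown).

    def loop_assets(cursor=None):
        try:
            assets = models.Asset.all().order('-size')

            if cursor:
                # Resume where the previous task stopped.
                assets.with_cursor(cursor)

            for asset in assets.run():
                if asset.is_special():
                    asset.yay = True
                    asset.put()

        except db.Timeout:
            # Save the current position and continue in a new chained task.
            cursor = assets.cursor()
            deferred.defer(loop_assets, cursor=cursor, _countdown=3,
                           _target=version, _retry_options=dont_retry)
            return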


This ran for ~75 minutes in total (each task for ~1 minute), then raised this exception:

BadRequestError: The requested query has expired. Please restart it with the last cursor to read more results.


The only cause the docs mention is:


New App Engine releases may change internal implementation details, invalidating cursors that depend on them. If an application attempts to use a cursor that is no longer valid, the Datastore raises a BadRequestError exception.


So maybe that's what happened, but it seems a coincidence that the first time I ever try this technique I hit a 'change in internal implementation' (unless such changes happen often).

Is there another explanation for this?
Is there a way to re-architect my code to avoid this?

If not, I think the only solution is to mark which assets have been processed, then add an extra filter to the query to exclude those, and then manually restart the process each time it dies.
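A minimal sketch of this mark-and-filter approach, assuming a plain list of dicts as a stand-in for the datastore (no App Engine APIs are used, so it runs anywhere). The processed flag and the process_unmarked and run_until_done helpers are hypothetical names; on App Engine the filtering step would roughly correspond to adding .filter('processed =', False) to the query, and each call to process_unmarked plays the role of one task that dies after batch_size assets.

```python
def process_unmarked(assets, batch_size):
    """Handle up to batch_size unprocessed assets, largest first.

    Each asset is a dict standing in for an Asset entity, with 'size',
    'special' and a hypothetical 'processed' flag. Returns how many
    assets this run handled.
    """
    # Roughly: Asset.all().filter('processed =', False).order('-size')
    pending = sorted(
        (a for a in assets if not a['processed']),
        key=lambda a: a['size'],
        reverse=True,
    )
    handled = 0
    for asset in pending[:batch_size]:
        if asset['special']:
            asset['yay'] = True
        # Mark it (in the same put() on App Engine) so a restart skips it.
        asset['processed'] = True
        handled += 1
    return handled


def run_until_done(assets, batch_size=2):
    """Restart from scratch, with no cursor, until nothing is left."""
    runs = 0
    while process_unmarked(assets, batch_size):
        runs += 1
    return runs


# Usage: five assets, each simulated task dies after two of them.
assets = [{'size': s, 'special': s % 2 == 0, 'processed': False}
          for s in [5, 3, 8, 1, 4]]
runs = run_until_done(assets, batch_size=2)
```

Because the filter itself excludes finished work, no cursor has to survive between runs; the cost is an extra property write per asset.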

For reference, this question asked something similar, but the accepted answer is 'use cursors', which I am already doing, so it can't be the same issue.

tom tom
Answer

When I asked this question, I had run the code once and hit the BadRequestError once. I then ran it again, and it completed without a BadRequestError after running for ~6 hours in total. So at this point I would say the best 'solution' to this problem is to make the code idempotent (so that it can safely be retried) and then add some code to retry automatically.
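The idempotent-plus-auto-retry idea can be sketched in plain Python. QueryExpired is a hypothetical stand-in for db.BadRequestError, flaky_run simulates a query that expires once at chosen positions, and the visits counter exists only for the demo. On App Engine the retry would more likely be a new deferred task than an in-process loop. What the sketch shows: because process() only sets yay = True, rerunning the whole pass from the start after an expiry revisits some assets but leaves every asset in the same final state.

```python
class QueryExpired(Exception):
    """Hypothetical stand-in for db.BadRequestError ("query has expired")."""


def flaky_run(assets, fail_at):
    """Yield assets in order, raising QueryExpired once at each index in fail_at."""
    for i, asset in enumerate(assets):
        if i in fail_at:
            fail_at.discard(i)  # each simulated expiry fires only once
            raise QueryExpired()
        yield asset


def process(asset):
    """Set yay on special assets; repeating this is harmless (idempotent)."""
    asset['visits'] += 1  # demo-only counter, not part of the real update
    if asset['special']:
        asset['yay'] = True  # setting True again changes nothing


def loop_with_auto_retry(assets, fail_at, max_attempts=5):
    """Rerun the whole pass from the start whenever the query expires."""
    for attempt in range(1, max_attempts + 1):
        try:
            for asset in flaky_run(assets, fail_at):
                process(asset)
            return attempt  # a full pass completed
        except QueryExpired:
            continue  # starting over is safe because process() is idempotent
    raise RuntimeError('gave up after %d attempts' % max_attempts)


# Usage: the query "expires" once, when it reaches the third asset.
assets = [{'special': i % 2 == 0, 'visits': 0} for i in range(4)]
attempts = loop_with_auto_retry(assets, fail_at={2})
```

The first two assets are processed twice, yet their final state is identical to a single pass; that property is what makes a blind retry acceptable.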

In my specific case, it was also possible to tweak the query so that, if the cursor 'expires', the query can restart where it left off without a cursor. Effectively, change the query to:

    assets = models.Asset.all().order('-size').filter('size <', last_seen_size)

Here last_seen_size is the value each task passes to the next.
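A cursor-free sketch of this resume-by-size chain, again with a list of dicts standing in for Asset entities. run_task, run_chain and batch_size are hypothetical names; float('inf') is used only so the first task needs no special case (on App Engine the first task would simply omit the size filter).

```python
def run_task(assets, last_seen_size, batch_size=2):
    """One task: handle up to batch_size assets smaller than last_seen_size.

    Mirrors Asset.all().order('-size').filter('size <', last_seen_size),
    so no cursor is needed. Returns the smallest size handled, to pass to
    the next task, or None when nothing is left.
    """
    pending = sorted(
        (a for a in assets if a['size'] < last_seen_size),
        key=lambda a: a['size'],
        reverse=True,
    )
    batch = pending[:batch_size]
    for asset in batch:
        if asset['special']:
            asset['yay'] = True
    return batch[-1]['size'] if batch else None


def run_chain(assets, batch_size=2):
    """Simulate the task chain; returns how many tasks ran."""
    last_seen_size = float('inf')  # first task: effectively no size filter
    tasks = 0
    while last_seen_size is not None:
        last_seen_size = run_task(assets, last_seen_size, batch_size)
        tasks += 1
    return tasks


# Usage: distinct sizes, two assets per task.
assets = [{'size': s, 'special': s % 2 == 0} for s in [5, 3, 8, 1, 4]]
tasks = run_chain(assets, batch_size=2)
# A task that died can simply be rerun with the same last_seen_size.
resumed = run_task(assets, 5, batch_size=2)
```

The task count includes the final task that finds nothing and ends the chain. One caveat worth checking against real data: because the filter is a strict size < last_seen_size, if a task stops partway through several assets that share the same size, the remaining assets of that size are skipped by the next task. The example uses distinct sizes, so the issue does not arise there.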
