kramer65 - 5 months ago
Python Question

How to run recurring task in the Python Flask framework?

I'm building a website which provides some information to the visitors. This information is aggregated in the background by polling a couple of external APIs every 5 seconds. The way I have it working now is that I use APScheduler jobs. I initially preferred APScheduler because it makes the whole system easier to port (since I don't need to set cron jobs on the new machine). I start the polling functions as follows:

from apscheduler.scheduler import Scheduler

@app.before_first_request
def initialize():
    apsched = Scheduler()
    apsched.start()

    apsched.add_interval_job(checkFirstAPI, seconds=5)
    apsched.add_interval_job(checkSecondAPI, seconds=5)
    apsched.add_interval_job(checkThirdAPI, seconds=5)


This kinda works, but there's some trouble with it:


  1. For starters, this means that the interval jobs run outside of the Flask context. So far this hasn't been much of a problem, but when calling an endpoint fails I want the system to send me an email (saying "hey calling API X failed"). Because the job doesn't run within the Flask context, however, it complains that flask-mail cannot be executed (RuntimeError('working outside of application context')).

  2. Secondly, I wonder how this is going to behave when I no longer use the Flask built-in debug server, but a production server with, let's say, 4 workers. Will it start every job four times then?



All in all I feel that there should be a better way of running these recurring tasks, but I'm unsure how. Does anybody out there have an interesting solution to this problem? All tips are welcome!

[EDIT]
I've just been reading about Celery with its schedules. I don't really see how Celery differs from APScheduler, or whether it could solve my two points, so I wonder if anyone reading this thinks I should look further into Celery?

[CONCLUSION]
About two years later I'm reading this, and I thought I could let you guys know what I ended up with. I figured that @BluePeppers was right in saying that I shouldn't be tied so closely to the Flask ecosystem. So I opted for regular cron jobs running every minute, which are set up using Ansible. Although this makes things a bit more complex (I needed to learn Ansible and convert some code so that running it every minute would be enough), I think this is more robust.
I'm currently using the awesome python-rq for queueing asynchronous jobs (checking APIs and sending emails). I just found out about rq-scheduler. I haven't tested it yet, but it seems to do precisely what I needed in the first place. So maybe this is a tip for future readers of this question.

For the rest, I just wish all of you a beautiful day!

Answer

(1)

You can use the app.app_context() context manager to set the application context. I imagine usage would go something like this:

from apscheduler.scheduler import Scheduler

def checkSecondAPI():
    with app.app_context():
        # Do whatever you were doing to check the second API
        pass

@app.before_first_request
def initialize():
    apsched = Scheduler()
    apsched.start()

    apsched.add_interval_job(checkFirstAPI, seconds=5)
    apsched.add_interval_job(checkSecondAPI, seconds=5)
    apsched.add_interval_job(checkThirdAPI, seconds=5)

Alternatively, you could use a decorator:

import functools

def with_application_context(app):
    def inner(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Push the application context around every call of the job
            with app.app_context():
                return func(*args, **kwargs)
        return wrapper
    return inner

@with_application_context(app)
def checkFirstAPI():
    # Check the first API as before
    pass
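
To see what the decorator does without needing a running Flask app, here is a small self-contained sketch. `FakeApp` is a made-up stand-in that only mimics `app.app_context()`; it is not part of Flask, and with a real application you would pass your Flask `app` object instead.

```python
import contextlib
import functools


class FakeApp:
    """Hypothetical stand-in for a Flask app; only mimics app.app_context()."""

    def __init__(self):
        self.context_depth = 0  # greater than 0 while a context is active

    @contextlib.contextmanager
    def app_context(self):
        self.context_depth += 1
        try:
            yield self
        finally:
            self.context_depth -= 1


def with_application_context(app):
    def inner(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with app.app_context():
                return func(*args, **kwargs)
        return wrapper
    return inner


app = FakeApp()


@with_application_context(app)
def checkFirstAPI():
    # Report whether a context is active while the job body runs
    return app.context_depth > 0
```

The context is entered just before the job body runs and left again afterwards, and `functools.wraps` keeps the original function name, which helps when the scheduler logs which job it is running.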

(2)

Yes, it will still work. The sole (significant) difference is that your application will not be communicating directly with the world; it will be going through a reverse proxy or something via fastcgi/uwsgi/whatever. The only concern is that if multiple instances of the app start, each one creates its own scheduler, so every job runs once per instance. To manage this, I would suggest you move your backend tasks out of the Flask application and use a tool designed for running tasks regularly (e.g. Celery). The downside is that you won't be able to use things like Flask-Mail, but imo it's not too good to be so closely tied to the Flask ecosystem; what are you gaining by using Flask-Mail over a standard, non-Flask mail library?
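
If the scheduler has to stay inside the web app for a while, one possible stopgap for the multiple-instances concern is to let only the worker that wins a file lock start it. This is a different technique from the one recommended above, sketched under assumptions: it is Unix-only (`fcntl`), all workers must run on the same machine, and `LOCK_PATH` and `try_acquire_scheduler_lock` are hypothetical names chosen for illustration.

```python
import fcntl
import os
import tempfile

# Hypothetical lock file location; any path shared by all workers would do
LOCK_PATH = os.path.join(tempfile.gettempdir(), "apscheduler-demo.lock")


def try_acquire_scheduler_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle
```

A worker would call this once at startup and only create the scheduler when it gets a handle back. The handle has to stay referenced for the life of the process, because closing it (or the process exiting) releases the lock.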

Also, breaking up your application makes it much easier to scale up individual components as the capacity is required, compared to having one monolithic web application.
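
As a rough idea of what a standalone polling component with a standard, non-Flask mail library could look like, here is a minimal sketch using only `smtplib` and `email` from the standard library. The helper names, the example.com addresses, and the SMTP host and port are all assumptions for illustration, not something taken from the question.

```python
import smtplib
from email.message import EmailMessage


def build_failure_email(api_name, error, sender, recipient):
    """Build the 'calling API X failed' notification as a plain-text email."""
    msg = EmailMessage()
    msg["Subject"] = f"Calling {api_name} failed"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"hey calling {api_name} failed: {error}")
    return msg


def send_email(msg, host="localhost", port=25):
    # Assumes an SMTP server is reachable at host:port (hypothetical defaults)
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


def run_check(check, api_name, notify):
    """Call one polling function; pass a failure email to `notify` on error."""
    try:
        check()
    except Exception as exc:
        notify(build_failure_email(
            api_name, exc, "monitor@example.com", "me@example.com"))
        return False
    return True
```

A script started by cron or a task queue could then call something like `run_check(checkFirstAPI, "first API", send_email)`. Passing the notifier in as an argument keeps the polling logic testable without a mail server and without any Flask application context.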
