duckertito - 1 year ago
Python Question

How to schedule the execution of spark-submit at a specific time

I have Spark batch processing code (basically, model training) that I execute with spark-submit from an AWS EMR cluster. Now I want to be able to launch this job each day at a specific time.
What is the standard way to do it?
Should I change the code and add the scheduling inside it? Or is there a way to schedule a spark-submit job?
Or should I turn it into a Spark Streaming job that executes every 24 hours? (I am interested in a specific time slot, though, i.e. between 11:00pm and 12:00am.)

Answer Source

If you are using Linux, you can set up a cron job that calls the spark-submit script at the time you want.
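
As a rough sketch of what such a cron entry might look like, assuming the job should start at 11:00pm every day: the paths to spark-submit, the job script, and the log file below are hypothetical placeholders, not taken from the question, so adjust them to your cluster. Cron runs commands with a minimal environment, so absolute paths are the safer choice.

```
# minute hour day-of-month month day-of-week  command
# Run every day at 23:00 (server local time).
# All three paths are hypothetical placeholders.
0 23 * * * /usr/bin/spark-submit /home/hadoop/train_model.py >> /home/hadoop/train_model.log 2>&1
```

You can add the line with `crontab -e` on the machine you normally run spark-submit from, and check that it is registered with `crontab -l`. Redirecting output to a log file is optional, but it makes failed runs easier to diagnose.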
