duckertito - 25 days ago
Python Question

How to schedule the execution of spark-submit at a specific time

I have Spark batch processing code (basically, model training) that I execute with spark-submit from an AWS EMR cluster.
Now I want to be able to launch this job each day at a specific time.
What is the standard way to do it?
Should I change the code and add the scheduling inside it? Or is there a way to schedule a spark-submit job?
Or should I turn it into a Spark Streaming job that is executed every 24 hours? (I am interested in a specific time slot, though, i.e. between 11:00pm and 12:00am.)

Answer

If you are using Linux, you can set up a cron job that calls the spark-submit script at the time you need, so the Spark code itself does not have to change: http://kvz.io/blog/2007/07/29/schedule-tasks-on-linux-using-crontab/
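
As a rough sketch of what such a crontab entry could look like (opened with crontab -e on the node you submit from), the line below starts the job every day at 23:00. The script path, the log path and the location of spark-submit are assumptions made up for illustration, not taken from the question, so adjust them to your cluster. Cron runs with a minimal environment, which is why absolute paths are used.

```
# minute hour day-of-month month day-of-week  command
# Run daily at 23:00 (machine local time). All paths below are hypothetical.
0 23 * * * /usr/bin/spark-submit /home/hadoop/train_model.py >> /home/hadoop/train_model.log 2>&1
```

Cron evaluates the schedule in the machine's time zone, which on many cloud instances is UTC, so it is worth checking the output of date on the node before picking the hour.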