I am trying to use Spark with Scala in Eclipse to read data from a MySQL database.
The problem is that the code takes hours just to execute one SQL query.
This is my initial code:
val conf = new SparkConf().setAppName("MyApp").setMaster("local")
val sc = new SparkContext(conf)
val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
val action = sqlcontext.jdbc(jdbcUrl, "action").registerTempTable("action")
val session = sqlcontext.jdbc(jdbcUrl, "session").registerTempTable("session")
val data = sqlcontext.sql("SELECT * FROM action INNER JOIN session ON action.session_id = session.session_id")
val df = sqlcontext.table("action").collect()
There are several possible reasons for a long-running job. Since your master is "local", you are running on a single executor thread. Spark performs better when the data is well partitioned, so check how many partitions are created in your case. If there is only one, repartition the data with repartition(numPartitions: Int) and run with more threads to get parallel processing (local[n] or local[*]).
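A minimal sketch of that check, assuming the same Spark 1.x API as in the question (sqlContext.jdbc was removed in Spark 2.0). The JDBC URL, the object name PartitionCheck, and the partition count of 8 are placeholders chosen for illustration, not values from the question:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object PartitionCheck {
  def main(args: Array[String]): Unit = {
    // local[*] runs one worker thread per available core instead of a single thread.
    val conf = new SparkConf().setAppName("MyApp").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Assumption: placeholder connection string, replace with your own.
    val jdbcUrl = "jdbc:mysql://localhost:3306/mydb?user=USER&password=PASSWORD"

    // Load the table and report how many partitions Spark created for it.
    val action = sqlContext.jdbc(jdbcUrl, "action")
    println(s"Partitions before: ${action.rdd.partitions.length}")

    // Assumption: 8 is an arbitrary example; tune it to your cores and data size.
    val numPartitions = 8
    val repartitioned = action.repartition(numPartitions)
    println(s"Partitions after: ${repartitioned.rdd.partitions.length}")

    // Register the repartitioned DataFrame so later SQL queries use it.
    repartitioned.registerTempTable("action")

    sc.stop()
  }
}
```

If the first count printed is 1, the whole table is being read and processed by a single task, which would be consistent with the slow run described in the question. Note that repartition only spreads the work that happens after the load; the initial JDBC read itself still runs in however many partitions the first count reports.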