lserlohn lserlohn - 1 year ago 97
Scala Question

How to debug a Scala-based Spark program in IntelliJ IDEA

I am currently setting up my development environment in IntelliJ IDEA. I followed exactly the same way as

build.sbt file

name := "Simple Project"

version := "1.0"

scalaVersion := "2.11.7"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"

Sample Program File

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object MySpark {

  def main(args: Array[String]): Unit = {
    // Input file to scan
    val logFile = "/IdeaProjects/hello/testfile.txt"
    // The master is supplied by spark-submit (--master), so it is not set here
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    // Read the file in 2 partitions and cache it, since it is scanned twice
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}

If I use the command line:

sbt package

and then

spark-submit --class "MySpark" --master local[4] target/scala-2.11/myspark_2.11-1.0.jar

I am able to generate the jar package, and Spark runs well.

However, I want to use IntelliJ IDEA to debug the program in the IDE. How can I set up the configuration so that when I click "debug", it automatically generates the jar package and launches the task by executing the "spark-submit" command line?

I just want everything to be as simple as "one click" on the debug button in IntelliJ IDEA.


Answer Source

You can simply set the following Spark option:

export SPARK_SUBMIT_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=7777 

Then create the debug configuration as follows:

Run -> Edit Configurations -> click "+" in the top-left corner -> Remote -> set the name and the port (7777 here)

After that, run the Spark application with spark-submit or sbt run. Because of suspend=y, the JVM waits on port 7777 until a debugger attaches. Then start the debug configuration created above and add breakpoints where you want execution to stop.
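
The command-line half of these steps can be collected into one small script. This is only a sketch under stated assumptions: the function names jdwp_opts and debug_submit and the "run" argument are made up for illustration, while the class name, master and jar path are copied from the question and the port defaults to the 7777 used above.

```shell
#!/bin/sh
# Sketch: build the jar, then launch spark-submit with a JDWP agent so that
# IntelliJ's Remote configuration can attach. jdwp_opts and debug_submit are
# hypothetical helper names, not part of Spark or sbt.

# Print the JDWP agent option for a given port (defaults to 7777).
jdwp_opts() {
  port="${1:-7777}"
  printf '%s' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=${port}"
}

# Export the option for spark-submit, build the jar, and submit it.
# Class name, master, and jar path are taken from the question.
debug_submit() {
  SPARK_SUBMIT_OPTS="$(jdwp_opts "$1")"
  export SPARK_SUBMIT_OPTS
  sbt package &&
    spark-submit --class "MySpark" --master "local[4]" \
      target/scala-2.11/myspark_2.11-1.0.jar
}

# Launch only when called with "run", so sourcing the file has no side effects.
if [ "${1:-}" = "run" ]; then
  debug_submit "${2:-7777}"
fi
```

Saved under a name such as debug-submit.sh (hypothetical), running "sh debug-submit.sh run" would build the jar and start the job. The JVM should then print a line like "Listening for transport dt_socket at address: 7777" and wait, at which point the Remote configuration can be started from IntelliJ IDEA.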