Nipun Nipun - 11 months ago 76
Scala Question

Unable to log spark job output

I have created a small program in Scala to run in a Spark environment. I have a standalone cluster configuration. I submitted the job and it ran successfully on the worker machine. I can see the result in my console, but when I open the browser to view the worker logs, nothing is printed in stdout; only stderr has some logs. I am using println to print in the program. Am I missing something?

Here is the program

import org.apache.spark.{SparkConf, SparkContext}

object SimpleJob {
  def main(args: Array[String]): Unit = {
    val logFile = "/var/log/syslog" // Should be some file on your system
    val conf = new SparkConf().setAppName("Spark log file reader")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    // count() is an action: the filters run on the executors, the totals come back to the driver
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    // This println runs in main, so it is written to the driver's stdout
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

Here is my log4j configuration file.

# Set everything to be logged to the console
log4j.rootCategory=INFO, stdout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
$exprTyper=INFO
$SparkILoopInterpreter=INFO

Answer Source

I think the link I posted in the comments on zero323's answer has the answer for you. Basically, the worker logs will NOT contain that output; the driver's output will. Everything in the main program that is NOT inside a closure runs in the driver.

Here is a sample pseudo-program with comments of where logging will end up:

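object SimpleJob {
  def main(args: Array[String]) {
    // sc and logFile are set up as in the question's program
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => {
      println("Closure Stuff") // Displayed/logged in the worker (runs inside the closure)
      line.contains("a")
    }).count()
    println("Stuff") // Displayed/logged in the driver (runs in main, outside any closure)
  }
}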
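
If the goal is to see some of the matching lines in the console rather than in the worker logs, one option is to bring a small sample back to the driver before printing. This is a minimal sketch, assuming the job is launched with spark-submit (which supplies the master URL); DriverSidePrint and sampleMatches are hypothetical names, not part of the question's program:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DriverSidePrint {
  // Hypothetical helper: filters on the executors, then returns at most n
  // matching lines to the driver as a local Array.
  def sampleMatches(sc: SparkContext, path: String, needle: String, n: Int): Array[String] =
    sc.textFile(path, 2).filter(line => line.contains(needle)).take(n)

  def main(args: Array[String]): Unit = {
    // Assumption: no setMaster here, the master comes from spark-submit.
    val conf = new SparkConf().setAppName("Driver-side print sketch")
    val sc = new SparkContext(conf)

    // The Array is local to the driver, so this println runs in the driver JVM
    // and is written to the driver's stdout, not to the worker logs.
    sampleMatches(sc, "/var/log/syslog", "a", 5).foreach(println)

    sc.stop()
  }
}
```

take(n) keeps the amount of data shipped to the driver bounded; collect() would also work but pulls every matching line into driver memory.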

Also, per the link provided, I am fairly positive that stderr displays the log4j output, whereas stdout displays the println output.
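
The split between the two streams can be reproduced without a cluster. The sketch below is plain Scala, not Spark: it captures stdout and stderr separately and shows that println lands in the first while a logger-style line written to the error stream lands in the second. The Console.err.println call is only a stand-in, on the assumption that the log4j console appender targets System.err (as Spark's default log4j template does, as far as I know); StreamSplitDemo, capture and job are hypothetical names.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

object StreamSplitDemo {
  // Runs `body` and returns (text written to stdout, text written to stderr).
  def capture(body: => Unit): (String, String) = {
    val outBuf = new ByteArrayOutputStream()
    val errBuf = new ByteArrayOutputStream()
    Console.withOut(new PrintStream(outBuf, true)) {
      Console.withErr(new PrintStream(errBuf, true)) {
        body
      }
    }
    (outBuf.toString, errBuf.toString)
  }

  // Hypothetical stand-in for the job: one println plus one logger-style line.
  def job(): Unit = {
    println("Lines with a: 3, Lines with b: 1")
    // Assumption: mimics a log4j console appender that writes to System.err.
    Console.err.println("INFO SimpleJob: finished counting")
  }

  def main(args: Array[String]): Unit = {
    val (out, err) = capture(job())
    print("stdout got: " + out)
    print("stderr got: " + err)
  }
}
```

On a real worker the same separation applies per executor: the stdout and stderr links in the web UI are those two streams, so println calls made inside closures show up under stdout and log4j lines under stderr.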