Nipun - 1 month ago

Unable to log Spark job output

I have written a small Scala program to run in a Spark environment, using a standalone cluster configuration. I submitted the job and it ran successfully on the worker machine. I can see the result in my console, but when I open the worker logs in the browser, nothing is printed to stdout; only stderr has some logs. I am using println to print in the program. Am I missing something?

Here is the program:

import org.apache.spark.{SparkConf, SparkContext}

object SimpleJob {
  def main(args: Array[String]): Unit = {
    val logFile = "/var/log/syslog" // Should be some file on your system
    val conf = new SparkConf().setAppName("Spark log file reader")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}


Update
Here is my log.properties file:

# Set everything to be logged to the console
log4j.rootCategory=INFO, stdout
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
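One thing stands out in this file: `log4j.rootCategory=INFO, stdout` attaches an appender named `stdout`, but the only appender the file defines is `console`. log4j 1.x will most likely report that it cannot find `log4j.appender.stdout`, leaving the root logger without a working console appender; `log4j.rootCategory=INFO, console` is probably what was intended. The plain-Scala sketch below (no Spark or log4j dependency) parses such a file with `java.util.Properties` and lists the appender names referenced by the root logger that have no definition. The object `Log4jAppenderCheck` and its method are hypothetical helpers, not part of any library.

```scala
import java.io.StringReader
import java.util.Properties

// Hypothetical helper: flags appenders named on the root logger that are never defined.
object Log4jAppenderCheck {
  def undefinedRootAppenders(config: String): List[String] = {
    val props = new Properties()
    props.load(new StringReader(config))
    // log4j 1.x accepts either key for the root logger.
    val root = Option(props.getProperty("log4j.rootCategory"))
      .orElse(Option(props.getProperty("log4j.rootLogger")))
      .getOrElse("")
    // Format is "LEVEL, appender1, appender2, ...": drop the level, keep the names.
    val names = root.split(",").map(_.trim).drop(1).filter(_.nonEmpty).toList
    names.filterNot(name => props.containsKey(s"log4j.appender.$name"))
  }

  def main(args: Array[String]): Unit = {
    // The relevant lines from the configuration in the question.
    val config =
      """log4j.rootCategory=INFO, stdout
        |log4j.appender.console=org.apache.log4j.ConsoleAppender
        |log4j.appender.console.target=System.out
        |""".stripMargin
    println(undefinedRootAppenders(config).mkString(", ")) // prints: stdout
  }
}
```

With the root line changed to `log4j.rootCategory=INFO, console`, the same check returns an empty list.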

Answer

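I think the link I posted in the comments on zero323's answer has the answer for you. Basically, the worker output will NOT contain that output; the driver will. Everything in the main program that is NOT inside a closure runs on the driver, so your println output appears in the driver's console (where you submitted the job). Only the closures you pass to RDD operations, such as the function given to filter, are shipped to and executed on the workers.

Here is a sample pseudo-program with comments showing where each piece of output ends up: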

object SimpleJob {
  def main(args: Array[String]): Unit = {
    // ... same setup as in the question (logFile, conf, sc)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter { line =>
      println("Closure Stuff") // runs on the executors: shows up in the worker's stdout
      line.contains("a")
    }.count()
    println("Stuff") // runs on the driver: shows up in the driver's console
  }
}

Also, per the link provided, I am fairly sure that the worker's stderr shows the log4j output, whereas its stdout shows the println output.
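To illustrate that split, here is a plain-Scala sketch with no Spark dependency: `println` writes to standard output, while text written to the error stream goes to standard error. Spark's bundled `log4j.properties.template` sets `log4j.appender.console.target=System.err`, which is why log4j messages usually land in a worker's stderr; note that the configuration in the question sets `System.out` instead. The names `OutVsErr` and `capture` are hypothetical, and `Console.withOut`/`Console.withErr` only redirect Scala's `Console` streams (not the JVM's `System.out`/`System.err`), so this shows the separation of the two streams rather than reproducing what a Spark worker does.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

object OutVsErr {
  // Runs `body` with Scala's Console.out and Console.err pointed at in-memory
  // buffers and returns (text written to out, text written to err).
  def capture(body: => Unit): (String, String) = {
    val outBuf = new ByteArrayOutputStream()
    val errBuf = new ByteArrayOutputStream()
    val outPs = new PrintStream(outBuf, true)
    val errPs = new PrintStream(errBuf, true)
    Console.withOut(outPs) {
      Console.withErr(errPs) {
        body
      }
    }
    outPs.flush()
    errPs.flush()
    (outBuf.toString, errBuf.toString)
  }

  def main(args: Array[String]): Unit = {
    val (out, err) = capture {
      println("result line")             // println -> standard output
      Console.err.println("log line")    // stand-in for a logger writing to the error stream
    }
    // Report on the real stdout so the split is visible when run from a shell.
    System.out.println(s"stdout got: ${out.trim}")
    System.out.println(s"stderr got: ${err.trim}")
  }
}
```

Running `main` should report `result line` under stdout and `log line` under stderr.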
