cybertextron cybertextron - 4 years ago 267
Scala Question

Connecting to a remote Spark master - Java / Scala

I created a 3-node (1 master, 2 workers) Apache Spark cluster in AWS. I'm able to submit jobs to the cluster from the master; however, I cannot get it to work remotely.

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val logFile = "/usr/local/spark/" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application").setMaster("spark://")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
  }
}

I can see from the master:

Spark Master at spark://ip-171-13-22-125.ec2.internal:7077
URL: spark://ip-171-13-22-125.ec2.internal:7077
REST URL: spark://ip-171-13-22-125.ec2.internal:6066 (cluster mode)

So when I execute the application from my local machine, it fails to connect to the Spark master:

2017-02-04 19:59:44,074 INFO [appclient-register-master-threadpool-0] client.StandaloneAppClient$ClientEndpoint (Logging.scala:54) [] - Connecting to master spark://
2017-02-04 19:59:44,166 WARN [appclient-register-master-threadpool-0] client.StandaloneAppClient$ClientEndpoint (Logging.scala:87) [] - Failed to connect to spark://
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) ~[spark-core_2.10-2.0.2.jar:2.0.2]
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) ~[spark-core_2.10-2.0.2.jar:2.0.2]
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) ~[scala-library-2.10.0.jar:?]
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) ~[spark-core_2.10-2.0.2.jar:2.0.2]

I know it would have worked if I had set the master to run locally, but I want my client to connect to this remote master. How can I accomplish that? The configuration looks fine. I can even telnet to that public DNS and port, and I also configured the public DNS and hostname for each of the nodes. I want to be able to submit jobs to this remote master. What am I missing?

Answer Source

To bind the master to a host name or IP, go to the conf directory of your Spark installation (spark-2.0.2-bin-hadoop2.7/conf) and create the configuration file there.

Open the file in the vi editor and add a line that sets the host name or IP of your master.

Stop and start Spark. You can now connect to the remote master using:

val spark = SparkSession.builder()
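The builder call above is cut short in this copy of the answer. As a rough sketch of what a complete client could look like, the example below assumes the master has been bound to a publicly resolvable host name (in Spark 2.x standalone mode this is typically done through SPARK_MASTER_HOST in conf/spark-env.sh) and that it listens on port 7077, as shown on the master page in the question. The object name RemoteMasterExample, the helper masterUrl, and the host master.example.com are hypothetical placeholders, not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession

object RemoteMasterExample {
  // Hypothetical helper: builds a standalone master URL.
  // 7077 is the port shown on the master page in the question.
  def masterUrl(host: String, port: Int = 7077): String = s"spark://$host:$port"

  def main(args: Array[String]): Unit = {
    // Placeholder host: pass the master's real public host name as the first argument.
    val host = args.headOption.getOrElse("master.example.com")

    val spark = SparkSession.builder()
      .appName("Simple Application")
      .master(masterUrl(host))
      .getOrCreate()

    try {
      // Trivial job to confirm that the cluster accepts work from this client.
      val n = spark.sparkContext.parallelize(1 to 100).count()
      println(s"Counted $n elements on ${spark.sparkContext.master}")
    } finally {
      spark.stop()
    }
  }
}
```

If the connection still fails, two things may be worth checking: that the client uses the same Spark and Scala versions as the cluster, and that the workers can reach the driver machine (the spark.driver.host setting may need to point to an address visible from the cluster).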

For more information on setting environment variables, please check the Spark documentation.
