
Spark remote development environment

I want to set up a remote Spark development environment.

Machine A is my development machine: Java, Eclipse, Windows 10.

I also have another machine with Cloudera (Spark on YARN) already installed.

I tried this:

String appName = "test" + new Date(System.currentTimeMillis());
String master = "spark://*:6066";
String host = "*";
String jar = "C:\\Users\\default.DESKTOP-0BP338U\\Desktop\\workspace\\workspace_study\\spark-start-on-yarn\\target\\spark-start-on-yarn-0.0.1-SNAPSHOT.jar";

SparkConf conf = new SparkConf().setAppName(appName).setMaster(master)
        .set("spark.driver.host", host)
        .setJars(new String[]{jar});
JavaSparkContext sc = new JavaSparkContext(conf);


But the connection was refused.

How can I develop and test the Spark program on machine A?




I added the environment variable:

[screenshot: environment variable settings]

And this is my code:

SparkConf conf = new SparkConf()
.setAppName(new Date(System.currentTimeMillis()).toString())
.setMaster("yarn");
JavaSparkContext sc = new JavaSparkContext(conf);


List<Integer> data = Arrays.asList(1,2,3,4,1,2,3,4,5,1,4,1,1,1,4,2,2,4,1,1,3,4,2,3);
JavaRDD<Integer> distData = sc.parallelize(data);

JavaPairRDD<Integer, Integer> pairs = distData.mapToPair(s -> new Tuple2<Integer, Integer>(s, 1));
JavaPairRDD<Integer, Integer> counts = pairs.reduceByKey((a, b) -> a + b);

System.out.println("================= " + counts);

sc.close();
sc.stop();


And the error is "SparkException: Could not parse Master URL: 'yarn'".

What am I missing? Please help me.

Answer

You need to:

  1. Download your Hadoop cluster's HADOOP_CONF_DIR configuration files.

  2. Set the HADOOP_CONF_DIR environment variable on your machine. Or, if that doesn't work, you can place the XML files in your src/main/resources folder to include them on the classpath.

  3. Use setMaster("yarn-client") (see the sketch below).

Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster. These configs are used to write to HDFS and connect to the YARN ResourceManager.
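
For example, here is a minimal sketch of a driver built that way. It assumes HADOOP_CONF_DIR (or the cluster's core-site.xml, hdfs-site.xml and yarn-site.xml on the classpath) is already in place and that the Spark client dependencies are available; the class name, app name, and small word count are only illustrative.

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class YarnClientTest {
    public static void main(String[] args) {
        // "yarn-client" runs the driver locally and reads the ResourceManager
        // address from the Hadoop client configuration on the classpath.
        SparkConf conf = new SparkConf()
                .setAppName("yarn-client-test")
                .setMaster("yarn-client");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<Integer> data = Arrays.asList(1, 2, 3, 4, 1, 2, 3, 4, 5);
        JavaPairRDD<Integer, Integer> counts = sc.parallelize(data)
                .mapToPair(i -> new Tuple2<Integer, Integer>(i, 1))
                .reduceByKey((a, b) -> a + b);

        // collect() brings the result back to the driver; printing the RDD
        // object itself only shows its toString(), not the data.
        System.out.println("================= " + counts.collect());

        sc.stop();
    }
}

Note that collect() is what makes the counts printable on the driver; printing the JavaPairRDD itself (as in the question's println) only shows the RDD's toString().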

Spark on YARN

Running Spark from an outside machine

  1. Make an HDFS /user folder named after your local username. This is needed for HDFS permissions.

  2. Develop your application, preferably using Maven/Gradle to manage your Java libraries. You also need to use the Cloudera Maven repository for your respective Hadoop version (see the pom.xml sketch below).
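
As a rough sketch of that last point, the Cloudera repository and a matching Spark dependency go into the pom.xml roughly like this (the repository URL is Cloudera's public Maven repository; the artifact version below is a placeholder and must be matched to your cluster's CDH release):

<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <!-- placeholder: match your cluster's Spark/CDH version -->
    <version>1.6.0-cdh5.7.0</version>
  </dependency>
</dependencies>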

You don't need setJars() either. Your app should connect and run on its own.