Asked by tg89, 28 days ago (tagged: python)

Apache Spark- Error initializing SparkContext. java.io.FileNotFoundException

I am able to run a simple Hello World program through Spark on a standalone machine. But when I run a word-count program that creates a SparkContext, and submit it with pyspark, I get the following error:

ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: Added file file:/Users/tanyagupta/Documents/Internship/Zyudly%20Labs/Tanya-Programs/word_count.py does not exist.

I am on Mac OS X. I installed Spark with brew install apache-spark. Any ideas what's going wrong?
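One detail worth noticing in the error: the path is reported as Zyudly%20Labs, i.e. the space in the Zyudly Labs directory name has been URL-encoded in the file: URI. This may or may not be the root cause here, but spaces in paths are a common trigger for this exception. As a hedged illustration (the paths are just examples), Python's urllib shows why a literal %20 path does not exist on disk:

```python
from urllib.parse import unquote  # Python 3; Python 2 has urllib.unquote

# The URI Spark reports vs. the actual on-disk path (example path):
encoded = "/Users/tanyagupta/Documents/Internship/Zyudly%20Labs/Tanya-Programs/word_count.py"
decoded = unquote(encoded)

print(decoded)
# No directory is literally named "Zyudly%20Labs", so any existence check
# against the encoded string reports "file not found".
```

A quick workaround to test this theory is to move the script to a directory whose path contains no spaces and rerun it.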

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/07/19 23:18:45 INFO SparkContext: Running Spark version 1.6.2
16/07/19 23:18:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/19 23:18:45 INFO SecurityManager: Changing view acls to: tanyagupta
16/07/19 23:18:45 INFO SecurityManager: Changing modify acls to: tanyagupta
16/07/19 23:18:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tanyagupta); users with modify permissions: Set(tanyagupta)
16/07/19 23:18:46 INFO Utils: Successfully started service 'sparkDriver' on port 59226.
16/07/19 23:18:46 INFO Slf4jLogger: Slf4jLogger started
16/07/19 23:18:46 INFO Remoting: Starting remoting
16/07/19 23:18:46 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.0.5:59227]
16/07/19 23:18:46 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 59227.
16/07/19 23:18:46 INFO SparkEnv: Registering MapOutputTracker
16/07/19 23:18:46 INFO SparkEnv: Registering BlockManagerMaster
16/07/19 23:18:46 INFO DiskBlockManager: Created local directory at /private/var/folders/2f/fltslxd54f5961xsc2wg1w680000gn/T/blockmgr-812de6f9-3e3d-4885-a7de-fc9c2e181c64
16/07/19 23:18:46 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/07/19 23:18:46 INFO SparkEnv: Registering OutputCommitCoordinator
16/07/19 23:18:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/07/19 23:18:46 INFO SparkUI: Started SparkUI at http://192.168.0.5:4040
16/07/19 23:18:46 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: Added file file:/Users/tanyagupta/Documents/Internship/Zyudly%20Labs/Tanya-Programs/word_count.py does not exist.
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1364)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1340)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
16/07/19 23:18:47 INFO SparkUI: Stopped Spark web UI at http://192.168.0.5:4040
16/07/19 23:18:47 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/07/19 23:18:47 INFO MemoryStore: MemoryStore cleared
16/07/19 23:18:47 INFO BlockManager: BlockManager stopped
16/07/19 23:18:47 INFO BlockManagerMaster: BlockManagerMaster stopped
16/07/19 23:18:47 WARN MetricsSystem: Stopping a MetricsSystem that is not running
16/07/19 23:18:47 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/07/19 23:18:47 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/07/19 23:18:47 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/07/19 23:18:47 INFO SparkContext: Successfully stopped SparkContext

Traceback (most recent call last):
File "/Users/tanyagupta/Documents/Internship/Zyudly Labs/Tanya-Programs/word_count.py", line 7, in <module>
sc=SparkContext(appName="WordCount_Tanya")
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/pyspark.zip/pyspark/context.py", line 115, in __init__
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/pyspark.zip/pyspark/context.py", line 172, in _do_init
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/pyspark.zip/pyspark/context.py", line 235, in _initialize_context
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value

py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.FileNotFoundException: Added file file:/Users/tanyagupta/Documents/Internship/Zyudly%20Labs/Tanya-Programs/word_count.py does not exist.
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1364)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1340)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)

16/07/19 23:18:47 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
16/07/19 23:18:47 INFO ShutdownHookManager: Shutdown hook called
16/07/19 23:18:47 INFO ShutdownHookManager: Deleting directory /private/var/folders/2f/fltslxd54f5961xsc2wg1w680000gn/T/spark-f69e5dfc-6561-4677-9ec0-03594eabc991

Answer

Adding an __init__.py file to my folder worked for me!

Thanks!
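For anyone trying the same fix: creating an empty __init__.py next to the script is all it takes. A minimal shell sketch, using a placeholder directory (substitute your own project path, and quote it if it contains spaces):

```shell
# Placeholder project directory; replace with your own path
PROJECT_DIR="/tmp/my-spark-project"
mkdir -p "$PROJECT_DIR"

# An empty __init__.py marks the directory as a Python package
touch "$PROJECT_DIR/__init__.py"

ls "$PROJECT_DIR"
```

Then rerun the job, keeping the full script path quoted if it contains spaces.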