
PySpark socket connection

I am installing Spark on a set of VMs, following the same installation process I've used many times in the past on both physical servers and VMs, and I have never seen this issue before, so I'm puzzled as to why it's happening now.

However, on these VMs pyspark fails to initialize the SparkContext:

>pyspark
Python 2.7.12 |Anaconda custom (64-bit)| (default, Jul 2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/08/22 13:24:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/22 13:24:49 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Traceback (most recent call last):
  File "/home/jon/spark/python/pyspark/shell.py", line 43, in <module>
    spark = SparkSession.builder\
  File "/home/jon/spark/python/pyspark/sql/session.py", line 169, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/home/jon/spark/python/pyspark/context.py", line 310, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/home/jon/spark/python/pyspark/context.py", line 118, in __init__
    conf, jsc, profiler_cls)
  File "/home/jon/spark/python/pyspark/context.py", line 188, in _do_init
    self._accumulatorServer = accumulators._start_update_server()
  File "/home/jon/spark/python/pyspark/accumulators.py", line 259, in _start_update_server
    server = AccumulatorServer(("localhost", 0), _UpdateRequestHandler)
  File "/apps/usr/local64/anaconda/lib/python2.7/SocketServer.py", line 417, in __init__
    self.server_bind()
  File "/apps/usr/local64/anaconda/lib/python2.7/SocketServer.py", line 431, in server_bind
    self.socket.bind(self.server_address)
  File "/apps/usr/local64/anaconda/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
socket.gaierror: [Errno -2] Name or service not known
>>> quit()


Interestingly enough, spark-shell does not show this problem. My intuition is that there is a problem with Python connecting to the server that the JVM starts up. Does anyone have any suggestions on how to resolve or debug this? (A quick resolution check is sketched after the spark-shell transcript below.)

>spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/08/22 13:13:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/22 13:13:59 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://172.25.5.46:4040
Spark context available as 'sc' (master = local[*], app id = local-1503425633272).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_25)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
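
One quick sanity check, independent of Spark (a sketch, not something shown in the original post), is whether plain Python on the same machine can resolve localhost at all:

import socket

# If "localhost" cannot be resolved (for example because the hosts file
# is unreadable and DNS has no entry for it), this raises the same
# socket.gaierror: [Errno -2] Name or service not known as above.
print(socket.gethostbyname("localhost"))  # expected: 127.0.0.1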


When I try to launch a simple program:
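
(test-pyspark.py is not reproduced in the post; a minimal script along the following lines would exercise the same code path. Only the sc = SparkContext(conf=conf) call is confirmed by the traceback below; the rest is a hypothetical sketch.)

from pyspark import SparkConf, SparkContext

# Hypothetical reconstruction of test-pyspark.py; only the
# SparkContext(conf=conf) call appears in the traceback below.
conf = SparkConf().setAppName("test-pyspark")
sc = SparkContext(conf=conf)  # fails while starting the accumulator update server
print(sc.parallelize(range(10)).sum())
sc.stop()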

I see the following errors, similar to the ones above:

spark-submit test-pyspark.py
17/08/22 13:47:37 INFO SparkContext: Running Spark version 2.1.1
17/08/22 13:47:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/22 13:47:37 INFO SecurityManager: Changing view acls to: jon
17/08/22 13:47:37 INFO SecurityManager: Changing modify acls to: jon
17/08/22 13:47:37 INFO SecurityManager: Changing view acls groups to:
17/08/22 13:47:37 INFO SecurityManager: Changing modify acls groups to:
17/08/22 13:47:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jon); groups with view permissions: Set(); users with modify permissions: Set(jon); groups with modify permissions: Set()
17/08/22 13:47:38 INFO Utils: Successfully started service 'sparkDriver' on port 51440.
17/08/22 13:47:38 INFO SparkEnv: Registering MapOutputTracker
17/08/22 13:47:38 INFO SparkEnv: Registering BlockManagerMaster
17/08/22 13:47:38 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/08/22 13:47:38 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/08/22 13:47:38 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-c3ad2263-4416-45f2-927b-8517e4f3213f
17/08/22 13:47:38 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
17/08/22 13:47:38 INFO SparkEnv: Registering OutputCommitCoordinator
17/08/22 13:47:38 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/08/22 13:47:38 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.25.5.46:4040
17/08/22 13:47:38 INFO SparkContext: Added file file:/home/jon/test-pyspark.py at file:/home/jon/test-pyspark.py with timestamp 1503427658741
17/08/22 13:47:38 INFO Utils: Copying /home/jon/test-pyspark.py to /tmp/spark-71ba944d-e11b-4cd5-bfcc-386f85b28a9a/userFiles-095d828d-24ec-43a2-ac58-4d9eb07177aa/test-pyspark.py
17/08/22 13:47:38 INFO Executor: Starting executor ID driver on host localhost
17/08/22 13:47:38 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56262.
17/08/22 13:47:38 INFO NettyBlockTransferService: Server created on 172.25.5.46:56262
17/08/22 13:47:38 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/08/22 13:47:38 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.25.5.46, 56262, None)
17/08/22 13:47:38 INFO BlockManagerMasterEndpoint: Registering block manager 172.25.5.46:56262 with 366.3 MB RAM, BlockManagerId(driver, 172.25.5.46, 56262, None)
17/08/22 13:47:38 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.25.5.46, 56262, None)
17/08/22 13:47:38 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.25.5.46, 56262, None)
17/08/22 13:47:39 INFO SparkUI: Stopped Spark web UI at http://172.25.5.46:4040
17/08/22 13:47:39 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/08/22 13:47:39 INFO MemoryStore: MemoryStore cleared
17/08/22 13:47:39 INFO BlockManager: BlockManager stopped
17/08/22 13:47:39 INFO BlockManagerMaster: BlockManagerMaster stopped
17/08/22 13:47:39 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/08/22 13:47:39 INFO SparkContext: Successfully stopped SparkContext
Traceback (most recent call last):
  File "/home/jon/test-pyspark.py", line 5, in <module>
    sc = SparkContext(conf=conf)
  File "/home/jon/spark/python/lib/pyspark.zip/pyspark/context.py", line 118, in __init__
  File "/home/jon/spark/python/lib/pyspark.zip/pyspark/context.py", line 188, in _do_init
  File "/home/jon/spark/python/lib/pyspark.zip/pyspark/accumulators.py", line 259, in _start_update_server
  File "/apps/usr/local64/anaconda/lib/python2.7/SocketServer.py", line 417, in __init__
    self.server_bind()
  File "/apps/usr/local64/anaconda/lib/python2.7/SocketServer.py", line 431, in server_bind
    self.socket.bind(self.server_address)
  File "/apps/usr/local64/anaconda/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
socket.gaierror: [Errno -2] Name or service not known
17/08/22 13:47:39 INFO ShutdownHookManager: Shutdown hook called
17/08/22 13:47:39 INFO ShutdownHookManager: Deleting directory /tmp/spark-71ba944d-e11b-4cd5-bfcc-386f85b28a9a

Answer Source

It looks like PySpark fails to start the TCP server used for accumulator updates. The AccumulatorServer is started on localhost:

server = AccumulatorServer(("localhost", 0), _UpdateRequestHandler)

and the error:

socket.gaierror: [Errno -2] Name or service not known

suggests an issue with address resolution. Please double-check your network configuration.
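
One way to confirm this outside Spark is to reproduce the same bind that accumulators.py performs (a minimal sketch):

import socket

# AccumulatorServer binds an ephemeral port on the hostname "localhost".
# Binding to a hostname forces a name lookup, so if "localhost" cannot
# be resolved, bind() raises socket.gaierror: [Errno -2] Name or
# service not known, exactly as in the tracebacks above.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("localhost", 0))
print(s.getsockname())  # (address, assigned port) on success
s.close()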

Based on the follow-up comment:

Looks like a network configuration issue. Could you include /etc/hosts?

It appears the solution was to fix the permissions on /etc/hosts so that the VMs could read it.
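
If /etc/hosts is not readable by the user running Spark, the resolver cannot map localhost to 127.0.0.1, which produces exactly this gaierror. A quick check from Python (a sketch; the usual mode for /etc/hosts is 644, though the exact value is not stated in the thread):

import os
import socket

# /etc/hosts must be readable by the user running pyspark
# (typically mode 644, i.e. -rw-r--r--).
print(os.access("/etc/hosts", os.R_OK))    # should print True

# Once the file is readable, "localhost" should resolve again.
print(socket.gethostbyname("localhost"))   # should print 127.0.0.1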