Nakul Chawla - 1 year ago
Python Question

How to get Python libraries in PySpark?

I want to use the matplotlib.bblpath or shapely.geometry libraries in PySpark.

When I try to import either of them, I get the following error:

>>> from shapely.geometry import polygon
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named shapely.geometry

I know the module isn't installed, but I want to know how these packages can be made available to PySpark.

Answer

On your SparkContext instance (sc in the pyspark shell), try:

sc.addPyFile("")  # path to a .py or .zip file

Quoting from the docs:

Add a .py or .zip dependency for all tasks to be executed on this SparkContext in the future. The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), or an HTTP, HTTPS or FTP URI.
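
For a whole package rather than a single .py file, one option is to zip it first and pass the archive to addPyFile. Below is a minimal sketch of that idea, not a definitive recipe. The helper names zip_package and ship_to_executors, the package name mylib, and the /tmp path are hypothetical; only addPyFile comes from the docs quoted above. Note the assumption that the package is pure Python: Python cannot import compiled extension modules from a zip, and both shapely and matplotlib ship compiled parts, so for those you will most likely need to install the package on every worker node instead.

```python
import os
import zipfile


def zip_package(package_dir, zip_path):
    """Zip the .py files of a pure-Python package directory.

    Paths inside the archive are stored relative to the package's parent
    directory, so the archive contains "<package>/__init__.py" and the
    package can be imported straight from the zip.
    """
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in sorted(files):
                if name.endswith(".py"):
                    full_path = os.path.join(root, name)
                    zf.write(full_path, os.path.relpath(full_path, parent))
    return zip_path


def ship_to_executors(sc, zip_path):
    """Register the archive on an existing SparkContext.

    `sc` is assumed to be a live SparkContext, e.g. the one the pyspark
    shell creates for you. Only tasks started after this call see the file.
    """
    sc.addPyFile(zip_path)
```

In the pyspark shell you would then call ship_to_executors(sc, zip_package("mylib", "/tmp/mylib.zip")) and do the import mylib inside the function you pass to map or foreach, so that the import runs on the executors.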