
Installing PySpark

I am trying to install PySpark. Following the instructions, I run this from the command line on the cluster node where I have Spark installed:

$ sbt/sbt assembly


This produces the following error:

-bash: sbt/sbt: No such file or directory


I try the next command:

$ ./bin/pyspark


I get this error:

-bash: ./bin/pyspark: No such file or directory


I feel like I'm missing something basic.
What am I missing?
I have Spark installed and am able to access it using the command:

$ spark-shell


I have Python on the node and am able to open it using the command:

$ python

Answer

What's your current working directory? The sbt/sbt and ./bin/pyspark commands are relative to the directory containing Spark's code ($SPARK_HOME), so you should be in that directory when running those commands.
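
For example, if Spark were installed under /opt/spark (a hypothetical path; substitute wherever Spark is unpacked on your node), you would change into that directory first. Since spark-shell already works from your PATH, you can locate the installation from it:

$ readlink -f "$(which spark-shell)"
/opt/spark/bin/spark-shell

$ cd /opt/spark
$ ./bin/pyspark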

Note that Spark offers pre-built binary distributions that are compatible with many common Hadoop distributions; this may be an easier option if you're using one of those distros.
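
Downloading and unpacking a pre-built package lets you skip the sbt build entirely. As a sketch (the exact file name depends on the Spark release and Hadoop version you choose, so check the downloads page for the right one):

$ wget https://archive.apache.org/dist/spark/spark-0.9.0-incubating/spark-0.9.0-incubating-bin-hadoop2.tgz
$ tar -xzf spark-0.9.0-incubating-bin-hadoop2.tgz
$ cd spark-0.9.0-incubating-bin-hadoop2
$ ./bin/pyspark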

Also, it looks like you linked to the Spark 0.9.0 documentation; if you're building Spark from scratch, I recommend following the latest version of the documentation.