
How to set specific Hadoop version for Spark, Python

I need help setting a specific Hadoop version in my Spark config. I read that you can use the hadoop.version property, but the documentation doesn't say where to set it.

http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version

I need to change it from the current/default version to 2.8.0. I'm coding in PyCharm. Please help, preferably with a step-by-step guide.

Thanks!

Answer

For Apache Hadoop 2.7.x and later, you can build Spark like this:

    ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 -DskipTests clean package
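Since you want 2.8.0 specifically, the same invocation with the version overridden should work (a sketch based on the build docs linked above; the hadoop-2.7 profile is the closest one Spark ships, and hadoop.version overrides the exact dependency version):

    ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.8.0 -DskipTests clean package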

Alternatively, you can modify the pom.xml of your downloaded Spark distribution before running the Maven build, so the build is done against the version you want:

    <profile>
        <id>hadoop-2.8</id>
        <properties>
            <hadoop.version>2.8.0</hadoop.version>
            ...
        </properties>
    </profile>
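With a profile like that in place, you would activate it by id when building (the hadoop-2.8 profile name here is the hypothetical one from the snippet above, following the naming of Spark's existing hadoop-2.6/hadoop-2.7 profiles):

    ./build/mvn -Pyarn -Phadoop-2.8 -DskipTests clean package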

Take a look at this post for step-by-step guidance.
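Since you're working in PyCharm, you still need to point your Python code at the distribution you built. Here is a minimal sketch, assuming the built Spark lives at /path/to/spark (a placeholder path; adjust it, and the app name, to your setup): it sets SPARK_HOME, puts Spark's Python bindings on sys.path, and then asks the JVM which Hadoop version Spark was actually built against.

    import glob
    import os
    import sys

    # Hypothetical path to the Spark distribution built above; adjust to your machine.
    spark_home = "/path/to/spark"
    os.environ["SPARK_HOME"] = spark_home

    # Make PySpark and its bundled py4j importable from PyCharm.
    # The py4j zip name varies by Spark version, so glob for it; this
    # raises IndexError if the zip is missing from python/lib.
    sys.path.insert(0, os.path.join(spark_home, "python"))
    sys.path.insert(0, glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))[0])

    from pyspark import SparkContext

    sc = SparkContext(appName="hadoop-version-check")
    # VersionInfo is Hadoop's own utility class; calling it through the
    # py4j gateway reports the Hadoop version on Spark's classpath.
    print(sc._jvm.org.apache.hadoop.util.VersionInfo.getVersion())
    sc.stop()

If the build worked, the printed version should be 2.8.0 rather than the default your Spark download shipped with.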
