Scala Question

How to set up a fully functional (including cluster) Spark learning development environment on one machine?

I want to start learning Spark 2.0, so I am trying to set up my dev (Scala v2.11) environment.

Spark uses a distributed environment to work on one cluster across multiple separate machines, one node per machine. However, I do not have many machines for my testing purposes; I only have one machine with CentOS 7 on it.

I am not after performance; I need something that would simulate a working cluster so that I can learn Spark.

How can I set up a development environment to learn and develop Spark applications without having access to multiple machines, while still being able to learn and write code for a fully functional Spark-based environment?

Answer

Start with local mode.
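For example, a minimal Spark 2.0 application running in local mode could look like the sketch below (the object name and the sample data are only illustrative, not from the question):

    import org.apache.spark.sql.SparkSession

    object SparkLocalDemo {
      def main(args: Array[String]): Unit = {
        // local[*] uses one worker thread per available core; local[3] would fix it at 3
        val spark = SparkSession.builder()
          .appName("SparkLocalDemo")
          .master("local[*]")
          .getOrCreate()

        // a tiny word count to verify that the local "cluster" works end to end
        val counts = spark.sparkContext
          .parallelize(Seq("spark", "local", "mode", "spark"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.collect().foreach(println)
        spark.stop()
      }
    }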

Spark will do everything as usual: spawn executors, distribute tasks, etc. The only step that is omitted is the transfer of data across the network, and since that happens completely under the hood in production, you don't need to take this omission into account while coding.

You will be able to specify the number of executors (only threads in this mode) and test, for example, the fact that Spark Streaming needs at least 2 of them, as in the sketch below.
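As a hedged sketch of that second point: with local[1] a receiver-based streaming job would grab the only thread and never process any batches, so at least local[2] is needed. The host and port below are just a hypothetical source (e.g. fed by nc -lk 9999):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StreamingThreadsDemo {
      def main(args: Array[String]): Unit = {
        // at least 2 threads: one for the socket receiver, one to process batches
        val conf = new SparkConf()
          .setAppName("StreamingThreadsDemo")
          .setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(5))

        // socketTextStream starts a receiver, which permanently occupies one thread
        val lines = ssc.socketTextStream("localhost", 9999)
        lines.count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }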

Referring to your comments:

Or does it not make much sense to build a cluster to learn Spark, because it is all done under the hood and the programming is all the same in local and, say, standalone/YARN/Mesos mode?

Yes, there are some conventions, but they are exactly the same in local mode and the other modes.

Does local mode mean that I will be able to start an exemplary cluster with, say, 3 nodes?

local[3] should do the trick.
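One way to keep the code identical across modes (a sketch, assuming you package the application as a jar) is to leave the master out of the code entirely and pass it to spark-submit, so the same jar runs on local[3] today and on a standalone/YARN/Mesos cluster later:

    import org.apache.spark.sql.SparkSession

    object SumDemo {
      def main(args: Array[String]): Unit = {
        // no .master(...) here: spark-submit's --master flag decides where to run, e.g.
        //   spark-submit --master local[3] --class SumDemo app.jar
        //   spark-submit --master yarn --class SumDemo app.jar
        val spark = SparkSession.builder()
          .appName("SumDemo")
          .getOrCreate()

        // the application logic itself is identical in local and cluster modes
        val sum = spark.sparkContext.parallelize(1 to 100).sum()
        println(s"Sum = $sum")

        spark.stop()
      }
    }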
