How to Set Up an Apache Spark Cluster

Apache Spark Tutorial – In this tutorial, we shall learn to set up an Apache Spark cluster with a master node and multiple slave (worker) nodes. A computer running Windows, Linux, or macOS can be set up as either a master or a slave.

Set Up an Apache Spark Cluster

To set up an Apache Spark cluster, we need to know how to set up a master node and how to set up worker nodes.

Set Up Master Node

Following is a step-by-step guide to set up the master node for an Apache Spark cluster. Execute the following steps on the node that you want to be the master.

  1. Navigate to Spark Configuration Directory

    Go to the “SPARK_HOME/conf/” directory.

    SPARK_HOME is the complete path to the root directory of Apache Spark on your computer.

  2. Edit the file spark-env.sh – Set SPARK_MASTER_HOST

    Note: If spark-env.sh is not present, spark-env.sh.template would be present. Make a copy of spark-env.sh.template with the name “spark-env.sh” and add/edit the field “SPARK_MASTER_HOST”. The relevant part of the file, with the SPARK_MASTER_HOST addition, is shown below:
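    The snippet below is a minimal sketch, assuming (as in the examples later in this tutorial) that 192.168.0.102 is the IP address of the master:

    # spark-env.sh – bind the standalone master to this address
    SPARK_MASTER_HOST=192.168.0.102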

    Replace the IP with the IP address assigned to the computer that you would like to make the master.

  3. Start Spark as master

    Go to SPARK_HOME/sbin and execute the following command.

    $ ./start-master.sh
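    By default, the master listens for workers and applications on port 7077. If you later need to stop the master, the matching script in the same directory can be used:

    $ ./stop-master.sh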
  4. Verify the log file.

    In the master's log file (written under SPARK_HOME/logs by the start script), you would see the IP address of the master node, the port on which Spark has been started, the port number on which the WEB UI has been started, etc.
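    As a quick sanity check (a sketch, assuming the JDK's jps tool is on your PATH and SPARK_HOME is exported in your shell), you can also confirm that the master JVM is running and locate its log file:

    $ jps | grep Master        # the standalone Master process should be listed
    $ ls $SPARK_HOME/logs/     # the start script writes the master's log file here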

Setting up the master node is complete.

 

Set Up Slave (Worker) Node

Following is a step-by-step guide to set up a slave (worker) node for an Apache Spark cluster. Execute the following steps on all of the nodes that you want to be worker nodes.

  1. Navigate to Spark Configuration Directory

    Go to the “SPARK_HOME/conf/” directory.

    SPARK_HOME is the complete path to the root directory of Apache Spark on your computer.

  2. Edit the file spark-env.sh – Set SPARK_MASTER_HOST

    Note: If spark-env.sh is not present, spark-env.sh.template would be present. Make a copy of spark-env.sh.template with the name “spark-env.sh” and add/edit the field “SPARK_MASTER_HOST”. The relevant part of the file, with the SPARK_MASTER_HOST addition, is shown below:
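    Again a minimal sketch, assuming the master's IP address is 192.168.0.102; on a worker, this field points at the master, not at the worker itself:

    # spark-env.sh – on a worker, this points at the master node
    SPARK_MASTER_HOST=192.168.0.102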

    Replace the IP with the IP address assigned to your master (the same one that you used when setting up the master node).

  3. Start Spark as slave

    Go to SPARK_HOME/sbin and execute the following command.

    $ ./start-slave.sh spark://<your.master.ip.address>:7077
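    Note: newer Spark releases also ship this script under the name start-worker.sh; if start-slave.sh is not present in your version, use that name instead. Optionally, the worker script accepts flags to limit the resources it offers to the cluster; the example below is a sketch, and the exact flags may vary between Spark versions:

    $ ./start-slave.sh spark://<your.master.ip.address>:7077 --cores 2 --memory 4G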
  4. Verify the log

    In the worker's log file (again under SPARK_HOME/logs), you would find that this worker node has been successfully registered with the master running at spark://192.168.0.102:7077.

The setup of the worker node is complete.

To add more worker nodes to the Apache Spark cluster, you may just repeat this worker setup on the other nodes as well.
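Alternatively, the standalone launch scripts can start all workers from the master in one step. The following is only a sketch; it assumes passwordless SSH from the master to every worker, Spark installed at the same SPARK_HOME on each node, and placeholder worker addresses. List the workers in SPARK_HOME/conf/slaves (one hostname or IP address per line) and run the launcher script from the master:

    $ cat $SPARK_HOME/conf/slaves
    192.168.0.103
    192.168.0.104
    $ $SPARK_HOME/sbin/start-slaves.sh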

Once you have added some slaves to the cluster, you can view the workers connected to the master via the Master WEB UI.

Hit the URL http://<your.master.ip.address>:<web-ui-port-number>/ (for example, http://192.168.0.102:8081/) in a browser. The Master WEB UI listens on port 8080 by default; if that port is already in use, Spark binds to the next free port (such as 8081, as in this example). The connected slaves are listed under Workers, as in the screenshot below.

[Screenshot: Master WEB UI – Set Up an Apache Spark Cluster, showing the connected workers]
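To confirm that the cluster also accepts applications, a quick check (a sketch, assuming SPARK_HOME is exported in your shell) is to connect an interactive shell to the master and then look for it under Running Applications in the Master WEB UI:

    $ $SPARK_HOME/bin/spark-shell --master spark://192.168.0.102:7077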

 

Conclusion:

In this Apache Spark Tutorial, we have successfully set up an Apache Spark cluster.