Apache Spark can be configured to run as a master node or a slave node. In this tutorial, we shall learn to set up an Apache Spark cluster with a master node and multiple slave (worker) nodes. You can set up a computer running Windows, Linux, or macOS as a master or a slave.

Apache Spark Cluster Setup

Setup an Apache Spark Cluster

To set up an Apache Spark cluster, we need to do two things :

  1. Set up the master node
  2. Set up the worker nodes

Setup Spark Master Node

Following is a step-by-step guide to set up the master node for an Apache Spark cluster. Execute the following steps on the node that you want to be the master.

  1. Navigate to Spark Configuration Directory

    Go to the SPARK_HOME/conf/ directory.

    SPARK_HOME is the complete path to the root directory of Apache Spark on your computer.

  2. Edit the file spark-env.sh – Set SPARK_MASTER_HOST

    Note : If spark-env.sh is not present, spark-env.sh.template would be present instead. Make a copy of spark-env.sh.template named spark-env.sh and add/edit the field SPARK_MASTER_HOST, replacing the IP with the IP address assigned to the computer that you would like to make the master. The part of the file with the SPARK_MASTER_HOST entry added is shown below.
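    A minimal sketch of the relevant line, assuming the master's IP address is 192.168.0.102 (the example address used later in this tutorial):

        # SPARK_HOME/conf/spark-env.sh
        # Bind the standalone master to this machine's IP address
        SPARK_MASTER_HOST=192.168.0.102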

  3. Start Spark as master

    Go to SPARK_HOME/sbin and execute the following command.
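    For example (start-master.sh is the standalone-mode launch script that ships in SPARK_HOME/sbin):

        cd $SPARK_HOME/sbin
        ./start-master.sh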

  4. Verify the log file

    You would see entries like the following in the log file, specifying the IP address of the master node, the port on which Spark has started (7077 by default), the port on which the web UI has started, and so on.
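    start-master.sh prints the path of the log file it writes under SPARK_HOME/logs/. The exact messages vary with the Spark version, but lines similar to the following (illustrative, using the example IP address) indicate a successful start. The web UI defaults to port 8080:

        Starting Spark master at spark://192.168.0.102:7077
        Bound MasterWebUI to 0.0.0.0, and started at http://192.168.0.102:8080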

Setting up the master node is complete.

Setup Spark Slave(Worker) Node

Following is a step-by-step guide to set up a slave (worker) node for an Apache Spark cluster. Execute the following steps on all of the nodes that you want to be worker nodes.

  1. Navigate to Spark Configuration Directory

    Go to the SPARK_HOME/conf/ directory.

    SPARK_HOME is the complete path to the root directory of Apache Spark on your computer.

  2. Edit the file spark-env.sh – Set SPARK_MASTER_HOST

    Note : If spark-env.sh is not present, spark-env.sh.template would be present instead. Make a copy of spark-env.sh.template named spark-env.sh and add/edit the field SPARK_MASTER_HOST, replacing the IP with the IP address of your master (the one you used when setting up the master node). The part of the file with the SPARK_MASTER_HOST entry added is shown below.
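    A minimal sketch of the relevant line, again assuming the master's IP address is 192.168.0.102:

        # SPARK_HOME/conf/spark-env.sh
        # Point this worker at the master's IP address
        SPARK_MASTER_HOST=192.168.0.102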

  3. Start Spark as slave

    Go to SPARK_HOME/sbin and execute the following command.
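    For example, passing the master URL spark://<master-ip>:7077 (7077 is the default master port). Note that newer Spark releases ship this script under the name start-worker.sh instead of start-slave.sh:

        cd $SPARK_HOME/sbin
        ./start-slave.sh spark://192.168.0.102:7077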

  4. Verify the log

    You would find in the log that this worker node has been successfully registered with the master running at spark://192.168.0.102:7077.
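    The exact wording varies with the Spark version, but the worker log would contain lines similar to the following (illustrative, using the example master address):

        Connecting to master 192.168.0.102:7077...
        Successfully registered with master spark://192.168.0.102:7077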

The setup of the worker node is complete.

Multiple Spark Worker Nodes

To add more worker nodes to the Apache Spark cluster, just repeat the worker setup process on the other nodes.

Once you have added some slaves to the cluster, you can view the workers connected to the master via the master web UI.

Open the URL http://<your.master.ip.address>:<web-ui-port-number>/ (for example, http://192.168.0.102:8081/) in a browser. The master web UI runs on port 8080 by default; if that port is already in use, Spark binds to the next free port, such as 8081. The connected slaves would be listed under Workers, as shown below.

(Screenshot: Master web UI showing the connected slaves listed under Workers.)

Conclusion

In this Apache Spark tutorial, we have successfully set up a master node and multiple worker nodes, forming an Apache Spark cluster. In our next tutorial, we shall learn to configure the Spark ecosystem.