How to Create a Java Project with Apache Spark

We shall look into how to create a Java Project with Apache Spark in Eclipse IDE, with all the required jars and libraries, so that we can start working with Apache Spark. The process should be similar in other IDEs such as IntelliJ IDEA.

Prerequisites

Eclipse – Create Java Project with Apache Spark

  • Step 1 – Download Apache Spark

    Download Apache Spark from https://spark.apache.org/downloads.html. The package is around 200MB, so the download might take a few minutes.

    Download Apache Spark

  • Step 2 – Unzip and find jars

    Unzip the downloaded package.
    The contents of the extracted folder would be as below:

    Apache Spark Package Contents

    jars : this folder contains all the jars that need to be included in the build path of our project.

  • Step 3 – Create Java Project and copy jars

    Create a Java Project in Eclipse (named SparkMLlib22 here), and copy the jars folder from the Spark directory into the project.

    Create a Java Project and copy Jars

  • Step 4 – Add Jars to Java Build Path

    Right click on the project (SparkMLlib22) -> Properties -> Java Build Path (3rd item in the left panel) -> Libraries (3rd tab) -> Add JARs (button on the right side panel) -> in the JAR Selection dialog, select all the jars in the ‘jars’ folder -> Apply -> OK.

    Add jars to build path

  • Step 5 – Check the setup – Run an MLlib example

    You may also copy the ‘data’ folder to the project and add the jars in the Spark ‘examples’ directory to the build path, to get a quick look at how to work with the different modules of Apache Spark.
    We shall run the Java program JavaRandomForestClassificationExample.java to check whether the Apache Spark setup is successful with the Java Project.
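Before running a full MLlib example, a small stand-alone check can help confirm that the jars were copied into the project and that the Spark classes are visible on the build path. The following is only a minimal sketch using the Java standard library. The class SparkSetupCheck and its helper methods are hypothetical names introduced for illustration and are not part of Apache Spark. It assumes that the jars folder sits at the project root, which is normally the working directory of an Eclipse run configuration; a different location can be passed as the first program argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class (not part of Spark) for checking the project setup.
public class SparkSetupCheck {

    /** Returns the sorted names of all .jar files directly inside the given folder. */
    public static List<String> listJars(Path jarsDir) throws IOException {
        try (Stream<Path> entries = Files.list(jarsDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .filter(name -> name.endsWith(".jar"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Returns true if the named class can be located on the current classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, SparkSetupCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the copied folder is named "jars" and lives at the project root.
        Path jarsDir = Paths.get(args.length > 0 ? args[0] : "jars");
        if (Files.isDirectory(jarsDir)) {
            System.out.println("Jars found in " + jarsDir + ": " + listJars(jarsDir).size());
        } else {
            System.out.println("No jars folder found at " + jarsDir.toAbsolutePath());
        }

        // Classes that a Java MLlib program typically relies on.
        String[] expected = {
            "org.apache.spark.SparkConf",
            "org.apache.spark.api.java.JavaSparkContext",
            "org.apache.spark.mllib.tree.RandomForest"
        };
        for (String className : expected) {
            String status = isOnClasspath(className) ? "OK     " : "MISSING";
            System.out.println(status + " " + className);
        }
    }
}
```

If any class is reported as MISSING, revisiting Step 4 and confirming that every jar in the ‘jars’ folder was added to the Java Build Path is a reasonable first thing to try.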

We have successfully learnt to create a Java Project with Apache Spark libraries and run an MLlib example program.