A Scala application can be created with Apache Spark as a dependency.

In this tutorial, we shall learn to set up a Scala project with Apache Spark in Eclipse IDE, and run a WordCount example.

Set Up Spark Scala Application in Eclipse

The following is a step-by-step process to set up a Spark Scala application in Eclipse.

1. Download Scala Eclipse

Download Scala Eclipse (on Ubuntu), or install the Scala plugin from the Eclipse Marketplace.

2. Create new Scala Project

Open Eclipse and create a new Scala project.

3. Download Latest Spark

Open https://spark.apache.org/downloads.html, download the latest Spark release, and extract the archive.

4. Add Spark Libraries

Open the project's Java Build Path and add all the jars present under spark-n.n.n-bin-hadoopN.N/jars/ (the jars folder of the extracted Spark download). This is similar to the process of creating a Java project with Apache Spark libraries.
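To confirm that the jars were picked up, a quick check like the one below can help. It is only a sketch: the object name SparkClasspathCheck and the helper isOnClasspath are made up for illustration, and the check merely looks up a class by name without starting Spark.

```scala
import scala.util.Try

object SparkClasspathCheck {
  // Returns true if the named class can be located on the current classpath.
  // The class is looked up without being initialized.
  def isOnClasspath(className: String): Boolean =
    Try(Class.forName(className, false, getClass.getClassLoader)).isSuccess

  def main(args: Array[String]): Unit = {
    // SparkContext is the class the WordCount example depends on.
    val name = "org.apache.spark.SparkContext"
    if (isOnClasspath(name)) println(s"Found $name on the build path")
    else println(s"$name is missing; re-check the jars added to Java Build Path")
  }
}
```

Run it as a Scala application inside the same project, so that it sees the same build path as the Spark code.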

5. Scala Version

If you get errors related to the Scala version bundled with Eclipse, try a different version. The Scala library of the project has to be compatible with the Scala version the Spark jars were built for. To change the Scala version of your project, go to Java Build Path -> Libraries -> Add Library -> Scala Library, choose a version lower than the latest, and click Finish. If the problem persists, try each of the available versions.
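When hunting a version mismatch, it may help to print the Scala version the project actually runs with. The sketch below uses scala.util.Properties from the standard library; the object name ScalaVersionCheck and the helper binaryVersion are illustrative names, not part of the tutorial's project.

```scala
import scala.util.Properties

object ScalaVersionCheck {
  // Reduces a full version such as "2.12.18" to its binary version "2.12".
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    // Version of the Scala library that is on the project's build path.
    val full = Properties.versionNumberString
    println(s"Scala library version: $full (binary version ${binaryVersion(full)})")
  }
}
```

The printed binary version can then be compared with the scala-library jar that typically ships in Spark's jars folder; if the two differ, selecting the matching Scala Library in the build path is a reasonable first thing to try.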

6. New Scala Class WordCount.scala

Right-click the project and create a new Scala class named WordCount; the source file is WordCount.scala. The example in WordCount.scala reads its input from data/wordcount/input.txt, relative to the project root. The output is written to the root of the project, though you may change its location. The output folder contains part files with the result and, on success, an empty _SUCCESS marker file.
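If the input file does not exist yet, a small helper can create it. This is a minimal sketch with made-up sample text; the object name CreateSampleInput and its write method are hypothetical, and only the path data/wordcount/input.txt comes from this tutorial.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

object CreateSampleInput {
  // Creates the parent folders if needed and writes the lines as UTF-8 text.
  // Assumes the path has a parent folder, as data/wordcount/input.txt does.
  def write(path: Path, lines: Seq[String]): Path = {
    Files.createDirectories(path.getParent)
    Files.write(path, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
  }

  def main(args: Array[String]): Unit = {
    // Sample sentences are placeholders; replace them with your own text.
    val sample = Seq("to be or not to be", "to see or not to see")
    val written = write(Paths.get("data/wordcount/input.txt"), sample)
    println(s"Wrote sample input to ${written.toAbsolutePath}")
  }
}
```

Run it once from the project root before running WordCount, or simply create the file by hand in Eclipse.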

WordCount.scala

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object WordCount {
    def main(args: Array[String]): Unit = {

        /* configure the Spark application; local[1] runs it locally on a single thread */
        val conf = new SparkConf().setAppName("Spark Scala WordCount Example").setMaster("local[1]")

        /* create the Spark context */
        val sc = new SparkContext(conf)

        /* map: split each line on spaces and pair every word with a count of 1 */
        val pairs = sc.textFile("data/wordcount/input.txt").flatMap(line => line.split(" ")).map(word => (word, 1))

        /* reduce: sum the counts of each word */
        val counts = pairs.reduceByKey(_ + _)

        /* print the (word, count) pairs on the driver */
        counts.collect().foreach(println)

        /* or save the output; "out.txt" is created as a folder and must not already exist */
        counts.saveAsTextFile("out.txt")

        /* release Spark resources */
        sc.stop()
    }
}
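
To see what the map and reduce steps compute without starting Spark, the same logic can be sketched with ordinary Scala collections. This is an illustration only: LocalWordCount and countWords are hypothetical names, and groupBy followed by a sum stands in for Spark's reduceByKey.

```scala
object LocalWordCount {
  // Mirrors flatMap -> map -> reduceByKey using plain Scala collections.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))   // split each line into words
      .map(word => (word, 1))             // pair each word with a count of 1
      .groupBy { case (word, _) => word } // collect the pairs of each word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the counts

  def main(args: Array[String]): Unit = {
    // Sample sentences are placeholders, not the tutorial's input file.
    val sample = Seq("to be or not to be", "to see or not to see")
    countWords(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Like the Spark version, this splits on a single space, so consecutive spaces in the input produce empty "words"; splitting on "\\s+" instead would be one way to avoid that.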

7. Run Spark Application

Run WordCount.scala as a Scala application. After a successful run, the result is stored in the out.txt folder.
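
The result inside the out.txt folder is spread over one or more part files. The sketch below reads them back using only the standard library. It assumes the files are named with a part- prefix, which is what Spark writes by default; the names ReadWordCountOutput and readParts are made up for illustration.

```scala
import java.io.File
import scala.io.Source

object ReadWordCountOutput {
  // Reads every line of every part file in the folder, in file-name order.
  // Marker files such as _SUCCESS and hidden checksum files are skipped.
  def readParts(dir: File): Seq[String] = {
    val parts = Option(dir.listFiles()).getOrElse(Array.empty[File])
      .filter(f => f.isFile && f.getName.startsWith("part-"))
      .sortBy(_.getName)
    parts.toSeq.flatMap { f =>
      val src = Source.fromFile(f, "UTF-8")
      try src.getLines().toList finally src.close()
    }
  }

  def main(args: Array[String]): Unit =
    readParts(new File("out.txt")).foreach(println)
}
```

Before running WordCount again, delete or rename the out.txt folder, since Spark by default refuses to write into an output folder that already exists.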

Conclusion

In this Apache Spark Tutorial – Spark Scala Application, we have learnt to set up a Scala project in Eclipse with Apache Spark libraries, and run a WordCount example application.