A Scala application can be created with Apache Spark as a dependency.

In this tutorial, we shall learn to set up a Scala project with Apache Spark in the Eclipse IDE, and also run a WordCount example.

Setup Spark Scala Application in Eclipse

Following is a step-by-step process to set up a Spark Scala application in Eclipse.

1 Download Scala IDE for Eclipse

Download Scala IDE for Eclipse (the steps below use Ubuntu), or install the Scala plugin from the Eclipse Marketplace into an existing Eclipse installation.

2 Create new Scala Project

Open Eclipse and create a new Scala Project.

3 Download Latest Spark

Download the latest pre-built Spark package from https://spark.apache.org/downloads.html and extract the archive.

4 Add Spark Libraries

Go to Java Build Path, and add all the jars present under spark-n.n.n-bin-hadoopN.N/jars/ in the extracted Spark package. This is similar to the process of creating a Java project with Apache Spark libraries.
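If you prefer a build tool to adding jars by hand, the same dependency can be declared in sbt. The following build.sbt is only a minimal sketch; the project name is made up, and the Scala and Spark version numbers are assumptions that should be changed to match the Spark release you downloaded.

build.sbt

// hypothetical project name, used only for this sketch
name := "spark-scala-wordcount"

version := "0.1"

// assumed Scala version: use the one your Spark build was compiled for
scalaVersion := "2.11.12"

// assumed Spark version: match the release you downloaded
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.8"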

5 Scala Version

If the project reports errors caused by a mismatch between the Scala version of Eclipse and the Spark jars, change the Scala library version of your project: Java Build Path -> Libraries -> Add Library -> Scala Library -> choose a lower version than the latest and click Finish. If the issue persists, try the other available versions.

6 New Scala Object WordCount.scala

Right click on the project and create a new Scala object. Name it WordCount; the source file will be WordCount.scala. In the following example, the input is read from data/wordcount/input.txt. The output is written under the root of the project (you may change this path). The output folder contains the result part files along with a status marker (_SUCCESS on success).

WordCount.scala

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
 
object WordCount {
    def main(args: Array[String]) {
 
        /* configure the spark application */
        val conf = new SparkConf().setAppName("Spark Scala WordCount Example").setMaster("local[1]")
 
        /* create the spark context */
        val sc = new SparkContext(conf)
 
        /* map: split each line into words and pair each word with a count of 1 */
        val words = sc.textFile("data/wordcount/input.txt").flatMap(line => line.split(" ")).map(word => (word, 1))
 
        /* reduce: sum the counts for each word */
        val counts = words.reduceByKey(_ + _)
 
        /* print the word counts to the console */
        counts.collect().foreach(println)
 
        /* or save the output to a folder named out.txt */
        counts.saveAsTextFile("out.txt")
 
        sc.stop()
    }
}
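For illustration, suppose data/wordcount/input.txt contains the following two lines (a hypothetical input, not part of the project):

hello spark
hello scala

Then the printed output, and the contents of the part files under out.txt, would be tuples of the form (in no particular order):

(hello,2)
(spark,1)
(scala,1)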

7 Run Spark Application

Run WordCount.scala as a Scala application. Upon a successful run, the word counts are printed to the console and also saved under the out.txt folder at the root of the project.
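Note that saveAsTextFile creates out.txt as a folder containing part files and a _SUCCESS marker, rather than a single file. If you want to check the saved result programmatically, a minimal sketch such as the following (a hypothetical helper object, not part of the tutorial's WordCount application) reads the folder back and prints its contents:

VerifyOutput.scala

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
 
object VerifyOutput {
    def main(args: Array[String]) {
 
        /* configure a small local spark application for verification */
        val conf = new SparkConf().setAppName("Verify WordCount Output").setMaster("local[1]")
        val sc = new SparkContext(conf)
 
        /* textFile accepts a directory and reads every part file inside it */
        sc.textFile("out.txt").collect().foreach(println)
 
        sc.stop()
    }
}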

Conclusion

In this Apache Spark Tutorial – Spark Scala Application, we have learned how to set up a Scala project in Eclipse with Apache Spark libraries, and run a WordCount example application.