A Scala application can be created with Apache Spark as a dependency.
In this tutorial, we shall learn to set up a Scala project with Apache Spark in the Eclipse IDE and run a WordCount example.
Setup Spark Scala Application in Eclipse
The following is a step-by-step process to set up a Spark Scala application in Eclipse.
1 Download Scala Eclipse
Download Scala Eclipse (in Ubuntu) or install the Scala plugin from the Eclipse Marketplace.
2 Create new Scala Project
Open Eclipse and create a new Scala project.
3 Download Latest Spark
Open https://spark.apache.org/downloads.html and download the latest Spark package.
4 Add Spark Libraries
Go to Java Build Path and add all the jars present under spark-n.n.n-bin-hadoopN.N/jars/. This is similar to the process of creating a Java project with Apache Spark libraries.
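To confirm that the jars were picked up by the project, a small check like the one below may help. It is only a sketch: the object name SparkClasspathCheck and the helper isOnClasspath are made up for illustration, and it uses nothing beyond the standard Class.forName call to look for two Spark classes that the WordCount example relies on.

```scala
// Sketch: reports whether the Spark classes used later are visible to the project.
// SparkClasspathCheck and isOnClasspath are illustrative names, not part of Spark.
object SparkClasspathCheck {

  /** Returns true if the named class can be located on the current classpath. */
  def isOnClasspath(className: String): Boolean =
    try {
      // initialize = false: only locate the class, do not run its initializers
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false
      case _: LinkageError           => false
    }

  def main(args: Array[String]): Unit = {
    val required = Seq(
      "org.apache.spark.SparkConf",
      "org.apache.spark.SparkContext"
    )
    required.foreach { name =>
      val status = if (isOnClasspath(name)) "found" else "MISSING"
      println(s"$name: $status")
    }
  }
}
```

If either class is reported as MISSING, the Spark jars are most likely not on the Java Build Path yet.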
5 Scala Version
If you get errors related to the Scala version used by Eclipse, try changing it. To change the Scala version of your project: Java Build Path -> Libraries -> Add Library -> Scala Library -> choose a version lower than the latest and click Finish. If the issue persists, try each of the available versions.
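When trying different versions, it can help to see which Scala library the project is actually running with. The sketch below prints it using scala.util.Properties.versionNumberString; the object name ScalaVersionCheck and the helper binaryVersion are illustrative only, and the helper assumes a Scala 2.x style version string such as 2.12.18. Spark jar file names typically carry a suffix such as _2.12 that indicates the Scala binary version they were built for, so that suffix is a reasonable value to compare against.

```scala
// Sketch: prints the Scala library version the project runs with.
// ScalaVersionCheck and binaryVersion are illustrative names.
object ScalaVersionCheck {

  /** Extracts the binary version ("2.12") from a full Scala 2.x version ("2.12.18"). */
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    val full = scala.util.Properties.versionNumberString
    println(s"Scala library in use: $full (binary version ${binaryVersion(full)})")
  }
}
```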
6 New Scala Class WordCount.scala
Right click on the project and create a new Scala class. Name it WordCount; the file will be WordCount.scala. In the following example, the input is placed at data/wordcount/input.txt. The output is generated at the root of the project, or you may change its location. The output folder contains part files with the result and, when the job completes successfully, an empty _SUCCESS marker file.
WordCount.scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object WordCount {
  def main(args: Array[String]): Unit = {
    /* configure spark application */
    val conf = new SparkConf()
      .setAppName("Spark Scala WordCount Example")
      .setMaster("local[1]")

    /* spark context */
    val sc = new SparkContext(conf)

    /* map: split each line into words and pair each word with 1 */
    val map = sc.textFile("data/wordcount/input.txt")
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))

    /* reduce: sum the counts for each word */
    val counts = map.reduceByKey(_ + _)

    /* print */
    counts.collect().foreach(println)

    /* or save the output to file */
    counts.saveAsTextFile("out.txt")

    sc.stop()
  }
}
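The map and reduce steps above can be tried out on plain Scala collections before involving Spark at all. The following is a rough sketch, not part of the tutorial's application: LocalWordCount and countWords are hypothetical names, and groupBy followed by a sum stands in for Spark's reduceByKey, which does not exist on ordinary collections.

```scala
// Sketch: the same split / pair / sum logic on an in-memory Seq, without Spark.
// LocalWordCount and countWords are illustrative names.
object LocalWordCount {

  /** Counts words using the same split-on-space rule as WordCount.scala. */
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))   // same as the flatMap step
      .map(word => (word, 1))             // same as the map step
      .groupBy(_._1)                      // stand-in for reduceByKey: group by word...
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // ...then sum the ones

  def main(args: Array[String]): Unit = {
    val sample = Seq("to be or not", "to be")
    countWords(sample).foreach(println)
  }
}
```

Running it on a few lines copied from input.txt gives a quick reference to compare against the pairs printed by the Spark job.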
7 Run Spark Application
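Run WordCount.scala as a Scala application. After a successful run, the result is stored in the out.txt folder. If out.txt already exists from a previous run, delete it first, because saveAsTextFile does not overwrite an existing output directory.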
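Since out.txt is a folder rather than a single file, the result has to be read from the part files inside it. A minimal sketch for doing so is shown below, assuming the folder sits at the project root as in the example; ReadWordCountOutput and readParts are hypothetical names, and only java.nio.file calls are used.

```scala
import java.nio.file.{Files, Path, Paths}
import scala.collection.JavaConverters._

// Sketch: prints the contents of the part files written by saveAsTextFile.
// ReadWordCountOutput and readParts are illustrative names.
object ReadWordCountOutput {

  /** Returns the lines of every part-* file in outputDir, in file-name order. */
  def readParts(outputDir: Path): Seq[String] = {
    // The glob skips the _SUCCESS marker and hidden checksum files
    val stream = Files.newDirectoryStream(outputDir, "part-*")
    try {
      stream.asScala.toList
        .sortBy(_.getFileName.toString)
        .flatMap(p => Files.readAllLines(p).asScala)
    } finally stream.close()
  }

  def main(args: Array[String]): Unit = {
    val dir = Paths.get("out.txt") // output location used in WordCount.scala
    if (Files.isDirectory(dir)) readParts(dir).foreach(println)
    else println(s"$dir not found; run WordCount first")
  }
}
```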
Conclusion
In this Apache Spark Tutorial – Spark Scala Application, we have learnt to set up a Scala project in Eclipse with Apache Spark libraries and run a WordCount example application.