Scala Spark Shell – Word Count Example

Spark Shell

Spark Shell is an interactive shell through which we can access Spark’s API. Spark provides the shell in two programming languages: Scala and Python. In this Apache Spark tutorial, we shall learn how to use the Scala Spark Shell with a basic word count example.

Prerequisites

It is assumed that you have already installed Apache Spark on your local machine. If not, please refer to Install Spark on Ubuntu or Install Spark on MacOS.

Scala Spark Shell

Start Spark interactive Scala Shell

The Scala shell can be started by opening a Terminal window and running the following command:

$ spark-shell

For the word count example, we shall start the shell with the option --master local[4], meaning the Spark context of this shell runs Spark locally with 4 worker threads:

$ spark-shell --master local[4]

If you accidentally started the Spark shell without options, kill the shell instance and start it again with the option.

From the shell startup output, the following points can be noted:

The Spark context Web UI is available at http://192.168.0.104:4040. You may open a browser and visit the URL.
[Figure: Spark context Web UI]

The Spark context is available as ‘sc’, meaning you may access the Spark context in the shell through the variable named ‘sc’.

The Spark session is available as ‘spark’, meaning you may access the Spark session in the shell through the variable named ‘spark’.
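
As a quick check of these two variables, the following lines can be typed into the shell. This is a minimal sketch that assumes it is run inside spark-shell, where sc and spark are already defined; the printed values depend on how the shell was started.

```scala
// Run inside spark-shell, where `sc` and `spark` are predefined (assumption).

// Master URL the shell was started with, e.g. local[4] when --master local[4] was passed.
println(sc.master)

// Default number of partitions Spark uses for operations such as parallelize.
println(sc.defaultParallelism)

// Address of the Spark context Web UI, if the UI is enabled.
sc.uiWebUrl.foreach(println)

// Spark version, read through the Spark session.
println(spark.version)

// Check whether the session exposes the same Spark context as `sc`.
println(spark.sparkContext eq sc)
```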

Word-Count Example with Spark (Scala) Shell

Following are the three commands to run in the Spark Scala Shell, one by one:

Map:

In this step, using the Spark context variable, sc, we read a text file,

then we split each line using a space " " as the separator,

and we map each word to a tuple (word, 1), 1 being the number of occurrences of the word.

We use the tuple (word, 1) as (key, value) in the reduce stage.
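
The map step described above can be sketched as follows. This is an illustration rather than the exact command of this tutorial: it assumes it is run inside spark-shell, and the temporary sample file and the names inputPath, lines, words and pairs are hypothetical. Replace the sample file with the path to your own text file.

```scala
// Run inside spark-shell, where `sc` is predefined (assumption).
import java.nio.file.Files

// Hypothetical sample input, written to a temporary file so the sketch is self-contained.
val inputPath = Files.createTempFile("wordcount-input", ".txt")
Files.write(inputPath, "to be or not to be\nthat is the question".getBytes("UTF-8"))

// Read the text file as an RDD with one element per line.
val lines = sc.textFile(inputPath.toUri.toString)

// Split each line on a single space and flatten the result into one RDD of words.
val words = lines.flatMap(line => line.split(" "))

// Map each word to a (word, 1) tuple, to be used as (key, value) in the reduce step.
val pairs = words.map(word => (word, 1))

// Print a few tuples to inspect the result.
pairs.take(3).foreach(println)
```

Splitting on a single space keeps the example close to the description; text with tabs or repeated spaces would need a different separator.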

Reduce:

We reduce the tuples by key, summing the values of each word to get its total count.
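
A minimal sketch of the reduce step, assuming it is run inside spark-shell. The names samplePairs and sampleCounts are hypothetical, and the small in-memory data set stands in for the (word, 1) tuples produced by the map step.

```scala
// Run inside spark-shell, where `sc` is predefined (assumption).

// Hypothetical (word, 1) tuples standing in for the output of the map step.
val samplePairs = sc.parallelize(Seq(
  ("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1)
))

// Group the tuples by key (the word) and add up the values of each key.
val sampleCounts = samplePairs.reduceByKey(_ + _)

// Bring the results to the driver and print them; the order is not guaranteed.
sampleCounts.collect().foreach(println)
```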

Save counts to local file

The counts can be saved to the local file system. Spark writes the output as a directory containing part files, such as part-00000.
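
Saving can be sketched as below, assuming the code is run inside spark-shell. The names countsToSave and outputDir and the temporary output location are hypothetical; saveAsTextFile expects a directory that does not exist yet.

```scala
// Run inside spark-shell, where `sc` is predefined (assumption).
import java.nio.file.Files

// Hypothetical counts in a single partition, so a single part file is written.
val countsToSave = sc.parallelize(Seq(("to", 2), ("be", 2), ("or", 1)), 1)

// Output directory inside a fresh temporary directory; it must not exist beforehand.
val outputDir = Files.createTempDirectory("wordcount-output").resolve("counts")

// Write the counts as text, one tuple per line.
countsToSave.saveAsTextFile(outputDir.toUri.toString)

// List what Spark wrote; the data is in the part files.
outputDir.toFile.list().sorted.foreach(println)
```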

All the commands run in the Terminal are shown below:

Verify the Output

A sample of the contents of the output file, part-00000, is shown below:
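
The saved output can also be read back from within the shell. The following is a sketch under the same assumptions as before (run inside spark-shell); the data and the names savedDir and savedLines are hypothetical and are not the output of this tutorial's input file. Each line of a part file holds one tuple in the form (word,count).

```scala
// Run inside spark-shell, where `sc` is predefined (assumption).
import java.nio.file.Files

// Save a small hypothetical result so there is something to read back.
val savedDir = Files.createTempDirectory("wordcount-verify").resolve("counts")
sc.parallelize(Seq(("be", 2), ("to", 2)), 1).saveAsTextFile(savedDir.toUri.toString)

// Read every part file in the output directory back as lines of text.
val savedLines = sc.textFile(savedDir.toUri.toString).collect()

// Print the lines, each formatted as (word,count).
savedLines.foreach(println)
```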

We have successfully counted the occurrences of each unique word in a file with the help of the Scala Spark Shell.

You may use the Spark context Web UI to check the details of the job (word count) we have just run.

[Figure: Scala Spark Shell - Web UI]

Navigate through the other tabs to explore the Spark Web UI and the details of the word count job.

Useful Tips

Suggestions

Spark Shell can provide suggestions. Type part of a command and press the Tab key for suggestions.

Kill the Spark Shell Instance

To kill the Spark shell instance, hit Control+Z in the current shell to suspend it, then kill the Spark instance by its pid with the help of the kill command.

Find the pid:

In this case, 8906 is the pid.

Kill the instance using the pid:

Conclusion

In this Scala Spark Shell tutorial, we have learnt how to use the Spark Shell with the Scala programming language through a word count example.