Python Spark Shell – PySpark – Word Count Example

Spark Shell

Spark Shell is an interactive shell through which we can access Spark’s API. Spark provides the shell in two programming languages: Scala and Python. In this Apache Spark Tutorial, we shall learn how to use the Python Spark Shell with a basic word count example.

Prerequisites

It is assumed that you have already installed Apache Spark on your local machine. If not, please refer to Install Spark on Ubuntu or Install Spark on MacOS.

Python Spark Shell – PySpark

Start Spark interactive Python Shell

The Python shell can be started by opening a Terminal window and running the following command:

$ pyspark

For the word-count example, we shall start the shell with the option --master local[4], meaning that the Spark context of this shell runs on the local node with 4 worker threads:

$ pyspark --master local[4]

If you accidentally started the Spark shell without this option, you may exit the shell (for example with Ctrl+D or exit()) and start it again with the option.

The Spark context Web UI is available at an address such as http://192.168.0.104:4040 (the host part depends on your machine; the default port is 4040). You may open a browser and visit the URL.
Figure: PySpark Shell - Web UI

Spark context: The Spark context is available in the shell as the variable named ‘sc’.

Spark session: The Spark session is available in the shell as the variable named ‘spark’.

Word-Count Example with Python Spark – PySpark Shell

Run the following Python commands in the PySpark shell, in the order given.

Input:

In this step, we read a text file using the Spark context variable, sc. The result is an RDD in which each element is one line of the file.

Map:

We split each line of input into words, using the space character " " as the separator, and we map each word to a tuple (word, 1), where 1 is the count contributed by a single occurrence of the word.

We use the tuple (word, 1) as the (key, value) pair in the reduce stage.
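The two map steps can be tried on a tiny in-memory sample before touching a real file. The sketch below is plain Python, not Spark: the list comprehensions stand in for the RDD transformations flatMap and map, and the sample lines and the helper names split_words and to_pair are made up for illustration.

```python
def split_words(line):
    """Split one input line on the space character."""
    return line.split(" ")


def to_pair(word):
    """Pair a word with the count 1."""
    return (word, 1)


# Hypothetical sample standing in for the lines of the input file.
lines = ["to be or not to be", "be quick"]

# flatMap-like step: every line yields several words, flattened into one list.
words = [word for line in lines for word in split_words(line)]

# map-like step: every word becomes a (key, value) tuple.
pairs = [to_pair(word) for word in words]

print(pairs[:4])  # [('to', 1), ('be', 1), ('or', 1), ('not', 1)]
```

In the PySpark shell the same helpers could be passed to the RDD methods, for example input_file.flatMap(split_words).map(to_pair), where input_file is assumed to be the RDD returned by sc.textFile. Note that split(" ") produces empty strings when a line contains consecutive spaces; calling split() with no argument splits on any run of whitespace instead.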

Reduce:

We reduce the tuples by key, adding up the values of all tuples that share the same word.
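Reducing by key means combining the values of all tuples that share a key with a two-argument function, here addition. A minimal plain-Python sketch of that behaviour follows; reduce_by_key is a hypothetical stand-in written for this illustration, not the Spark implementation, and the sample pairs are made up.

```python
from operator import add


def reduce_by_key(pairs, func):
    """Combine the values of pairs that share a key using func."""
    totals = {}
    for key, value in pairs:
        if key in totals:
            totals[key] = func(totals[key], value)
        else:
            totals[key] = value
    return totals


# Hypothetical (word, 1) tuples as produced by the map stage.
pairs = [("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1)]

counts = reduce_by_key(pairs, add)
print(counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

On an RDD of pairs, the corresponding call is reduceByKey, for example pairs_rdd.reduceByKey(add), where pairs_rdd is an assumed name for the RDD from the map stage.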

Save counts to local file

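The counts can be saved to the local file system. Spark writes the result as a directory containing one or more part files (for example, part-00000); the output path must not already exist.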

All the commands run in the Terminal are shown below:
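The steps above can be collected into one function for reuse. The sketch below assumes it is called from the PySpark shell, where sc already exists. The name word_count and its parameter names are hypothetical, and the paths in the usage note are placeholders. The calls textFile, flatMap, map, reduceByKey and saveAsTextFile are the SparkContext and RDD methods that correspond to the steps described.

```python
from operator import add


def word_count(sc, input_path, output_path):
    """Run the input, map, reduce and save steps on an existing SparkContext."""
    # Input: an RDD with one element per line of the text file.
    input_file = sc.textFile(input_path)

    # Map: split lines into words, then pair each word with the count 1.
    pairs = (input_file
             .flatMap(lambda line: line.split(" "))
             .map(lambda word: (word, 1)))

    # Reduce: add up the counts of tuples that share the same word.
    counts = pairs.reduceByKey(add)

    # Save: write the (word, count) tuples as text under output_path.
    counts.saveAsTextFile(output_path)
    return counts
```

In the shell, this could be called as word_count(sc, "/path/to/text/file", "/path/to/output"), with both paths replaced by real locations.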

Verify the Output

Each line of an output file such as part-00000 holds one word together with its count.
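When an RDD of tuples is saved as text, each line of a part file is the string form of one tuple, such as ('be', 2). A minimal sketch for loading such lines back into a dictionary is given below; parse_count_line and read_counts are hypothetical helper names, and the sketch assumes the output was written to the local file system.

```python
import ast
import glob
import os


def parse_count_line(line):
    """Turn one line such as "('be', 2)" into the tuple ('be', 2)."""
    word, count = ast.literal_eval(line.strip())
    return word, count


def read_counts(output_dir):
    """Merge the tuples from every part-* file under output_dir into a dict."""
    counts = {}
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                if line.strip():  # skip blank lines
                    word, count = parse_count_line(line)
                    counts[word] = count
    return counts
```

For example, read_counts("/path/to/output") would return the counts from all part files in that placeholder directory, which is convenient for looking up a single word.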

We have successfully counted the occurrences of each unique word in a file with the help of Python Spark Shell – PySpark.

You may use the Spark context Web UI to check the details of the job (Word Count) we have just run.

Figure: Python Spark Shell - PySpark - Example Job

Navigate through the other tabs to get an idea of the Spark Web UI and to see further details about the Word Count job.

Conclusion

In this tutorial on the Python Spark Shell (PySpark), we have learnt how to use the Spark shell with the Python programming language, with the help of a word count example.