Read JSON file to Dataset

Dataset is Spark's strongly typed API for structured data, introduced after the RDD and DataFrame APIs. In this tutorial, we shall learn how to read a JSON file into a Spark Dataset, with the help of an example Spark application.

Steps to read JSON file to Dataset in Spark

To read a JSON file into a Dataset in Spark:

  1. Create a Java bean class: a simple class with one property (a private field with a public getter and setter) for each field of the objects in the JSON file.
  2. Create a SparkSession.
  3. Initialize an Encoder from the bean class with Encoders.bean(). The encoder defines the schema that the JSON records are mapped to.
  4. Using the SparkSession, read the JSON file and apply the encoder. spark.read().json(jsonPath).as(beanEncoder) returns a Dataset whose records are of the bean type.
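Encoders.bean() builds its schema from JavaBean properties, that is, from getter and setter pairs rather than from bare fields. As a minimal sketch of what that means, the snippet below uses the JDK's own java.beans.Introspector (not Spark) to list the properties a bean class exposes. The class name BeanPropertiesDemo and the helper propertyNames are hypothetical names chosen for this illustration.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BeanPropertiesDemo {

    // Hypothetical bean for one JSON record: private fields, public accessors.
    public static class Employee implements Serializable {
        private String name;
        private long salary;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public long getSalary() { return salary; }
        public void setSalary(long salary) { this.salary = salary; }
    }

    // Lists the JavaBean property names of a class, sorted alphabetically.
    // Object.class is the stop class, so the inherited "class" property is skipped.
    static List<String> propertyNames(Class<?> beanClass) {
        try {
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            Collections.sort(names);
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Could not introspect " + beanClass, e);
        }
    }

    public static void main(String[] args) {
        // prints [name, salary]
        System.out.println(propertyNames(Employee.class));
    }
}
```

A class that declares only public fields, with no getters or setters, would yield an empty list here, which is why the bean used with an encoder should follow the getter and setter convention.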

Example – Read JSON file to Dataset

The following Java example creates an Employee class to define the schema of the data in the JSON file, and reads the JSON file into a Dataset.

JSONtoDataSet.java

import java.io.Serializable;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class JSONtoDataSet {

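	// Java bean describing one JSON record. Encoders.bean() reads the schema
	// from getter/setter pairs, so public fields alone are not enough.
	public static class Employee implements Serializable {
		private String name;
		// long, because Spark infers JSON integers as bigint (long)
		private long salary;

		public String getName() { return name; }
		public void setName(String name) { this.name = name; }
		public long getSalary() { return salary; }
		public void setSalary(long salary) { this.salary = salary; }
	}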

	public static void main(String[] args) {
		// configure spark
		SparkSession spark = SparkSession
				.builder()
				.appName("Read JSON File to DataSet")
				.master("local[2]")
				.getOrCreate();

		// Encoder that maps each JSON record to an Employee bean
		Encoder<Employee> employeeEncoder = Encoders.bean(Employee.class);

		String jsonPath = "data/employees.json";

		// read JSON file to Dataset
		Dataset<Employee> ds = spark.read().json(jsonPath).as(employeeEncoder);
		ds.show();

		// release Spark resources
		spark.stop();
	}
}
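The program above reads data/employees.json, which the tutorial does not show. By default, spark.read().json() expects one complete JSON object per line (the JSON Lines layout), not a single pretty-printed array. The following sketch writes a small file in that layout. The three records are inferred from the output table, and the class name SampleEmployeesFile is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class SampleEmployeesFile {

    // Assumed sample records: one JSON object per line, no enclosing array.
    static List<String> sampleLines() {
        return Arrays.asList(
                "{\"name\":\"Michael\",\"salary\":3000}",
                "{\"name\":\"Andy\",\"salary\":4500}",
                "{\"name\":\"Justin\",\"salary\":3500}");
    }

    // Writes the sample records to the target path, creating parent directories if needed.
    static Path write(Path target) throws IOException {
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return Files.write(target, sampleLines(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path written = write(Paths.get("data", "employees.json"));
        System.out.println("Wrote " + written);
    }
}
```

If the input is instead a single JSON document spread over several lines, the reader's multiLine option is the usual way to handle it.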

Output

+-------+------+
|   name|salary|
+-------+------+
|Michael|  3000|
|   Andy|  4500|
| Justin|  3500|
|  Berta|  4000|
|   Raju|  3000|
| Chandy|  4500|
|   Joey|  3500|
|    Mon|  4000|
| Rachel|  4000|
+-------+------+

Conclusion

In this Spark Tutorial, Read JSON file to Dataset, we have learnt to read a JSON file as typed objects, with the schema given by an encoder.