Spark – Write Dataset to JSON file
The Dataset class provides an interface for saving the content of a non-streaming Dataset to external storage. JSON is one of the many formats it supports. In this tutorial, we shall learn to write a Dataset to a JSON file.
Steps to Write Dataset to JSON file in Spark
To write a Spark Dataset to a JSON file:
- Call the write() method on the Dataset. It returns a DataFrameWriter, which can write the data in many formats.
  Dataset.write()
- Call json() on the writer and provide the path to the folder where the JSON output has to be created with data from the Dataset.
  Dataset.write().json(pathToJSONout)
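The two steps above can be sketched as a small helper. This is a minimal sketch, not the tutorial's own code: the class name WriteJsonSketch, the helper writeAsJson, the output path data/out_sketch/ and the stand-in dataset built with spark.range(3) are all assumptions made for illustration. It also sets SaveMode.Overwrite, which the steps above do not mention, so that the example can be re-run without failing on an existing folder.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

// Hypothetical class name, used only for this sketch
class WriteJsonSketch {

    // Step 1: write() returns a DataFrameWriter.
    // Step 2: json(outDir) writes the data as JSON part files under outDir.
    // SaveMode.Overwrite (an addition for this sketch) replaces existing output.
    static void writeAsJson(Dataset<Row> ds, String outDir) {
        ds.write()
          .mode(SaveMode.Overwrite)
          .json(outDir);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("Write JSON sketch")
                .master("local[2]")
                .getOrCreate();

        // Stand-in dataset with a single "id" column holding 0, 1 and 2
        Dataset<Row> ds = spark.range(3).toDF();

        // Assumed output folder for this sketch
        writeAsJson(ds, "data/out_sketch/");

        spark.stop();
    }
}
```

Without the mode(...) call, the writer uses its default save mode and throws an exception if the output folder already exists.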
Example – Spark – Write Dataset to JSON file
In the following Java example, we read data from a JSON file into a Dataset and write the Dataset out as JSON to the folder specified by the path.
WriteDataSetToJSON.java
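import java.io.Serializable;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class WriteDataSetToJSON {

    // Java bean for one record; Encoders.bean() maps columns through getters and setters
    public static class Employee implements Serializable {
        private String name;
        private long salary; // long, because Spark infers JSON integers as bigint

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public long getSalary() { return salary; }
        public void setSalary(long salary) { this.salary = salary; }
    }

    public static void main(String[] args) {
        // configure spark
        SparkSession spark = SparkSession
                .builder()
                .appName("Spark Example - Write Dataset to JSON File")
                .master("local[2]")
                .getOrCreate();

        // read the input JSON file into a typed Dataset
        Encoder<Employee> employeeEncoder = Encoders.bean(Employee.class);
        String jsonPath = "data/employees.json";
        Dataset<Employee> ds = spark.read().json(jsonPath).as(employeeEncoder);

        // write dataset to JSON file
        ds.write().json("data/out_employees/");

        // release Spark resources
        spark.stop();
    }
}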
Output
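A folder data/out_employees/ is created. It contains one or more JSON part files (named part-*.json), each holding one JSON object per line, and an empty _SUCCESS marker file if the write completed successfully. By default, the write fails if the output folder already exists.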
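To check what Spark placed in the output folder, the folder can be inspected with plain Java. The following is a minimal sketch that needs no Spark dependency. The class name OutputFolderInspector and its two helper methods are hypothetical, and it assumes uncompressed output, where part files are named part-*.json and a _SUCCESS marker is written after a successful job.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, used only for this sketch
class OutputFolderInspector {

    // Returns the sorted names of the JSON part files directly inside dir.
    // Hidden checksum files such as .part-*.json.crc are skipped because
    // their names do not start with "part-".
    static List<String> listPartFiles(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .map(p -> p.getFileName().toString())
                    .filter(n -> n.startsWith("part-") && n.endsWith(".json"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if the _SUCCESS marker file is present in dir.
    static boolean hasSuccessMarker(Path dir) {
        return Files.exists(dir.resolve("_SUCCESS"));
    }

    public static void main(String[] args) {
        // Defaults to the output folder used in the example above
        Path dir = Paths.get(args.length > 0 ? args[0] : "data/out_employees");
        System.out.println("Success marker present: " + hasSuccessMarker(dir));
        for (String name : listPartFiles(dir)) {
            System.out.println(name);
        }
    }
}
```

Run it after the Spark job has finished, optionally passing a different output folder as the first argument.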
Conclusion
In this Spark Tutorial – Write Dataset to JSON file, we have learnt to use the write() method of the Dataset class and export the data to JSON using the json() method.