Spark RDD with custom class objects

To create a Spark RDD of custom class objects, implement the custom class with the Serializable interface, create an immutable list of custom class objects, and then parallelize the list with SparkContext. parallelize() returns an RDD with the custom class objects as its elements.
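Spark ships elements of an RDD between the driver and executors using Java serialization, which is why the custom class must implement Serializable. The round-trip Spark performs under the hood can be sketched with only the JDK; the class and method names below are illustrative, not part of Spark's API:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch of the serialize/deserialize round-trip Spark
// relies on when shipping custom objects to executors.
public class SerializationSketch {

	// Mirrors the custom class used in the Spark example below.
	static class Person implements Serializable {
		private static final long serialVersionUID = 1L;
		String name;
		int age;
		Person(String name, int age) {
			this.name = name;
			this.age = age;
		}
	}

	// Serialize an object to bytes, as the driver does before sending tasks.
	static byte[] serialize(Person p) throws IOException {
		ByteArrayOutputStream bytes = new ByteArrayOutputStream();
		try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
			out.writeObject(p);
		}
		return bytes.toByteArray();
	}

	// Deserialize the bytes back into a Person, as an executor would.
	static Person deserialize(byte[] data) throws IOException, ClassNotFoundException {
		try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
			return (Person) in.readObject();
		}
	}

	public static void main(String[] args) throws Exception {
		Person copy = deserialize(serialize(new Person("Arjun", 25)));
		System.out.println(copy.name + ", " + copy.age); // prints "Arjun, 25"
	}
}
```

If Person did not implement Serializable, the writeObject call, and likewise Spark itself, would fail with a NotSerializableException.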

Java Example

The following example demonstrates the creation of an RDD from a list of custom class objects.

import java.io.Serializable;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import com.google.common.collect.ImmutableList;

public class CustomObjectsRDD {

	public static void main(String[] args) {
		// configure spark
		SparkConf sparkConf = new SparkConf().setAppName("Print Elements of RDD")
				.setMaster("local[2]");
		// start a spark context
		JavaSparkContext sc = new JavaSparkContext(sparkConf);
		// prepare list of objects
		List<Person> personList = ImmutableList.of(
				new Person("Arjun", 25),
				new Person("Akhil", 2));
		// parallelize the list using SparkContext
		JavaRDD<Person> perJavaRDD = sc.parallelize(personList);
		// collect the RDD elements to the driver and print them
		for (Person person : perJavaRDD.collect()) {
			System.out.println(person.name + " : " + person.age);
		}
		// stop the spark context
		sc.stop();
	}
}

class Person implements Serializable {
	private static final long serialVersionUID = -2685444218382696366L;
	String name;
	int age;
	public Person(String name, int age) {
		this.name = name;
		this.age = age;
	}
}

Output

18/02/10 21:59:10 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
18/02/10 21:59:10 INFO DAGScheduler: ResultStage 0 (collect at finished in 0.223 s
18/02/10 21:59:10 INFO DAGScheduler: Job 0 finished: collect at, took 0.661038 s
18/02/10 21:59:10 INFO SparkContext: Invoking stop() from shutdown hook
18/02/10 21:59:10 INFO SparkUI: Stopped Spark web UI at
18/02/10 21:59:10 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!


In this Spark Tutorial, Spark RDD with custom class objects, we have learnt to initialize an RDD from an immutable list of custom objects using SparkContext.parallelize(), with the help of a Java example.