Spark RDD foreach

Spark RDD foreach applies a function to each element of an RDD. In this tutorial, we shall learn how to use the RDD.foreach() method with an example Spark application.

Syntax of RDD foreach

public void foreach(scala.Function1<T,scala.runtime.BoxedUnit> f)

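The signature above is that of the Scala RDD class as it appears in the Java API documentation. In a Java application you call foreach on a JavaRDD<T>, which accepts an org.apache.spark.api.java.function.VoidFunction<T>. You can pass a lambda expression directly, or assign a lambda expression or method reference to a VoidFunction<T> variable and pass that variable.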
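To make the phrase "assignment target for a lambda expression or method reference" concrete, here is a minimal sketch that runs without Spark. It assumes a local stand-in interface, VoidFn<T>, and a helper, forEachElement, both hypothetical names introduced only for illustration. VoidFn<T> mirrors the shape of Spark's VoidFunction<T>, which declares a single method, void call(T t) throws Exception.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VoidFunctionSketch {

    // Hypothetical stand-in with the same shape as Spark's VoidFunction<T>:
    // a single abstract method that takes one element and returns nothing.
    @FunctionalInterface
    interface VoidFn<T> {
        void call(T t) throws Exception;
    }

    // Hypothetical helper that mimics foreach: it calls f once per element.
    static <T> void forEachElement(List<T> elements, VoidFn<T> f) {
        for (T element : elements) {
            try {
                f.call(element);
            } catch (Exception e) {
                throw new IllegalStateException("function failed on " + element, e);
            }
        }
    }

    // Runs both styles of function and returns everything they recorded.
    static List<String> run() {
        List<String> seen = new ArrayList<>();
        List<String> data = Arrays.asList("Learn", "Apache", "Spark");

        // 1. A lambda expression assigned to the functional interface.
        VoidFn<String> viaLambda = item -> seen.add("* " + item);

        // 2. A method reference assigned to the same interface type.
        VoidFn<String> viaMethodRef = seen::add;

        forEachElement(data, viaLambda);
        forEachElement(data, viaMethodRef);
        return seen;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

With Spark on the classpath, the same idea would read VoidFunction<String> f = item -> ...; followed by items.foreach(f). The shared list here exists only to make the sketch observable. In a real Spark job the function is serialized and executed on the executors, so collecting into a driver-side list this way would not work.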

The foreach method does not modify the contents of the RDD. It is an action that returns nothing and is run only for its side effects.


Example – Spark RDD foreach

In this example, we take an RDD with strings as elements, call RDD.foreach() on it, and print each item.

RDDforEach.java

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RDDforEach {

	public static void main(String[] args) {
		// configure spark
		SparkConf sparkConf = new SparkConf().setAppName("Spark RDD foreach Example")
				.setMaster("local[2]").set("spark.executor.memory","2g");
		// start a spark context
		JavaSparkContext sc = new JavaSparkContext(sparkConf);

		// create an RDD from a list; a single partition keeps the printed order the same as the list order
		List<String> data = Arrays.asList("Learn","Apache","Spark","with","Tutorial Kart");
		JavaRDD<String> items = sc.parallelize(data,1);

		// apply a function to each element of the RDD
		items.foreach(item -> {
			System.out.println("* "+item);
		});

		// stop the spark context to release its resources
		sc.close();
	}
}

Output

* Learn
* Apache
* Spark
* with
* Tutorial Kart

Conclusion

In this Spark tutorial on RDD foreach, we have learnt to apply a function to each element of an RDD using the RDD.foreach() method.