Configure Apache Spark Application – An Apache Spark application can be configured using properties that are set directly on a SparkConf object, which is then passed during SparkContext initialization.

Configure Apache Spark Application using Spark Properties

Following are the properties (with their descriptions) that can be used to tune a Spark application in the Apache Spark ecosystem. We shall discuss each of the following properties in detail, with examples:

Application Name

Property Name : spark.app.name

Default value : (none)

This is the name you give to your Spark application. The application name appears in the Web UI and in logs, which makes debugging and monitoring easier when multiple Spark applications are running on the same machine or cluster.

Following is an example to set the Spark application name:

AppConfigureExample.java

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
/**
* Configure Apache Spark Application Name
*/
public class AppConfigureExample {
	public static void main(String[] args) {
		// configure spark
		SparkConf conf = new SparkConf().setMaster("local[2]");
		conf.set("spark.app.name", "SparkApplicationName");
		
		// start a spark context
		SparkContext sc = new SparkContext(conf);
		
		// print the configuration
		System.out.println(sc.getConf().toDebugString());
		
		// stop the spark context
		sc.stop();
	}
}


Output

spark.app.id=local-1501222987079
spark.app.name=SparkApplicationName
spark.driver.host=192.168.1.100
spark.driver.port=44103
spark.executor.id=driver
spark.master=local[2]

Number of Spark Driver Cores

Property Name : spark.driver.cores

Default value : 1

Note : This property takes effect only in cluster mode.

It specifies the number of cores that the driver process may use.
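Since this property is honored only in cluster mode, it is typically supplied at submit time rather than inside the application. A sketch of such a submission follows; the master URL, class name, and JAR path are illustrative placeholders:

```shell
# Submit in cluster mode with 2 cores for the driver process
# (master URL, class name and JAR path below are placeholders)
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --conf spark.driver.cores=2 \
  --class AppConfigureExample \
  path/to/app.jar
```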

Following is an example to set the number of Spark driver cores:

AppConfigureExample.java

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

public class AppConfigureExample {
	public static void main(String[] args) {
		// configure spark
		SparkConf conf = new SparkConf().setMaster("local[2]");
		conf.set("spark.app.name", "SparkApplicationName");
		conf.set("spark.driver.cores", "2");
		
		// start a spark context
		SparkContext sc = new SparkContext(conf);
		
		// print the configuration
		System.out.println(sc.getConf().toDebugString());
		
		// stop the spark context
		sc.stop();
	}
}

Output

spark.app.id=local-1501223394277
spark.app.name=SparkApplicationName
spark.driver.cores=2
spark.driver.host=192.168.1.100
spark.driver.port=42100
spark.executor.id=driver
spark.master=local[2]

Driver’s Maximum Result Size

Property Name : spark.driver.maxResultSize

Default value : 1g (meaning 1 GB)

Note : The minimum allowed value is 1m (1 MB).

This is the upper limit on the total size of serialized results of all partitions for each Spark action (such as collect). Submitted jobs are aborted if this limit is exceeded. Setting it to ‘0’ means there is no upper limit; however, without a limit, an out-of-memory error may occur in the driver.
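Size values such as 1g and 200m follow Spark's byte-string convention: a number followed by a unit suffix. The helper below is a simplified, illustrative re-implementation of that parsing (Spark uses its own internal utilities for this; the method name toBytes here is our own), just to show how such strings map to bytes:

```java
public class SizeStringExample {

	// Simplified parser for Spark-style size strings such as "1g" or "200m".
	// Illustrative only: Spark's own parser supports more suffixes
	// and stricter validation than this sketch.
	static long toBytes(String size) {
		String s = size.trim().toLowerCase();
		char unit = s.charAt(s.length() - 1);
		long multiplier = 1L;
		if (unit == 'k') multiplier = 1024L;
		else if (unit == 'm') multiplier = 1024L * 1024;
		else if (unit == 'g') multiplier = 1024L * 1024 * 1024;
		// Strip the suffix unless the string is purely numeric (plain bytes)
		String digits = Character.isDigit(unit) ? s : s.substring(0, s.length() - 1);
		return Long.parseLong(digits) * multiplier;
	}

	public static void main(String[] args) {
		System.out.println(toBytes("1g"));   // prints 1073741824
		System.out.println(toBytes("200m")); // prints 209715200
	}
}
```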

Following is an example to set the maximum result size for the Spark driver:

AppConfigureExample.java

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

public class AppConfigureExample {
	public static void main(String[] args) {
		// configure spark
		SparkConf conf = new SparkConf().setMaster("local[2]");
		conf.set("spark.app.name", "SparkApplicationName");
		conf.set("spark.driver.maxResultSize", "200m");
		
		// start a spark context
		SparkContext sc = new SparkContext(conf);
		
		// print the configuration
		System.out.println(sc.getConf().toDebugString());
		
		// stop the spark context
		sc.stop();
	}
}

Output

spark.app.id=local-1501224103438
spark.app.name=SparkApplicationName
spark.driver.host=192.168.1.100
spark.driver.maxResultSize=200m
spark.driver.port=35249
spark.executor.id=driver
spark.master=local[2]

Driver’s Memory Usage

Property Name : spark.driver.memory

Default value : 1g (meaning 1 GB)

Note : In client mode, this property must not be set through SparkConf in the application, because the driver JVM has already started by the time SparkConf is read. Set it instead via the --driver-memory command line option or in the default properties file.

This is the amount of memory allocated to the Spark driver process (that is, where SparkContext is initialized). If the driver requires more memory than this limit, an out-of-memory error may occur in the driver.
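In client mode the driver JVM starts before the application's SparkConf is read, so the driver memory is passed on the command line. A sketch of such a submission follows; the master URL, class name, and JAR path are illustrative placeholders:

```shell
# Client mode: driver memory must be given to spark-submit directly,
# since SparkConf is read after the driver JVM has started
# (master URL, class name and JAR path below are placeholders)
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode client \
  --driver-memory 600m \
  --class AppConfigureExample \
  path/to/app.jar
```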

Following is an example to set the Spark driver's memory:

AppConfigureExample.java

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

public class AppConfigureExample {
	public static void main(String[] args) {
		// configure spark
		SparkConf conf = new SparkConf().setMaster("local[2]");
		conf.set("spark.app.name", "SparkApplicationName");
		conf.set("spark.driver.memory", "600m");
		
		// start a spark context
		SparkContext sc = new SparkContext(conf);
		
		// print the configuration
		System.out.println(sc.getConf().toDebugString());
		
		// stop the spark context
		sc.stop();
	}
}

Output

spark.app.id=local-1501225134344
spark.app.name=SparkApplicationName
spark.driver.host=192.168.1.100
spark.driver.memory=600m
spark.driver.port=43159
spark.executor.id=driver
spark.master=local[2]

Conclusion

In this Apache Spark tutorial, we learned some of the properties used to configure an Apache Spark application.