Configure Apache Spark Application – An Apache Spark application can be configured using properties that are set directly on a SparkConf object, which is then passed during SparkContext initialization.

Configure Apache Spark Application using Spark Properties

Following are the properties (with their descriptions) that can be used to tune a Spark application in the Apache Spark ecosystem. We shall discuss each of the following properties in detail, with examples:

Application Name

Property Name : spark.app.name

Default value : (none)

This is the name you give to your Spark application. The application name appears in the Web UI and in logs, which makes debugging and monitoring easier when multiple Spark applications are running on the same machine or cluster.

Following is an example to set the Spark application name:

AppConfigureExample.java

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
/**
* Configure Apache Spark Application Name
*/
public class AppConfigureExample {
	public static void main(String[] args) {
		// configure spark
		SparkConf conf = new SparkConf().setMaster("local[2]");
		conf.set("spark.app.name", "SparkApplicationName");
		
		// start a spark context
		SparkContext sc = new SparkContext(conf);
		
		// print the configuration
		System.out.println(sc.getConf().toDebugString());
		
		// stop the spark context
		sc.stop();
	}
}


Output

spark.app.id=local-1501222987079
spark.app.name=SparkApplicationName
spark.driver.host=192.168.1.100
spark.driver.port=44103
spark.executor.id=driver
spark.master=local[2]

Number of Spark Driver Cores

Property Name : spark.driver.cores

Default value : 1

Note : This property takes effect only in cluster mode.

It specifies the number of cores that the driver process may use.
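Since this property is honored only in cluster mode, it is typically supplied at submit time rather than inside the application. A sketch of such a submission follows; the master URL, class name, and JAR path are illustrative placeholders:

```shell
# Submit in cluster mode with 2 cores for the driver process
# (master URL, class name and JAR path below are placeholders)
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --conf spark.driver.cores=2 \
  --class AppConfigureExample \
  path/to/app.jar
```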

Following is an example to set the number of Spark driver cores:

AppConfigureExample.java

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

public class AppConfigureExample {
	public static void main(String[] args) {
		// configure spark
		SparkConf conf = new SparkConf().setMaster("local[2]");
		conf.set("spark.app.name", "SparkApplicationName");
		conf.set("spark.driver.cores", "2");
		
		// start a spark context
		SparkContext sc = new SparkContext(conf);
		
		// print the configuration
		System.out.println(sc.getConf().toDebugString());
		
		// stop the spark context
		sc.stop();
	}
}

Output

spark.app.id=local-1501223394277
spark.app.name=SparkApplicationName
spark.driver.cores=2
spark.driver.host=192.168.1.100
spark.driver.port=42100
spark.executor.id=driver
spark.master=local[2]

Driver’s Maximum Result Size

Property Name : spark.driver.maxResultSize

Default value : 1g (meaning 1 GB)

Note : The minimum allowed value is 1m (1 MB).

This is the upper limit on the total size of serialized results of all partitions for each Spark action (such as collect). Submitted jobs are aborted if this limit is exceeded. Setting it to ‘0’ means there is no upper limit; however, without a limit, an out-of-memory error may occur in the driver.
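Size values such as 1g and 200m follow Spark's byte-string convention: a number followed by a unit suffix. The helper below is a simplified, illustrative re-implementation of that parsing (Spark uses its own internal utilities for this; the method name toBytes here is our own), just to show how such strings map to bytes:

```java
public class SizeStringExample {

	// Simplified parser for Spark-style size strings such as "1g" or "200m".
	// Illustrative only: Spark's own parser supports more suffixes
	// and stricter validation than this sketch.
	static long toBytes(String size) {
		String s = size.trim().toLowerCase();
		char unit = s.charAt(s.length() - 1);
		long multiplier = 1L;
		if (unit == 'k') multiplier = 1024L;
		else if (unit == 'm') multiplier = 1024L * 1024;
		else if (unit == 'g') multiplier = 1024L * 1024 * 1024;
		// Strip the suffix unless the string is purely numeric (plain bytes)
		String digits = Character.isDigit(unit) ? s : s.substring(0, s.length() - 1);
		return Long.parseLong(digits) * multiplier;
	}

	public static void main(String[] args) {
		System.out.println(toBytes("1g"));   // prints 1073741824
		System.out.println(toBytes("200m")); // prints 209715200
	}
}
```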

Following is an example to set the maximum result size for the Spark driver:

AppConfigureExample.java

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

public class AppConfigureExample {
	public static void main(String[] args) {
		// configure spark
		SparkConf conf = new SparkConf().setMaster("local[2]");
		conf.set("spark.app.name", "SparkApplicationName");
		conf.set("spark.driver.maxResultSize", "200m");
		
		// start a spark context
		SparkContext sc = new SparkContext(conf);
		
		// print the configuration
		System.out.println(sc.getConf().toDebugString());
		
		// stop the spark context
		sc.stop();
	}
}

Output

spark.app.id=local-1501224103438
spark.app.name=SparkApplicationName
spark.driver.host=192.168.1.100
spark.driver.maxResultSize=200m
spark.driver.port=35249
spark.executor.id=driver
spark.master=local[2]

Driver’s Memory Usage

Property Name : spark.driver.memory

Default value : 1g (meaning 1 GB)

Note : In client mode, this property must not be set through SparkConf in the application, because the driver JVM has already started by the time SparkConf is read. Set it instead via the --driver-memory command line option or in the default properties file.

This is the amount of memory allocated to the Spark driver process (that is, where SparkContext is initialized). If the driver requires more memory than this limit, an out-of-memory error may occur in the driver.
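In client mode the driver JVM starts before the application's SparkConf is read, so the driver memory is passed on the command line. A sketch of such a submission follows; the master URL, class name, and JAR path are illustrative placeholders:

```shell
# Client mode: driver memory must be given to spark-submit directly,
# since SparkConf is read after the driver JVM has started
# (master URL, class name and JAR path below are placeholders)
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode client \
  --driver-memory 600m \
  --class AppConfigureExample \
  path/to/app.jar
```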

Following is an example to set the Spark driver's memory:

AppConfigureExample.java

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

public class AppConfigureExample {
	public static void main(String[] args) {
		// configure spark
		SparkConf conf = new SparkConf().setMaster("local[2]");
		conf.set("spark.app.name", "SparkApplicationName");
		conf.set("spark.driver.memory", "600m");
		
		// start a spark context
		SparkContext sc = new SparkContext(conf);
		
		// print the configuration
		System.out.println(sc.getConf().toDebugString());
		
		// stop the spark context
		sc.stop();
	}
}

Output

spark.app.id=local-1501225134344
spark.app.name=SparkApplicationName
spark.driver.host=192.168.1.100
spark.driver.memory=600m
spark.driver.port=43159
spark.executor.id=driver
spark.master=local[2]

Conclusion

In this Apache Spark tutorial, we learned some of the properties used to configure an Apache Spark application.