Configure Apache Spark Application by setting Spark properties before the SparkContext or SparkSession starts. A Spark application can be configured with SparkConf, with spark-submit options, or with shared values in spark-defaults.conf. This Apache Spark tutorial explains the common properties and where each value should be set.

Configure Apache Spark Application using Spark Properties

The following sections cover the properties and configuration methods you can use to tune a Spark application. For version-specific defaults and advanced configuration keys, refer to the official Spark Configuration reference.

Spark configuration sources and priority

Spark properties can be supplied from more than one place. When the same property is supplied more than once, values set directly on SparkConf in code take the highest priority. Flags passed to spark-submit come next. Values read from spark-defaults.conf, or from a file passed with --properties-file, are used only when no higher-priority value is supplied.
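To make the ordering concrete, the sketch below models the lookup in plain Java, assuming each source has already been read into a Map. The class ConfigPrecedence and its resolve method are hypothetical illustrations, not Spark APIs; Spark performs the real merging itself at startup.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical model of Spark's configuration precedence (not a Spark API).
 * Sources are checked from highest to lowest priority:
 * SparkConf in code, then spark-submit flags, then spark-defaults.conf.
 */
public class ConfigPrecedence {

    // Returns the value from the first source that defines the key, if any.
    static Optional<String> resolve(String key, List<Map<String, String>> sourcesByPriority) {
        for (Map<String, String> source : sourcesByPriority) {
            if (source.containsKey(key)) {
                return Optional.of(source.get(key));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> sparkConfInCode = Map.of("spark.app.name", "NameFromCode");
        Map<String, String> submitFlags = Map.of(
                "spark.app.name", "NameFromSubmit",
                "spark.executor.memory", "4g");
        Map<String, String> defaultsFile = Map.of(
                "spark.executor.memory", "2g",
                "spark.logConf", "false");

        List<Map<String, String>> sources = List.of(sparkConfInCode, submitFlags, defaultsFile);

        // SparkConf in code wins over --name: prints NameFromCode
        System.out.println(resolve("spark.app.name", sources).orElse("(none)"));
        // spark-submit wins over spark-defaults.conf: prints 4g
        System.out.println(resolve("spark.executor.memory", sources).orElse("(none)"));
        // Only spark-defaults.conf defines this key: prints false
        System.out.println(resolve("spark.logConf", sources).orElse("(none)"));
    }
}
```

The practical consequence is that a value hard-coded with conf.set(...) silently hides whatever you pass on the spark-submit command line for the same key.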

Configuration place | Best used for
SparkConf | Application name and runtime settings controlled by code.
spark-submit | Per-run deployment values such as master, deploy mode, driver memory, executor memory, and driver cores.
spark-defaults.conf | Cluster or team defaults shared by many Spark jobs.
spark-env.sh | Per-machine environment values such as local disk directories.
log4j2.properties | Spark logging format and log levels.

Use explicit units for memory and time values. For example, write 512m, 2g, or 1t for memory where the property expects a size, and use supported time units such as ms, s, m, or h where the property expects a duration.
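The following plain-Java sketch shows how such strings can be interpreted. It assumes Spark's documented conventions that size units are binary multiples (1k = 1024 bytes, 1m = 1024k, and so on) and that m means minutes in a duration. The class SparkUnits is hypothetical and handles only the units listed above; Spark's own parser accepts more forms, such as 1gb or 10min.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper showing how size and duration strings are read.
 * Sizes use binary multiples (1k = 1024 bytes). Not a Spark API.
 * Values without a unit are rejected on purpose, to enforce explicit units.
 */
public class SparkUnits {

    private static final Map<String, Long> SIZE_UNITS = Map.of(
            "b", 1L,
            "k", 1L << 10,
            "m", 1L << 20,
            "g", 1L << 30,
            "t", 1L << 40);

    private static final Map<String, Long> TIME_UNITS_MS = Map.of(
            "ms", 1L,
            "s", 1_000L,
            "m", 60_000L,
            "h", 3_600_000L);

    // Parses values such as "512m" or "2g" into bytes.
    static long sizeToBytes(String value) {
        return parse(value, SIZE_UNITS);
    }

    // Parses values such as "500ms", "30s", or "2h" into milliseconds.
    static long durationToMillis(String value) {
        return parse(value, TIME_UNITS_MS);
    }

    private static long parse(String value, Map<String, Long> units) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        int i = 0;
        while (i < v.length() && Character.isDigit(v.charAt(i))) {
            i++;
        }
        if (i == 0) {
            throw new IllegalArgumentException("Missing number in: " + value);
        }
        Long multiplier = units.get(v.substring(i));
        if (multiplier == null) {
            throw new IllegalArgumentException("Unknown or missing unit in: " + value);
        }
        return Long.parseLong(v.substring(0, i)) * multiplier;
    }

    public static void main(String[] args) {
        System.out.println(sizeToBytes("512m"));     // 536870912
        System.out.println(sizeToBytes("2g"));       // 2147483648
        System.out.println(durationToMillis("2m"));  // 120000 ("m" means minutes here)
    }
}
```

The same suffix m means mebibytes for a size property and minutes for a duration property, which is one more reason to check which kind of value a property expects.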

Application Name

Property Name : spark.app.name

Default value : (none)

This is the name of your Spark application. It appears in the Web UI and in the logs, which makes a job easier to identify and debug when several Spark applications run on the same machine or cluster.

Following is an example that sets the Spark application name:

AppConfigureExample.java

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
/**
* Configure Apache Spark Application Name
*/
public class AppConfigureExample {
	public static void main(String[] args) {
		// configure spark
		SparkConf conf = new SparkConf().setMaster("local[2]");
		conf.set("spark.app.name", "SparkApplicationName");
		
		// start a spark context
		SparkContext sc = new SparkContext(conf);
		
		// print the configuration
		System.out.println(sc.getConf().toDebugString());
		
		// stop the spark context
		sc.stop();
	}
}


Output

spark.app.id=local-1501222987079
spark.app.name=SparkApplicationName
spark.driver.host=192.168.1.100
spark.driver.port=44103
spark.executor.id=driver
spark.master=local[2]

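You can also set the application name at submission time with --name. That approach is useful when the same application JAR runs under different job names in different environments. Because values set on SparkConf take priority, --name takes effect only if the code does not also set spark.app.name.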

./bin/spark-submit --class com.tutorialkart.spark.AppConfigureExample --master local[2] --name "SparkApplicationName" target/spark-app.jar

Number of Spark Driver Cores

Property Name : spark.driver.cores

Default value : 1

Exception : This property takes effect only in cluster mode.

It sets the number of cores to use for the driver process.

Following is an example that sets the number of Spark driver cores.

AppConfigureExample.java

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

public class AppConfigureExample {
	public static void main(String[] args) {
		// configure spark
		SparkConf conf = new SparkConf().setMaster("local[2]");
		conf.set("spark.app.name", "SparkApplicationName");
		conf.set("spark.driver.cores", "2");
		
		// start a spark context
		SparkContext sc = new SparkContext(conf);
		
		// print the configuration
		System.out.println(sc.getConf().toDebugString());
		
		// stop the spark context
		sc.stop();
	}
}

Output

spark.app.id=local-1501223394277
spark.app.name=SparkApplicationName
spark.driver.cores=2
spark.driver.host=192.168.1.100
spark.driver.port=42100
spark.executor.id=driver
spark.master=local[2]

The value is visible in the configuration output, but it affects the actual driver process only when the driver is launched in cluster mode. For submitted cluster jobs, prefer the submit-time option.

./bin/spark-submit --class com.tutorialkart.spark.AppConfigureExample --master spark://spark-master:7077 --deploy-mode cluster --driver-cores 2 target/spark-app.jar

Driver’s Maximum Result Size

Property Name : spark.driver.maxResultSize

Default value : 1g (meaning 1 GB)

Exception : The value must be at least 1m (1 MB), or 0 for no limit.

This is the limit on the total size of the serialized results of all partitions for each Spark action, such as collect(). A job is aborted if the total exceeds this limit. Setting it to 0 removes the limit, but then large results can cause out-of-memory errors in the driver.
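The check can be modeled in a few lines of Java. This is a hypothetical sketch, not Spark's source code: it assumes the serialized result size of each partition is already known in bytes, adds the sizes up, and compares the total with the limit, treating 0 as unlimited. The example shows why many small partitions can still trip the limit.

```java
import java.util.Arrays;

/**
 * Hypothetical model of the spark.driver.maxResultSize check (not Spark's code).
 * The limit applies to the total serialized size of all partition results
 * returned to the driver by one action; a limit of 0 means "no limit".
 */
public class MaxResultSizeCheck {

    static boolean exceedsLimit(long[] partitionResultBytes, long limitBytes) {
        if (limitBytes == 0) {
            return false; // 0 disables the check
        }
        long total = 0;
        for (long bytes : partitionResultBytes) {
            total += bytes;
        }
        return total > limitBytes;
    }

    public static void main(String[] args) {
        long limit = 200L * 1024 * 1024;             // 200m
        long[] results = new long[4];
        Arrays.fill(results, 60L * 1024 * 1024);     // four partitions of 60 MiB each

        // Each partition is under the limit, but the total (240 MiB) is not.
        System.out.println(exceedsLimit(results, limit)); // true
        System.out.println(exceedsLimit(results, 0));     // false
    }
}
```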

Following is an example that sets the maximum total size of results the driver accepts for each action.

AppConfigureExample.java

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

public class AppConfigureExample {
	public static void main(String[] args) {
		// configure spark
		SparkConf conf = new SparkConf().setMaster("local[2]");
		conf.set("spark.app.name", "SparkApplicationName");
		conf.set("spark.driver.maxResultSize", "200m");
		
		// start a spark context
		SparkContext sc = new SparkContext(conf);
		
		// print the configuration
		System.out.println(sc.getConf().toDebugString());
		
		// stop the spark context
		sc.stop();
	}
}

Output

spark.app.id=local-1501224103438
spark.app.name=SparkApplicationName
spark.driver.host=192.168.1.100
spark.driver.maxResultSize=200m
spark.driver.port=35249
spark.executor.id=driver
spark.master=local[2]

Use this setting as a protective limit. If a job frequently reaches this limit, avoid returning large datasets to the driver with actions such as collect(). Write the result to storage or aggregate the data further.

Driver’s Memory Usage

Property Name : spark.driver.memory

Default value : 1g (meaning 1 GB)

Exception : If the Spark application is submitted in client mode, this property must be set before the driver starts, with the command line option --driver-memory or in the default properties file (spark-defaults.conf).

This is the amount of heap memory used by the Spark driver, where SparkContext is initialized. In client mode, the driver JVM has already started before your application code creates SparkConf, so setting spark.driver.memory inside the program does not resize the running driver JVM.
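A quick way to confirm which setting actually took effect is to print the maximum heap of the running JVM. The sketch below uses only the standard Runtime.getRuntime().maxMemory() call; placing the same line in your driver's main method shows the heap the driver really has. The JVM may report slightly less than the requested size, so compare the numbers approximately. The class name DriverHeapCheck is illustrative.

```java
/**
 * Prints the maximum heap of the current JVM. Run the same call inside the
 * driver's main method to see which driver memory setting actually took effect.
 */
public class DriverHeapCheck {

    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Reflects the JVM's -Xmx (which --driver-memory controls for a Spark driver),
        // not a spark.driver.memory value set later through SparkConf.
        System.out.println("Max heap of this JVM: " + maxHeapMiB() + " MiB");
    }
}
```

Submitting the same JAR once with --driver-memory 2g and once with only conf.set("spark.driver.memory", ...) in code should make the difference visible in client mode.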

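Following is an example that sets spark.driver.memory in code. The value appears in the configuration output, but because the driver JVM is already running in this local example, its heap size does not change.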

AppConfigureExample.java

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

public class AppConfigureExample {
	public static void main(String[] args) {
		// configure spark
		SparkConf conf = new SparkConf().setMaster("local[2]");
		conf.set("spark.app.name", "SparkApplicationName");
		conf.set("spark.driver.memory", "600m");
		
		// start a spark context
		SparkContext sc = new SparkContext(conf);
		
		// print the configuration
		System.out.println(sc.getConf().toDebugString());
		
		// stop the spark context
		sc.stop();
	}
}

Output

spark.app.id=local-1501225134344
spark.app.name=SparkApplicationName
spark.driver.host=192.168.1.100
spark.driver.memory=600m
spark.driver.port=43159
spark.executor.id=driver
spark.master=local[2]

For client-mode submissions, set driver memory before the JVM starts.

./bin/spark-submit --class com.tutorialkart.spark.AppConfigureExample --master yarn --deploy-mode client --driver-memory 2g target/spark-app.jar

Configure Spark application with spark-submit and spark-defaults.conf

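spark-submit is the usual entry point for running a Spark application outside an IDE. Use a dedicated option when one exists, and use --conf for Spark properties that have no dedicated option. Keep in mind that values set on SparkConf in code take priority over spark-submit flags, so remove hard-coded settings such as setMaster("local[2]") from the earlier examples before submitting with --master yarn.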

./bin/spark-submit --class com.tutorialkart.spark.AppConfigureExample --master yarn --deploy-mode cluster --name "DailySparkJob" --driver-memory 2g --executor-memory 4g --conf spark.driver.maxResultSize=512m target/spark-app.jar
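If you launch jobs from another Java program, Spark ships org.apache.spark.launcher.SparkLauncher for that purpose. The stdlib-only sketch below does something simpler: it shows how dedicated options and --conf key=value pairs fit together in one spark-submit argument list. The helper buildSubmitCommand is hypothetical, and the class name and JAR path are the placeholder values used throughout this tutorial.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper that assembles a spark-submit argument list.
 * Dedicated options come first; other Spark properties are passed as --conf key=value.
 */
public class SubmitCommandBuilder {

    static List<String> buildSubmitCommand(String mainClass, String master, String deployMode,
                                           Map<String, String> extraConf, String appJar) {
        List<String> cmd = new ArrayList<>(List.of(
                "./bin/spark-submit",
                "--class", mainClass,
                "--master", master,
                "--deploy-mode", deployMode));
        for (Map.Entry<String, String> e : extraConf.entrySet()) {
            cmd.add("--conf");
            cmd.add(e.getKey() + "=" + e.getValue());
        }
        // The application JAR goes after all options; anything after it is an app argument.
        cmd.add(appJar);
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>(); // keeps insertion order
        conf.put("spark.driver.maxResultSize", "512m");
        conf.put("spark.logConf", "true");

        List<String> cmd = buildSubmitCommand("com.tutorialkart.spark.AppConfigureExample",
                "yarn", "cluster", conf, "target/spark-app.jar");
        System.out.println(String.join(" ", cmd));
        // To actually launch it: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```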

Use spark-defaults.conf when several jobs should share the same defaults. Each line contains a property name and value separated by whitespace.

spark.master                 yarn
spark.submit.deployMode       cluster
spark.executor.memory         4g
spark.driver.memory           2g
spark.driver.maxResultSize    512m
spark.logConf                 false
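Because each line is a key, whitespace, and a value, java.util.Properties can read the file for inspection or validation. The sketch below is an illustration rather than Spark's own loader; the class name DefaultsFileReader is hypothetical, and it reads from a string so that it runs anywhere.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Hypothetical reader for spark-defaults.conf style text. java.util.Properties
 * accepts whitespace as the separator between a key and its value and skips # comments.
 */
public class DefaultsFileReader {

    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "# comments start with #",
                "spark.master                 yarn",
                "spark.executor.memory         4g",
                "spark.driver.maxResultSize    512m");
        Properties props = load(text);
        // The whitespace between key and value is not part of the value.
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + " -> " + props.getProperty(key));
        }
    }
}
```

To check a real installation, pass a reader for conf/spark-defaults.conf to Properties.load instead of the sample string.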

Common Spark properties for executors, master, local directory, logging, and listeners

Spark property | Default | Purpose
spark.executor.memory | 1g | Heap memory for each executor process. Set it with --executor-memory for submitted jobs.
spark.master | (none) | Cluster manager URL, such as local[2], spark://host:7077, yarn, or k8s://https://host:port.
spark.submit.deployMode | client | Defines whether the driver runs on the submitting machine or inside the cluster.
spark.local.dir | /tmp | Local scratch space for shuffle files and spilled data. Cluster managers may override this with environment variables.
spark.logConf | false | Logs the effective SparkConf at startup when set to true.
spark.extraListeners | (none) | Comma-separated listener classes for custom monitoring, auditing, or metrics.
spark.log.callerContext | (none) | Concise application context written to YARN/HDFS audit logs when applicable.
spark.driver.supervise | false | Restarts the driver automatically in Spark standalone cluster mode if it fails with a non-zero exit status.

After startup, verify the final values in the Spark web UI under the Environment tab. In local mode, the UI is usually available at http://localhost:4040 while the application is running. You can also print sc.getConf().toDebugString(), as shown in the examples above.
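To turn this verification into a repeatable check, you can compare the expected values with the text returned by toDebugString(), which prints one key=value pair per line, as in the outputs above. The EffectiveConfigCheck class is a hypothetical sketch; in a real driver you would pass sc.getConf().toDebugString() instead of the sample string.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical check that compares expected properties with the text printed by
 * SparkConf.toDebugString(), which lists one "key=value" pair per line.
 */
public class EffectiveConfigCheck {

    static Map<String, String> parseDebugString(String debug) {
        Map<String, String> values = new HashMap<>();
        for (String line : debug.split("\n")) {
            int eq = line.indexOf('='); // keys never contain '=', values may
            if (eq > 0) {
                values.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return values;
    }

    // Returns a description of every expected property whose effective value differs.
    static Map<String, String> mismatches(Map<String, String> expected, String debug) {
        Map<String, String> actual = parseDebugString(debug);
        Map<String, String> problems = new HashMap<>();
        for (Map.Entry<String, String> e : expected.entrySet()) {
            String found = actual.get(e.getKey());
            if (!e.getValue().equals(found)) {
                problems.put(e.getKey(), "expected " + e.getValue() + " but found " + found);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Sample text in the same format as the outputs shown in this tutorial.
        String debug = String.join("\n",
                "spark.app.name=SparkApplicationName",
                "spark.driver.maxResultSize=200m",
                "spark.master=local[2]");
        Map<String, String> expected = Map.of(
                "spark.master", "yarn",
                "spark.driver.maxResultSize", "200m");
        System.out.println(mismatches(expected, debug));
        // {spark.master=expected yarn but found local[2]}
    }
}
```

A mismatch like the one above is exactly what a hard-coded setMaster("local[2]") produces when the job was meant to run on YARN.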

Apache Spark application configuration QA checklist

  • The application name is clear enough to identify the job in Spark UI and logs.
  • Driver memory is set with --driver-memory for client-mode jobs.
  • Executor memory is configured separately from driver memory.
  • spark.driver.maxResultSize is used as a safety limit, not as a substitute for better result handling.
  • The effective Spark properties are verified in the Spark UI or startup logs.

FAQs about configuring Apache Spark application

How to configure Apache Spark application?

You can configure an Apache Spark application using SparkConf, spark-submit options, and spark-defaults.conf. Use SparkConf for application-controlled settings and spark-submit for deployment values such as master, deploy mode, driver memory, executor memory, and driver cores.

Which Spark configuration source has the highest priority?

Values set directly on SparkConf have the highest priority. Flags passed to spark-submit come next. Values read from spark-defaults.conf, or from a file passed with --properties-file, are used only when no higher-priority value is supplied.

Why does spark.driver.memory not change when set inside application code?

In client mode, the driver JVM has already started before the application creates SparkConf. Therefore, setting spark.driver.memory in code does not resize the JVM. Set driver memory with --driver-memory or a defaults/properties file before startup.

Where can I view the final Spark configuration values?

Open the Spark application web UI and check the Environment tab. You can also print sc.getConf().toDebugString() or enable spark.logConf=true while debugging configuration issues.

Should spark.driver.maxResultSize be set to 0?

Setting spark.driver.maxResultSize to 0 removes the limit, but it can expose the driver to out-of-memory errors. Prefer reducing the data returned to the driver or writing results to storage instead of collecting very large results.

Summary of Apache Spark application configuration

In this Apache Spark Tutorial, we learned how to configure a Spark application using SparkConf, spark-submit, and spark-defaults.conf. We also discussed common Spark properties including spark.app.name, spark.driver.cores, spark.driver.maxResultSize, spark.driver.memory, spark.executor.memory, spark.master, spark.submit.deployMode, spark.local.dir, spark.logConf, spark.extraListeners, spark.log.callerContext, and spark.driver.supervise.